Title | Applying the Chi-Square Test to Improve the Performance of the Decision Tree for Classification by Taking Baseball Database as an Example |
---|---|
Parallel title | Applying the Chi-Square Test to Improve the Performance of the Decision Tree for Classification by Taking Baseball Database as an Example |
Authors | Chia-En Li, Ye-In Chang |
Abstract | The chi-square test is a statistical test well suited to analyzing whether categorical variable A is a significant factor for categorical variable B. A decision tree, in turn, is a useful model for data classification. To achieve efficient knowledge discovery with a compact decision tree, in this paper we propose a method that uses the result of the chi-square test to reduce the number of attributes considered. As a preprocessing step, we use the P-value from the chi-square test to identify the significant factors and prune the insignificant ones before constructing the decision tree. In this way, we can avoid constructing an inaccurate decision tree. We use a public baseball database as an example to illustrate our method. In our performance study, we observe that checking the most significant factor (i.e., the factor with the minimum P-value) first reduces the number of conditions (i.e., levels) to be decided. Therefore, the compact decision tree constructed by our method offers lower storage cost, faster prediction, and higher classification accuracy than a decision tree built on all the original factors. |
Pages | 001-015 |
Keywords | chi-square test, classification, data mining, decision tree, significant factor |
Journal | 電腦學刊 (Journal of Computers) |
Issue | December 2018 (Vol. 29, No. 6) |
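
The preprocessing step described in the abstract (keep only the attributes whose chi-square P-value marks them as significant, and consider the one with the minimum P-value first) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the significance threshold `alpha=0.05`, the helper names, and the toy attributes `batting` and `time` are all assumptions made for the example and are not taken from the paper or from the baseball database.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability of the chi-square distribution (closed forms)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even degrees of freedom: finite Poisson-like sum.
        total = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * total
    # Odd degrees of freedom: erfc term plus a finite correction sum.
    tail = sum(half ** (i - 0.5) / math.gamma(i + 0.5)
               for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * tail


def chi_square_p_value(xs, ys):
    """P-value of the chi-square test of independence for two categorical columns."""
    n = len(xs)
    observed = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    stat = 0.0
    for r, r_n in row_totals.items():
        for c, c_n in col_totals.items():
            expected = r_n * c_n / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    if df == 0:
        return 1.0  # a constant column carries no information
    return chi2_sf(stat, df)


def select_significant(rows, attributes, target, alpha=0.05):
    """Return (attribute, p_value) pairs with p_value < alpha, smallest first.

    alpha is an assumed threshold; the abstract does not state one.
    """
    ys = [row[target] for row in rows]
    scored = []
    for name in attributes:
        p = chi_square_p_value([row[name] for row in rows], ys)
        if p < alpha:
            scored.append((name, p))
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical toy data: (batting, time, result, number of games).
counts = [
    ("high", "day", "win", 9), ("high", "night", "win", 9),
    ("high", "day", "lose", 1), ("high", "night", "lose", 1),
    ("low", "day", "win", 2), ("low", "night", "win", 2),
    ("low", "day", "lose", 8), ("low", "night", "lose", 8),
]
rows = [
    {"batting": b, "time": t, "result": r}
    for b, t, r, k in counts
    for _ in range(k)
]

selected = select_significant(rows, ["batting", "time"], "result")
print(selected)
```

In this toy data `time` is exactly independent of `result`, so it is pruned, while `batting` survives the filter. The ordered list in `selected` could then be handed to any decision tree learner so that only the retained attributes are used, with the most significant one tested first.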