Title | An E-mail Classification Algorithm based on Stacking Integrated Learning |
---|---|
Parallel title | An E-mail Classification Algorithm based on Stacking Integrated Learning |
Authors | Li-Xia Wan, Wei-Xing Huang, Qing-Hua Tang |
Abstract (English) | Text filtering in traditional anti-spam systems relies mainly on keyword matching and text fingerprint analysis, which makes it difficult to identify and classify spam accurately. This paper therefore proposes an integrated (ensemble) learning algorithm based on stacking. The algorithm first takes manually labeled text data from several categories as samples and uses the TF-IDF algorithm to train the word vector space model. It then uses linear SVC, XGBoost, and logistic regression as base classifiers and a random forest as the meta classifier, and combines them through stacking ensemble learning to build the classification model. The model divides e-mail into five categories: illegal, advertisement, news, bill, and recruitment. In the simulation results, the AUC values of the stacking classifier for the five categories are 0.92, 0.95, 1.00, 0.93, and 0.97, respectively, and the AP values are 0.86, 0.88, 1.00, 0.88, and 0.94, respectively, indicating high-performance, high-precision text classification. |
Pages | 105-114 |
Keywords | anti spam system, integrated learning algorithm, TF-IDF algorithm, word vector space model, e-mail classification |
Journal | 電腦學刊 (Journal of Computers) |
Issue | 2022-04 (Vol. 33, No. 2) |
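
The pipeline described in the abstract (TF-IDF features, three base classifiers, and a random forest meta classifier combined by stacking) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the toy corpus, the hyperparameters, and the two-fold setting are assumptions made for the example, and scikit-learn's `GradientBoostingClassifier` is swapped in for XGBoost so that the sketch depends on a single library.

```python
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus (NOT the paper's data): four messages per category.
SAMPLES = {
    "illegal": [
        "buy fake passports and stolen cards",
        "counterfeit documents and stolen cards for sale",
        "fake passports shipped with no questions",
        "stolen account logins and counterfeit notes",
    ],
    "advertisement": [
        "big discount sale this weekend only",
        "sale on shoes with a huge discount",
        "weekend offer with discount coupons inside",
        "limited offer and discount on all items",
    ],
    "news": [
        "daily news headlines on the election",
        "breaking news report from the election",
        "morning headlines and market news report",
        "weekly news digest with top headlines",
    ],
    "bill": [
        "your electricity bill is due on friday",
        "invoice attached and payment due soon",
        "monthly bill with the payment amount due",
        "your phone invoice and payment receipt",
    ],
    "recruitment": [
        "we are hiring engineers so apply now",
        "job opening please send your resume",
        "hiring interns so submit your resume today",
        "new job vacancy and interview schedule",
    ],
}
texts = [msg for msgs in SAMPLES.values() for msg in msgs]
labels = [cat for cat, msgs in SAMPLES.items() for _ in msgs]

# Base classifiers named in the abstract; the gradient-boosting model is an
# assumed stand-in for XGBoost.
base_classifiers = [
    ("linear_svc", LinearSVC()),
    ("boosting", GradientBoostingClassifier(n_estimators=20, random_state=0)),
    ("logreg", LogisticRegression(max_iter=1000)),
]

# The random forest meta classifier is trained on out-of-fold outputs of the
# base classifiers. cv=2 is chosen only because the toy corpus is tiny.
stack = StackingClassifier(
    estimators=base_classifiers,
    final_estimator=RandomForestClassifier(n_estimators=50, random_state=0),
    cv=2,
)

# TF-IDF turns each message into a weighted term vector before stacking.
model = make_pipeline(TfidfVectorizer(), stack)
model.fit(texts, labels)

predictions = model.predict(
    ["discount sale on shoes this weekend", "your bill payment is due"]
)
print(list(predictions))
```

With a realistic corpus, the per-category AUC and AP figures quoted in the abstract would correspond to one-vs-rest evaluation of `model.predict_proba` on held-out data; the toy data here is far too small for such numbers to be meaningful.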