Title | An Improvised Sub-Document Based Framework for Efficient Document Clustering |
---|---|
Parallel Title | An Improvised Sub-Document Based Framework for Efficient Document Clustering |
Authors | Muhammad Qasim Memon, Jingsha He, Yu Lu, Nafei Zhu, Aasma Memon |
Abstract | Document clustering, which is used for topic discovery and similarity computation, has received a great deal of attention in text data management. Traditional clustering methods are not viable for multi-topic documents in particular, because content that is distinguished by the subtopic structure may not be pertinent across the entire document. This paper proposes a sub-document based framework for clustering multiple documents, in which LDA is used for document segmentation. The proposed improvised framework takes a two-way approach to the clustering problem. First, instead of applying a clustering algorithm to the entire data set, documents are partitioned into cohesive sub-documents along topic boundaries through text segmentation, establishing a two-level representation of text data, i.e., topics and words. Second, the proposed framework is compared with existing clustering methods, both traditional and segment-based, using different clustering algorithms and the F-measure as the evaluation metric. In addition, various real-time data sets that contain multi-topic documents are used to validate the clustering algorithms through the proposed sub-document based framework. Sub-documents are first clustered within each document, and the resulting clusters are then clustered across documents. Experimental results show that the proposed framework outperforms existing clustering approaches in terms of the F-measure as well as efficiency, by at least 73% with LDA segmentation and bisecting LDA in comparison to TextTiling. |
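Pages | 1191-1204 |
Keywords | Clustering algorithms, Text analysis, Text mining, Information retrieval, Data mining |
Journal | Journal of Internet Technology (網際網路技術學刊) |
Issue | July 2019 (Vol. 20, No. 4) |
Publisher | Taiwan Academic Network Management Committee (台灣學術網路管理委員會) |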
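
The abstract names the F-measure as its evaluation metric but does not give a formula. As a rough illustration only, the sketch below computes one commonly used clustering variant: for each ground-truth class, take the best F1 score over all clusters, then average those scores weighted by class size. Whether the paper uses exactly this variant is an assumption, and the function name `clustering_f_measure` is hypothetical.

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_ids):
    """Class-weighted best-match F-measure for a flat clustering.

    Assumption: this is a common textbook variant, not necessarily the
    exact metric used in the paper summarized above.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("label lists must have the same length")
    n = len(true_labels)
    if n == 0:
        raise ValueError("need at least one item")

    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_ids)
    # Number of items shared by each (class, cluster) pair.
    overlap = Counter(zip(true_labels, cluster_ids))

    total = 0.0
    for cls, n_cls in class_sizes.items():
        best = 0.0
        for clu, n_clu in cluster_sizes.items():
            n_both = overlap.get((cls, clu), 0)
            if n_both == 0:
                continue
            precision = n_both / n_clu
            recall = n_both / n_cls
            f1 = 2 * precision * recall / (precision + recall)
            best = max(best, f1)
        # Weight each class by its share of the data.
        total += (n_cls / n) * best
    return total


if __name__ == "__main__":
    # Toy example: five items, two true topics, two predicted clusters.
    topics = ["a", "a", "a", "b", "b"]
    clusters = [0, 0, 1, 1, 1]
    print(round(clustering_f_measure(topics, clusters), 3))
```

In a two-stage setup like the one the abstract describes, the same function could score the final cross-document clusters against known topic labels, provided such labels exist for the data set.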