An Improvised Sub-Document Based Framework for Efficient Document Clustering, ERICDATA Higher Education Knowledge Base (高等教育知識庫)
Title
An Improvised Sub-Document Based Framework for Efficient Document Clustering
Authors: Muhammad Qasim Memon, Jingsha He, Yu Lu, Nafei Zhu, Aasma Memon
Abstract
Document clustering, which is used for topic discovery and similarity computation, has received a great deal of attention in text data management. Methods adopted in traditional clustering are not viable for multi-topic documents, because content that is distinguished by the subtopical structure may not be pertinent across an entire document. In this paper, a sub-document based framework for clustering multiple documents is proposed in which latent Dirichlet allocation (LDA) is used for document segmentation. The proposed improvised framework takes a two-way approach to the clustering problem. First, instead of applying a clustering algorithm to the entire data set, documents are partitioned into cohesive sub-documents along topic boundaries through text segmentation, establishing a two-level representation of the text data, i.e., topics and words. Second, the proposed framework is compared to existing clustering methods, both traditional and segment-based, through different clustering algorithms using the F-measure as the evaluation metric. In addition, various real-time data sets that contain multi-topic documents are used to validate the clustering algorithms through the proposed sub-document based framework. Each sub-document is clustered within a document, and the resulting clusters are further clustered across the documents. Experimental results show that the proposed framework outperforms existing clustering approaches in terms of the F-measure as well as efficiency, by at least 73% with LDA segmentation and bisecting LDA in comparison to TextTiling.
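The abstract names the F-measure as the evaluation metric but does not say which variant is used. As an illustration only, the sketch below computes one common clustering F-measure: for each ground-truth class, take the best F-score over all clusters, then average those scores weighted by class size. The function name `clustering_f_measure` and the choice of this variant are assumptions, not details taken from the paper.

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_ids):
    """Class-size-weighted best-match F-measure for a flat clustering.

    Assumed variant (not confirmed by the abstract): each class is matched
    to the cluster that gives it the highest F-score.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("label lists must have equal length")
    n = len(true_labels)
    if n == 0:
        return 0.0

    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_ids)
    # Number of items shared by each (class, cluster) pair.
    overlap = Counter(zip(true_labels, cluster_ids))

    total = 0.0
    for cls, n_cls in class_sizes.items():
        best = 0.0
        for clu, n_clu in cluster_sizes.items():
            n_both = overlap[(cls, clu)]
            if n_both == 0:
                continue
            precision = n_both / n_clu
            recall = n_both / n_cls
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_cls / n) * best
    return total


if __name__ == "__main__":
    # Hypothetical labels for four sub-documents drawn from two topics.
    topics = ["sports", "sports", "politics", "politics"]
    print(clustering_f_measure(topics, [0, 0, 1, 1]))
    print(clustering_f_measure(topics, [0, 0, 0, 0]))
```

Under this variant, a clustering that reproduces the topic classes exactly scores 1.0, while merging everything into one cluster is penalized through lower precision.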
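Pages: 1191-1204
Keywords: Clustering algorithms, Text analysis, Text mining, Information retrieval, Data mining
Journal: Journal of Internet Technology (網際網路技術學刊)
Issue: July 2019 (Vol. 20, No. 4)
Publisher: Taiwan Academic Network Management Committee (台灣學術網路管理委員會)
DOI: 10.3966/160792642019072004018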
