TY - CONF
T1 - A topic-based Document Correlation Model
T2 - Machine Learning and Cybernetics, 2008 International Conference on
Y1 - 2008
A1 - Jia,Xi-Ping
A1 - Peng,Hong
A1 - Zheng,Qi-Lun
A1 - Jiang,Zhuolin
A1 - Li,Zhao
KW - bipartite graph optimal matching
KW - data mining
KW - document correlation analysis
KW - document retrieval
KW - Gibbs sampling
KW - information retrieval
KW - latent Dirichlet allocation model
KW - text analysis
KW - text mining
KW - topic-based document correlation model
AB - Document correlation analysis is now a focus of study in text mining. This paper proposes a Document Correlation Model that captures the correlation between documents at the topic level. The model represents document correlation as the optimal matching of a bipartite graph in which each partition is a document, each node is a topic, and each edge weight is the similarity between two topics. The topics of each document are extracted with the Latent Dirichlet Allocation model and Gibbs sampling. Experiments on correlated document search show that the Document Correlation Model outperforms the Vector Space Model in two respects: 1) it achieves higher average retrieval precision; 2) it needs less space to store a document's information.
JA - Machine Learning and Cybernetics, 2008 International Conference on
VL - 5
M3 - 10.1109/ICMLC.2008.4620826
ER -