Dandan Zhu, Yusuke Fukazawa, Jun Ota
2013 IEEE/WIC/ACM INTERNATIONAL JOINT CONFERENCES ON WEB INTELLIGENCE (WI) AND INTELLIGENT AGENT TECHNOLOGIES (IAT), VOL 1, 359-366, 2013 Peer-reviewed
We propose a topic model that generates tri-layer clusters, each composed of a topic layer, an activity layer, and a word layer. The objective is to better predict the activities described in documents by taking the general topics of those activities into account during clustering. The proposed model is a supervised topic model based on Latent Dirichlet Allocation (LDA). As a follow-up to the word-pair generation LDA (wpLDA) model, it introduces a topic-specific activity distribution as an external input and inserts an activity node into the main generative process. In addition, we adopt the one-to-one correspondence of D. Ramage et al. to learn word-activity tags directly. We conducted an experiment to demonstrate the feasibility of the model. We chose the ten top-listed activities from the wish clusters obtained in the previous wpLDA research and used each as a keyword to extract thirty tweets for training and five for testing, tagging the tweets with the corresponding activities. Applying the proposed model in the training phase yielded the expected tri-layer clusters. In the testing phase, we used the activity-specific word distributions derived from the training results to infer the activities of the test documents. The Stanford Classifier served as the baseline, and the activity prediction accuracy indicates that the proposed model performs better in multi-activity prediction.
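The testing step described in the abstract, scoring a test document against activity-specific word distributions, can be illustrated with a minimal sketch. This is not the authors' model or inference procedure: it swaps in simple smoothed word counts per activity tag in place of the distributions learned by the proposed topic model, and ranks activities by average log-probability. The function names, the smoothing value, and the toy tokens are all hypothetical.

```python
import math
from collections import Counter


def estimate_word_distributions(tagged_docs, smoothing=0.1):
    """Estimate a smoothed word distribution for each activity tag.

    tagged_docs: iterable of (activity, tokens) pairs.
    Assumption: additive smoothing over the shared vocabulary stands in
    for the activity-specific word distribution learned during training.
    """
    counts = {}
    vocab = set()
    for activity, tokens in tagged_docs:
        counts.setdefault(activity, Counter()).update(tokens)
        vocab.update(tokens)

    phi = {}
    for activity, c in counts.items():
        total = sum(c.values()) + smoothing * len(vocab)
        # Counter returns 0 for unseen words, so every vocabulary word
        # receives a non-zero probability.
        phi[activity] = {w: (c[w] + smoothing) / total for w in vocab}
    return phi, vocab


def rank_activities(tokens, phi, vocab):
    """Rank activities by the average log-probability of known tokens."""
    known = [w for w in tokens if w in vocab]
    if not known:
        return []  # nothing to score against
    scores = {
        activity: sum(math.log(dist[w]) for w in known) / len(known)
        for activity, dist in phi.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the paper's tweet corpus.
    train = [
        ("travel", ["want", "visit", "paris", "trip"]),
        ("cooking", ["want", "bake", "bread", "recipe"]),
    ]
    phi, vocab = estimate_word_distributions(train)
    print(rank_activities(["trip", "paris"], phi, vocab))
```

Returning a ranked list rather than a single label leaves room for multi-activity prediction, for example by keeping the top few activities; how the paper itself selects multiple activities is not specified in the abstract.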