A Feature-Based Opinion Mining Model for Vietnamese Product Reviews

Author: Vũ Tiến Thành
Institution: Trường Đại học Công nghệ (University of Engineering and Technology)
Major: Computer Science; Code: 60 48 01
Supervisor: Assoc. Prof. Dr. Hà Quang Thụy
Year of defense: 2012
Abstract: In this thesis, we present an approach to building a feature-based
opinion mining system for customer reviews, based on Vietnamese syntax rules
and the VietSentiWordNet dictionary. The system works in four phases:
(1) pre-processing; (2) extracting explicit and implicit product features and
opinion words, and grouping synonymous product features; (3) identifying the
orientation of opinions; and (4) summarizing the results. The thesis makes
three main contributions. First, in phase 1, we build a Vietnamese accent
restoration system that combines an N-gram statistical model with a Hidden
Markov Model (HMM) to convert an unaccented sentence into an accented
Vietnamese sentence. Second, in phase 2, we construct a mapping dictionary
that identifies implicit features by mapping them to their corresponding
opinion words; we also propose grouping synonymous features with SVM-kNN
semi-supervised learning, using HAC clustering to generate the training set
for SVM-kNN; after that, coreference is resolved using a set of Vietnamese
rules.
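
The accent restoration step of phase 1 can be illustrated with a minimal
sketch. It is not the thesis implementation: it assumes a plain bigram model
decoded with Viterbi search, where each accented word is a hidden state that
deterministically emits its unaccented form. The toy corpus and the names
strip_accents, train, log_prob and restore are hypothetical and exist only
for this illustration.

```python
import math
import unicodedata
from collections import defaultdict


def strip_accents(word):
    """Remove Vietnamese diacritics, e.g. 'đẹp' -> 'dep'."""
    decomposed = unicodedata.normalize("NFD", word)
    bare = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # 'đ' has no Unicode decomposition, so it is mapped by hand.
    return bare.replace("đ", "d").replace("Đ", "D")


def train(sentences):
    """Collect bigram counts and the accented candidates of each bare word."""
    bigrams = defaultdict(lambda: defaultdict(int))
    candidates = defaultdict(set)
    for sentence in sentences:
        text = unicodedata.normalize("NFC", sentence.lower())
        words = ["<s>"] + text.split()
        for prev, cur in zip(words, words[1:]):
            bigrams[prev][cur] += 1
            candidates[strip_accents(cur)].add(cur)
    return bigrams, candidates


def log_prob(bigrams, vocab_size, prev, cur):
    """Add-one smoothed bigram log-probability log P(cur | prev)."""
    row = bigrams.get(prev, {})
    return math.log((row.get(cur, 0) + 1) / (sum(row.values()) + vocab_size))


def restore(text, bigrams, candidates):
    """Viterbi search for the most likely accented sequence."""
    vocab_size = len({w for forms in candidates.values() for w in forms})
    # best[word] = (log-score of the best path ending in word, that path)
    best = {"<s>": (0.0, [])}
    for bare in text.lower().split():
        # Words never seen in training are passed through unchanged.
        forms = sorted(candidates.get(bare, {bare}))
        step = {}
        for cur in forms:
            score, path = max(
                ((s + log_prob(bigrams, vocab_size, prev, cur), p)
                 for prev, (s, p) in best.items()),
                key=lambda item: item[0],
            )
            step[cur] = (score, path + [cur])
        best = step
    return " ".join(max(best.values(), key=lambda item: item[0])[1])


# Hypothetical toy corpus; a real system would train on a large accented corpus.
CORPUS = ["màn hình rất đẹp", "món này rất mặn", "pin rất tốt", "giá hơi đắt"]
bigrams, candidates = train(CORPUS)

print(restore("man hinh rat dep", bigrams, candidates))  # màn hình rất đẹp
print(restore("mon nay rat man", bigrams, candidates))   # món này rất mặn
```

In this toy setting the bare word "man" is restored as "màn" at the start of
a sentence and as "mặn" after "rất", because the bigram counts differ in the
two contexts. A fuller model would also need non-trivial emission
probabilities and better smoothing for unseen words.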

Keywords: Computer science; Data mining; Data models
Table of Contents

4.1.2 Experimental Data
4.2 Product Features Extraction Evaluation
4.3 Opinion Words Extraction Evaluation
4.4 The Whole System Evaluation
5 Conclusion

