Persian Sentiment Analysis: Feature Engineering, Datasets, and Challenges

Document Type: Review Article


1 Department of Computer Engineering, Faculty of Electrical and Computer Engineering, Technical and Vocational University, Kashan, Iran

2 Senior Lecturer, School of Continuing and Lifelong Education, National University of Singapore, 119077, Singapore


With the pervasive growth of web-based businesses, sentiment analysis of online reviews has attracted increasing interest among text mining researchers. The problem becomes harder when the reviews are written in Persian, because most existing work focuses on English and leaves other languages to multilingual models with limited resources. To address this gap, we provide insight into the different stages of Persian Sentiment Analysis. This study presents a taxonomy of all Persian Sentiment Analysis works, organized by the most common techniques. Four steps are considered: pre-processing, feature engineering, lexicon generation, and classification. We find that newer works focus on deep learning methods. We also suggest that other methods, such as heuristic and hybrid approaches, are worth applying to improve classification performance in Persian Sentiment Analysis. Finally, we summarize the most important open issues in this domain, including the lack of datasets, lexicons, and tools.