TY  - JOUR
ID  - 104113
TI  - Authorship Clustering using Homogeneous Feature Space and Two-stepped Automatic Fuzzy Cmeans Clustering
JO  - Journal of Applied Intelligent Systems and Information Sciences
JA  - JAISIS
LA  - en
SN  - 2821-1987
AU  - Aminian, Mohammad
AU  - Eskandari, Mahdi
AD  - Computer Engineering Department, Bu Ali Sina University, Hamedan, Iran
Y1  - 2020
PY  - 2020
VL  - 1
IS  - 1
SP  - 54
EP  - 63
KW  - authorship clustering
KW  - homogeneous features
KW  - word Ngram
KW  - part-of-speech
KW  - fuzzy Cmeans
DO  - 10.22034/jaisis.2020.219089.1006
N2  - Identifying the author of an anonymous or disputed document is a cornerstone of automatic forensic applications. It is a challenging task for both humans and computers, given the complex content of documents and the variety of their backgrounds. Because of its nature, it is always treated as an unsupervised task. Clustering documents according to the linguistic style of their authors has received little attention from the research community. To address this problem, the PAN Evaluation Framework was the first effort to promote the development of author clustering. Among the different approaches to the task, this article proposes a method based on a set of homogeneous features and two-stepped automatic fuzzy C-means (FCM) clustering. We use word n-grams, part-of-speech tagging and some other context-free features, then estimate the number of clusters using a document similarity graph (DSG), and finally use FCM to cluster the corpus. The method completes the task in a very short amount of time, and its performance is comparable with that of the leaderboard competitors in the PAN CLEF 2017 challenge.
UR  - https://journal.research.fanap.com/article_104113.html
L1  - https://journal.research.fanap.com/article_104113_be73a35e26715ce487231aadac5f89ba.pdf
ER  - 