Abstract: |
Twitter, a microblogging online social network (OSN), has quickly gained prominence as it provides people
with the opportunity to communicate and share posts and topics. Tremendous value lies in automatically
analysing and reasoning about such data to derive meaningful insights, which carries potential
opportunities for businesses, users, and consumers. However, the sheer volume, noise, and dynamism of
Twitter impose challenges that hinder the discovery of clusters with high intra-cluster similarity (i.e.
minimum variance) and low inter-cluster similarity. This review focuses on research that has used various
clustering algorithms to analyse Twitter data streams and identify hidden patterns in tweets where text is
highly unstructured. This paper performs a comparative analysis of unsupervised learning approaches in
order to determine whether empirical findings support the enhancement of decision support and pattern
recognition applications. A review of the literature identified 13 studies that implemented different clustering
methods. The studies were compared by clustering method, algorithm, number of clusters, dataset size, distance
measure, clustering features, evaluation method, and results. The conclusion reports that the
use of unsupervised learning in mining social media data has several weaknesses. Success criteria and future
directions for research and practice are discussed. |