Unsupervised machine learning techniques such as clustering are increasingly used with the recent growth of social communication platforms like Twitter and Facebook. Clustering makes it possible to find patterns in the unstructured data these platforms produce. We collected tweets matching COVID-19-related hashtags from a Kaggle dataset and compared the performance of nine clustering algorithms on this dataset. We evaluated the generalizability of these algorithms using a supervised learning model. Finally, using a selected unsupervised learning algorithm, we categorized the clusters. The top five categories are Safety, Crime, Products, Countries and Health. This approach can help organizations that work with large amounts of Twitter data quickly identify key points in the data before moving on to further classification.
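
The pipeline summarized above (cluster the tweets, then check with a supervised model whether the clusters generalize) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy tweets, the bag-of-words features, k-means as the clustering algorithm, the fixed initial centroids, and a 1-nearest-neighbour classifier as the supervised model. The abstract does not say which nine algorithms, which features, or which supervised model the study used. The idea shown is only one plausible reading of the generalizability check: train a classifier on the cluster labels and see whether it agrees with the clustering on tweets neither has seen.

```python
import math
from collections import Counter

# Hypothetical toy tweets; the study itself used a Kaggle COVID-19 tweet dataset.
TRAIN = [
    "wear a mask and stay safe",
    "stay home stay safe wear mask",
    "mask keeps you safe outside",
    "hospital staff need health support",
    "health workers at the hospital",
    "support hospital health workers",
]
HELD_OUT = [
    "please wear your mask stay safe",
    "health workers need support",
]


def vectorize(text, vocab):
    """Bag-of-words counts over a fixed vocabulary; unknown words are ignored."""
    counts = Counter(text.split())
    return [counts[word] for word in vocab]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(point, candidates):
    """Index of the candidate vector closest to point (first one wins ties)."""
    return min(range(len(candidates)), key=lambda i: distance(point, candidates[i]))


def kmeans(points, init_centroids, iterations=10):
    """Plain k-means with caller-supplied initial centroids (deterministic)."""
    centroids = [list(c) for c in init_centroids]
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    labels = [nearest(p, centroids) for p in points]
    return labels, centroids


# Step 1: cluster the training tweets (unsupervised).
vocab = sorted({word for tweet in TRAIN for word in tweet.split()})
train_vecs = [vectorize(t, vocab) for t in TRAIN]
labels, centroids = kmeans(train_vecs, [train_vecs[0], train_vecs[3]])

# Step 2: treat the cluster labels as targets for a supervised model
# (here a 1-nearest-neighbour classifier) and compare its predictions with
# the centroid-based cluster assignment on held-out tweets.
held_vecs = [vectorize(t, vocab) for t in HELD_OUT]
cluster_assign = [nearest(v, centroids) for v in held_vecs]
clf_predict = [labels[nearest(v, train_vecs)] for v in held_vecs]

agreement = sum(a == b for a, b in zip(cluster_assign, clf_predict)) / len(HELD_OUT)
print("cluster labels:", labels)
print("held-out agreement:", agreement)
```

On real data the same comparison would be repeated for each candidate clustering algorithm, and a high held-out agreement would suggest that the cluster structure is stable enough to be learned and reused on new tweets.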

