Social media platforms now have millions of active users, and attackers exploit those users to spread malicious content, so detecting and curbing this activity has become a pressing need. Twitter is particularly vulnerable because of its character limit: URLs in tweets are routinely shortened, which makes it easy for attackers to substitute malicious URLs for legitimate ones and difficult to validate a link without opening it. This project builds a supervised machine learning classification model that detects malicious URLs in the Twitter stream. In the data collection stage, tweets containing URLs are gathered from the Twitter Streaming API using specified keywords, and each URL is labelled as malicious or benign with the help of the VirusTotal database. Next, feature selection identifies the features used to build the machine learning models. Model selection is then carried out to find the best-performing classifier; the models used are Random Forest, Logistic Regression, Decision Tree and XGBoost. After the preliminary results are generated, hyperparameter tuning is applied to improve the classifier's performance.
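The project description does not list the features it uses, so the following is only a minimal sketch of what a feature extraction step for URLs might look like. The function name `url_features` and every feature in it are illustrative assumptions, not the project's actual feature set. Because tweets usually carry shortened links, features like these would likely be more informative when computed on the expanded URL.

```python
import re
from urllib.parse import urlparse


def url_features(url: str) -> dict:
    """Compute a few simple lexical features for one URL.

    Hypothetical feature set for illustration only; the project's
    real features are not specified in the description.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        # Overall length of the URL string
        "url_length": len(url),
        # Number of dots in the host name (rough subdomain count)
        "num_dots": host.count("."),
        # Count of digit characters anywhere in the URL
        "num_digits": sum(ch.isdigit() for ch in url),
        # True when the host is a raw IPv4 address instead of a domain name
        "has_ip_host": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        # True when the scheme is HTTPS
        "uses_https": parsed.scheme == "https",
        # Number of non-empty path segments
        "path_depth": len([part for part in parsed.path.split("/") if part]),
    }


example = url_features("http://192.168.0.1/a/b")
```

Each URL becomes one row of numeric and boolean values, which can then be paired with its malicious or benign label to form the training set.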
1. Random Forest
2. Logistic Regression
3. Decision Tree
4. XGBoost
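As a rough illustration of how the models listed above could be compared and then tuned, here is a sketch using scikit-learn. It makes several assumptions: the data is synthetic (generated with `make_classification`) rather than real labelled tweets, scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so that the example needs only one library, and the F1 metric and the parameter grid are arbitrary choices rather than the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the labelled URL feature matrix (assumption).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Candidate models; gradient boosting is a stand-in for XGBoost here.
models = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

# Model selection: mean 5-fold cross-validated F1 score on the training split.
cv_scores = {
    name: cross_val_score(model, X_train, y_train, cv=5, scoring="f1").mean()
    for name, model in models.items()
}
best_name = max(cv_scores, key=cv_scores.get)

# Hyperparameter tuning: small grid search, shown for Random Forest only.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=3, scoring="f1"
)
search.fit(X_train, y_train)

# F1 score of the tuned model on the held-out test split.
test_score = search.score(X_test, y_test)
```

The test split is held out until the end so that the final score is not influenced by either model selection or tuning. On real data, `cv_scores` and `search.best_params_` would indicate which model and settings to carry forward.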
- Spam Filtering for Twitter Reference Paper
- Spam Filtering for Twitter Synopsis