Zhyvr

Sentiment Analysis

22/06/2023


Imagine being able to instantly understand the mood behind thousands of Kurdish tweets, reviews, or news articles. That’s what this project is all about.

During HackaSuly, a regional hackathon, I built a sentiment analysis model designed specifically for the Kurdish language. Sentiment analysis tools are widely available for languages like English, but little work has been done for Kurdish despite its millions of speakers.

This project aims to fill that gap, making it possible to track opinions, emotions, and trends in Kurdish text in a way that’s accurate and culturally relevant.

Purpose

The primary goal of this project is to create a robust model that accurately classifies the sentiment of Kurdish text. Potential applications range from social media monitoring and customer feedback analysis to political insights and market research.

By analyzing the sentiment of Kurdish text, individuals, organizations, and companies can gain insight into public opinion and preferences, enabling better-informed decisions and targeted actions.

Methodology

  1. Data Collection: The project began by collecting a diverse dataset of Kurdish text from sources such as social media, forums, news articles, and customer feedback platforms.
  2. Preprocessing: The dataset was cleaned by removing irrelevant text, handling linguistic variation (such as differences between dialects), and standardizing the Kurdish script.
  3. Model Training: A machine learning model using natural language processing (NLP) techniques was trained on the preprocessed dataset, combining word embeddings (such as FastText) with supervised learning to classify sentiment.
  4. Evaluation: The model was evaluated using accuracy, precision, recall, and F1-score. Cross-validation was used to check that performance holds across different splits of the data and to detect overfitting.
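The four steps above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not the project's actual code: the character map is a hypothetical, partial normalization table for Arabic-script (Sorani) Kurdish, and a multinomial Naive Bayes classifier over word counts stands in for the FastText-based model so the example runs with the standard library alone. The names `normalize`, `tokenize`, `train_nb`, `predict_nb`, `evaluate` and `cross_validate` are illustrative, and the toy phrases are not from the project's dataset.

```python
import math
import random
import re
import unicodedata
from collections import Counter, defaultdict

# Hypothetical normalization table for Arabic-script (Sorani) Kurdish: map
# Arabic code points to the Kurdish/Persian forms and Eastern Arabic digits
# to ASCII. A real project would need a fuller, dialect-aware table.
CHAR_MAP = str.maketrans({
    "\u064A": "\u06CC",  # ARABIC YEH -> FARSI YEH
    "\u0643": "\u06A9",  # ARABIC KAF -> KEHEH
    **{chr(0x0660 + d): str(d) for d in range(10)},  # Eastern Arabic digits -> 0..9
})
STRIP = re.compile("[\u064B-\u0652\u0640]")  # harakat (diacritics) and tatweel
NOISE = re.compile(r"https?://\S+|@\w+|#")  # URLs, @mentions, hashtag signs


def normalize(text: str) -> str:
    # NFKC folds Arabic presentation forms; lower() matters for Latin-script Kurdish
    text = unicodedata.normalize("NFKC", text).lower()
    text = STRIP.sub("", text.translate(CHAR_MAP))
    text = NOISE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", normalize(text))


def train_nb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes with add-alpha smoothing (stand-in for FastText)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(tokenize(doc))
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab, alpha


def predict_nb(model, doc):
    class_counts, word_counts, vocab, alpha = model
    tokens = [t for t in tokenize(doc) if t in vocab]  # drop words unseen in training
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        scores[label] = math.log(n / total) + sum(
            math.log((word_counts[label][t] + alpha) / denom) for t in tokens)
    return max(scores, key=scores.get)  # label with the highest log-probability


def evaluate(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1-score."""
    pairs = list(zip(y_true, y_pred))
    per_class = []
    for lab in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == lab and p == lab for t, p in pairs)
        fp = sum(t != lab and p == lab for t, p in pairs)
        fn = sum(t == lab and p != lab for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        per_class.append((prec, rec, f))
    precision, recall, f1 = (sum(col) / len(per_class) for col in zip(*per_class))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def cross_validate(docs, labels, k=5, seed=0):
    """k-fold CV (k >= 2): each document is predicted by a model that never saw it."""
    idx = list(range(len(docs)))
    random.Random(seed).shuffle(idx)
    y_true, y_pred = [], []
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = train_nb([docs[i] for i in train], [labels[i] for i in train])
        y_true += [labels[i] for i in fold]
        y_pred += [predict_nb(model, docs[i]) for i in fold]
    return evaluate(y_true, y_pred)  # metrics over pooled out-of-fold predictions


if __name__ == "__main__":
    # Toy phrases for illustration only: "good", "very good", "was good", ...
    docs = ["باش", "زۆر باش", "باش بوو", "خراپ", "زۆر خراپ", "خراپ بوو"]
    labels = ["positive"] * 3 + ["negative"] * 3
    model = train_nb(docs, labels)
    print(predict_nb(model, "زۆر باش"))  # → positive
    print(cross_validate(docs, labels, k=3))
```

Pooling the out-of-fold predictions before computing the metrics is one common way to report k-fold results; averaging the per-fold scores is another. Swapping `train_nb`/`predict_nb` for an embedding-based classifier would leave the normalization and evaluation code unchanged.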

Dependencies

To run this project, you’ll need a few essential Python libraries:

Conclusion

The Kurdish Sentiment Analysis project opens the door to a wide range of applications, from social media monitoring and customer feedback analysis to political insights and market research.

By focusing on Kurdish, this project helps close the gap in NLP tools for underrepresented languages, ensuring that millions of Kurdish speakers are not left out of the AI revolution.

Although it was built with Kurdish in mind, the methods and techniques developed here can be adapted to other languages, making this project a useful foundation for future sentiment analysis work.

The source code is available on my GitHub.