Text Classification with the IndoLU Dataset Using TF-IDF and Machine Learning

Introduction

Text classification is a fundamental task in natural language processing (NLP): assigning a piece of text to one of a set of predefined categories. In this project, I use the IndoLU dataset, TF-IDF vectorization, and several machine learning algorithms to classify Indonesian text.

Project Description

The aim of this project is to develop a model that accurately classifies Indonesian text. The IndoLU (Indonesian Language Understanding) dataset supplies the labeled data, and TF-IDF (Term Frequency-Inverse Document Frequency) converts each text into a numeric vector in which terms that are frequent in a document but rare across the collection receive higher weights. Several machine learning models were then trained and evaluated on these vectors to identify the best performer.
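
To make the vectorization step concrete, here is a minimal sketch of TF-IDF in plain Python. It is not the code used in this project. It assumes whitespace tokenization and one common smoothed IDF variant, ln((1 + N) / (1 + df)) + 1, and the function name tfidf_vectors and the three sentences are invented for illustration (they are not taken from IndoLU).

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) for a list of raw text documents.

    Hypothetical helper for illustration: lowercases and splits on whitespace.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)

    # Document frequency: how many documents contain each term at least once.
    df = Counter(tok for doc in tokenized for tok in set(doc))

    # Smoothed IDF (an assumed variant): ln((1 + N) / (1 + df)) + 1
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}

    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc)
        # Term frequency is the term count divided by the document length.
        vectors.append(
            [(counts[tok] / total) * idf[tok] if total else 0.0 for tok in vocab]
        )
    return vocab, vectors


# Toy Indonesian sentences, invented for this example (not from IndoLU).
docs = [
    "saya suka film ini",
    "saya tidak suka film ini",
    "harga saham naik hari ini",
]
vocab, vectors = tfidf_vectors(docs)
```

In the second sentence, "tidak" appears in only one document while "ini" appears in all three, so "tidak" receives the larger weight. Vectors of this kind are what the classifiers take as input.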

Dataset

The IndoLU dataset is a collection of Indonesian texts, each labeled with a category. These labels make it suitable for training and evaluating supervised text classification models.