Classification

Creating a Classification Model

Sample Classification Model

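scikit-learn is one option for creating a classification model; there are many others, such as LightGBM (LGBM). The example below trains a logistic regression classifier on TF-IDF text features with scikit-learn.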

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load data into a pandas DataFrame
data = pd.read_csv('path/to/data.csv')

# Split the data into training and testing sets
train_data, test_data, train_labels, test_labels = train_test_split(
    data['text'], data['label'], test_size=0.2, random_state=42)

# Create the TF-IDF vectorizer
tfidf = TfidfVectorizer()

# Fit the vectorizer to the training data and transform the training data
train_tfidf = tfidf.fit_transform(train_data)

# Transform the testing data using the fitted vectorizer
test_tfidf = tfidf.transform(test_data)

# Create the Logistic Regression model
clf = LogisticRegression(random_state=42)

# Train the model on the training data
clf.fit(train_tfidf, train_labels)

# Predict labels for the testing data
predictions = clf.predict(test_tfidf)

# Evaluate the accuracy of the model
accuracy = accuracy_score(test_labels, predictions)
print(f'Accuracy: {accuracy:.4f}')

For NLP tasks, you can use the Hugging Face Transformers library. It offers many pre-trained models that you can use to generate initial training labels for your data.
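
One way to get initial labels from a pre-trained model is zero-shot classification. The sketch below is a minimal illustration, not a recommended setup: the helper names `pseudo_label` and `load_zero_shot_classifier`, the `min_score` threshold, and the choice of the `facebook/bart-large-mnli` checkpoint are all assumptions made for this example. It relies on the Transformers `zero-shot-classification` pipeline returning a dict with `labels` and `scores` sorted from most to least likely.

```python
def pseudo_label(texts, candidate_labels, classifier, min_score=0.5):
    """Return (text, label) pairs whose top score is at least min_score.

    `classifier` is any callable that mimics the zero-shot pipeline:
    it takes a text plus `candidate_labels` and returns a dict with
    "labels" and "scores" sorted from most to least likely.
    """
    labeled = []
    for text in texts:
        result = classifier(text, candidate_labels=candidate_labels)
        top_label = result["labels"][0]
        top_score = result["scores"][0]
        # Keep only confident predictions; the threshold is an assumption.
        if top_score >= min_score:
            labeled.append((text, top_label))
    return labeled


def load_zero_shot_classifier(model_name="facebook/bart-large-mnli"):
    """Build a zero-shot pipeline (hypothetical helper; downloads the model)."""
    # Imported lazily so pseudo_label can be used without transformers installed.
    from transformers import pipeline

    return pipeline("zero-shot-classification", model=model_name)


# Usage (downloads the checkpoint on first run):
# classifier = load_zero_shot_classifier()
# pairs = pseudo_label(
#     ["the battery died within a week"],
#     ["positive", "negative"],
#     classifier,
# )
```

Labels produced this way are noisy, so it is worth spot-checking a sample by hand before training on them.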

Starting from a large pre-trained model, you can aim for higher accuracy either by fine-tuning it on your specific use case or by training a smaller classification model.
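
The smaller-model route can be sketched as follows: take texts that already carry labels (for instance, labels produced by a larger model) and fit a lightweight classifier on them. This is only an illustration under assumptions: `train_small_model` is a hypothetical helper, the TF-IDF plus logistic regression combination is one choice among many, and the example pairs are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def train_small_model(labeled_pairs):
    """Fit a TF-IDF + logistic regression model on (text, label) pairs."""
    texts = [text for text, _ in labeled_pairs]
    labels = [label for _, label in labeled_pairs]
    # Bundling both steps means raw strings can be passed at predict time.
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(random_state=42))
    model.fit(texts, labels)
    return model


# Made-up pairs standing in for labels obtained from a larger model.
pseudo_labeled = [
    ("great product, works well", "positive"),
    ("really love this, excellent quality", "positive"),
    ("fantastic value and great service", "positive"),
    ("terrible, broke after one day", "negative"),
    ("awful quality, do not buy", "negative"),
    ("worst purchase, very disappointed", "negative"),
]

small_model = train_small_model(pseudo_labeled)
print(small_model.predict(["excellent quality, great service"]))
```

A model like this is cheap to train and serve, but it can only be as good as the labels it learns from, so evaluate it on a hand-labeled test set where possible.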

Reference Material

  • https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-machine-learning-tips-and-tricks