Sentiment Analysis using OpenAI GPT-3 API
In this article, we will cover how to perform sentiment analysis using the OpenAI GPT-3 API and Google Colab.
Introduction
Sentiment analysis identifies whether a piece of text is positive, negative or neutral. Common use cases include:
- Monitoring social media to track how people feel about a certain brand or topic.
- Quickly spotting negative customer support tickets so they can be addressed promptly.
In this tutorial, we will use the OpenAI GPT-3 API to perform sentiment analysis with a minimal amount of code and setup.
Topics to be covered
In this tutorial, we will use the Sentiment140 dataset from Kaggle as a set of labeled examples for the OpenAI GPT-3 API. We will work in a Jupyter notebook on Google Colab and then classify a new piece of text.
Download Required Dependencies
!pip install kaggle pandas jsonlines openai
Download Kaggle Dataset
To download the Kaggle dataset directly from the notebook, we can use the kaggle command line tool. Before doing so, we need a Kaggle API key.
- On Kaggle, click your profile picture and select "Your Profile".
- Open the "Account" tab and click "Create New API Token". Your browser will automatically download a kaggle.json file containing your Kaggle credentials.
In the Google Colab notebook, upload kaggle.json by right-clicking in the "Files" tab and selecting "Upload".
Add the following code to the notebook. It creates the .kaggle folder in your home directory (~/.kaggle), which is where the Kaggle CLI looks for credentials. It then copies the uploaded kaggle.json into that folder and restricts the file's permissions so that only your user can read it.
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
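If you prefer plain Python over shell magics, the three commands above can be sketched with the standard library. The function name `install_kaggle_credentials` and its `dest_dir` parameter are hypothetical and exist only for this illustration. By default it targets ~/.kaggle, and unlike a bare `mkdir` it does not complain when the folder already exists.

```python
import os
import shutil
from pathlib import Path


def install_kaggle_credentials(src="kaggle.json", dest_dir=None):
    """Hypothetical helper: copy kaggle.json into the Kaggle config folder.

    dest_dir defaults to ~/.kaggle, the folder the Kaggle CLI reads.
    Returns the path of the installed credentials file.
    """
    dest_dir = Path(dest_dir) if dest_dir is not None else Path.home() / ".kaggle"
    # Equivalent of `mkdir -p ~/.kaggle`: create the folder, tolerate it existing
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / "kaggle.json"
    # Equivalent of `cp kaggle.json ~/.kaggle/`
    shutil.copyfile(src, dest)
    # Equivalent of `chmod 600`: read/write for the owner only
    os.chmod(dest, 0o600)
    return dest
```

In Colab, calling `install_kaggle_credentials()` after uploading kaggle.json should have the same effect as the shell cell.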
With the credentials in place, run the following commands to download and extract the dataset from within Colab.
!kaggle datasets download kazanova/sentiment140
!unzip sentiment140.zip
Transforming the dataset
Before uploading the sentiment dataset to OpenAI, we need to transform it into a JSON Lines (.jsonl) file, with one JSON object per line in the format:
{"text": "Sentiment Text", "label": "Positive|Negative|Neutral"}
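To make the JSON Lines format concrete, here is a small sketch using only the standard `json` module. The two example records are made up for illustration. Each record is serialized as one complete JSON object on its own line, with no enclosing list and no commas between lines.

```python
import json

# Two hypothetical labeled examples in the shape described above
examples = [
    {"text": "I love this phone", "label": "Positive"},
    {"text": "My flight got cancelled again", "label": "Negative"},
]

# JSON Lines: one JSON object per line, joined with newlines
jsonl_text = "\n".join(json.dumps(example) for example in examples)
print(jsonl_text)
# {"text": "I love this phone", "label": "Positive"}
# {"text": "My flight got cancelled again", "label": "Negative"}

# Each line can be parsed on its own
parsed = [json.loads(line) for line in jsonl_text.splitlines()]
print(parsed == examples)  # True
```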
Read the file using pandas' read_csv. The CSV has no header row, so the column names are passed explicitly:
import pandas as pd
emotion_df = pd.read_csv('training.1600000.processed.noemoticon.csv', names=['label', 'id', 'date', 'flag', 'user', 'text'], encoding='ISO-8859-1')
# Visualize first 5 rows
emotion_df.head()
The dataset documents the sentiment label as 0, 2 or 4, where 0 is Negative, 2 is Neutral and 4 is Positive. Note that the 1.6-million-row training file used here contains only labels 0 and 4.
Run this code to convert the numeric label to the corresponding text label (Negative, Neutral or Positive):
def convert_labels(label):
    # Map Sentiment140's numeric polarity codes to text labels
    if label == 0:
        return 'Negative'
    if label == 2:
        return 'Neutral'
    if label == 4:
        return 'Positive'

emotion_df['label'] = emotion_df['label'].apply(convert_labels)
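The `if`/`return` chain above silently returns `None` for any code it does not recognize, including labels that are already text if the cell is run twice. One alternative is a dictionary combined with pandas' `Series.map`, which turns unknown codes into NaN so that they can be detected. The following is a minimal sketch. The helper name `convert_label_column` and the three-row sample frame are hypothetical stand-ins for `emotion_df`.

```python
import pandas as pd

# Sentiment140 polarity codes mapped to the text labels used in this tutorial
LABEL_NAMES = {0: "Negative", 2: "Neutral", 4: "Positive"}


def convert_label_column(df, column="label"):
    """Hypothetical helper: return a copy of df with text labels.

    Raises ValueError for any code missing from LABEL_NAMES instead of
    silently producing None.
    """
    converted = df[column].map(LABEL_NAMES)  # unknown codes become NaN
    unknown = df.loc[converted.isna(), column].unique()
    if len(unknown) > 0:
        raise ValueError(f"Unexpected label codes: {list(unknown)}")
    out = df.copy()
    out[column] = converted
    return out


# Hypothetical three-row frame standing in for emotion_df
sample = pd.DataFrame({"label": [0, 4, 4], "text": ["bad day", "great day", "nice"]})
converted = convert_label_column(sample)
print(converted["label"].value_counts())
```

Because the function returns a copy, the original frame keeps its numeric codes, and re-running the cell fails loudly rather than overwriting good labels with `None`.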
Next, write each text/label pair to train.jsonl using the jsonlines library:
import jsonlines

# Write one {"text": ..., "label": ...} object per line
with jsonlines.open('train.jsonl', mode='w') as writer:
    for row in emotion_df.itertuples():
        writer.write({
            'text': row.text,
            'label': row.label
        })
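Before uploading, it can help to check that every line of the file matches the expected {"text": ..., "label": ...} shape. The following is a minimal sketch using only the standard library. The function name `validate_jsonl` is hypothetical. The allowed labels are assumed to be the same three that are passed to the classification call later in this tutorial.

```python
import json

# Assumed to match the labels passed to openai.Classification.create later on
ALLOWED_LABELS = ("Positive", "Negative", "Neutral")


def validate_jsonl(path, allowed_labels=ALLOWED_LABELS):
    """Hypothetical helper: check each line is {"text": <non-empty str>, "label": <allowed>}.

    Returns the number of lines checked; raises ValueError at the first bad line.
    """
    count = 0
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"line {line_no}: invalid JSON ({exc})") from exc
            if not isinstance(record, dict):
                raise ValueError(f"line {line_no}: expected a JSON object")
            text = record.get("text")
            if not isinstance(text, str) or not text.strip():
                raise ValueError(f"line {line_no}: missing or empty 'text'")
            label = record.get("label")
            if label not in allowed_labels:
                raise ValueError(f"line {line_no}: unexpected label {label!r}")
            count += 1
    return count
```

In the notebook, `validate_jsonl('train.jsonl')` returns the number of lines it checked, which can be compared with `len(emotion_df)`. A `None` label left behind by `convert_labels` would be reported as an unexpected label.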
Get the OpenAI API Key
With the JSONL file of pre-labeled examples ready, the remaining steps are to upload it through the OpenAI API and then make a second call that classifies a new piece of text.
Register for OpenAI account
Before using the OpenAI GPT-3 API, we need an API key. Register for an OpenAI account at https://beta.openai.com/signup.
Get the API Key
- Go to [https://beta.openai.com/account/api-keys](https://beta.openai.com/account/api-keys).
- Copy the "Secret Key" by clicking the "Copy" button.
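Pasting the secret key directly into a cell (as in `openai.api_key='<API_KEY>'`) risks leaking it when the notebook is shared. A minimal alternative is to read the key from the `OPENAI_API_KEY` environment variable, which is the variable the openai Python library looks for, and to fall back to a hidden prompt otherwise. The helper name `load_openai_key` is hypothetical.

```python
import os
from getpass import getpass


def load_openai_key(env_var="OPENAI_API_KEY"):
    """Hypothetical helper: return the API key from an environment variable,
    or prompt for it without echoing it into the notebook output."""
    key = os.environ.get(env_var)
    if not key:
        key = getpass(f"Enter {env_var}: ")
    return key
```

In the upload step, `openai.api_key = load_openai_key()` can then replace the hardcoded string.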
Upload file to OpenAI API
In the code below, we set the OpenAI API key and upload the file with the openai.File.create() method, using purpose="classifications". Documentation for the Files API is available in the OpenAI API reference.
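import openai

openai.api_key = '<API_KEY>'  # replace with your secret key

# Upload the labeled examples for use by the Classifications endpoint
result = openai.File.create(file=open("train.jsonl", "rb"), purpose="classifications")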
Performing classification
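To perform the sentiment analysis, call the openai.Classification.create method, passing the ID of the uploaded file, the text to classify (query) and the candidate labels. The search_model selects the labeled examples in the file that are most relevant to the query, and the model then produces the final label.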
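file_id = result['id']  # ID of the uploaded train.jsonl

prediction = openai.Classification.create(
    search_model="ada",  # model used to pick the most relevant labeled examples
    model="curie",       # model that produces the final label
    file=file_id,
    query="Crypto is crashing hard",
    labels=["Positive", "Negative", "Neutral"],
)

predicted_label = prediction['label']
print('Predicted label is: {}'.format(predicted_label))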
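OpenAI described the Classifications endpoint as a two-step process: first search the labeled examples for those most relevant to the query, then combine them with the query into a prompt that the completion model finishes with a label. OpenAI has since deprecated this endpoint. The following local sketch illustrates only the prompt-building idea and makes no API call. It replaces the endpoint's semantic search with naive word overlap, and the prompt template is a hypothetical one, not the template OpenAI used.

```python
import re


def tokenize(text):
    """Lowercase word set; a crude stand-in for the endpoint's semantic search."""
    return set(re.findall(r"[a-z']+", text.lower()))


def select_examples(examples, query, k=3):
    """Pick the k labeled examples sharing the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        examples,
        key=lambda ex: len(tokenize(ex["text"]) & query_words),
        reverse=True,  # sort is stable, so ties keep their original order
    )
    return ranked[:k]


def build_prompt(examples, query, labels, k=3):
    """Assemble a hypothetical few-shot prompt that ends where a label should follow."""
    lines = [f"Classify the sentiment as one of: {', '.join(labels)}.", ""]
    for ex in select_examples(examples, query, k):
        lines.append(f"Text: {ex['text']}")
        lines.append(f"Sentiment: {ex['label']}")
        lines.append("")
    lines.append(f"Text: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


# Hypothetical labeled examples standing in for train.jsonl
examples = [
    {"text": "Bitcoin is crashing again", "label": "Negative"},
    {"text": "Loving the new update", "label": "Positive"},
    {"text": "The meeting is at 3pm", "label": "Neutral"},
]
print(build_prompt(examples, "Crypto is crashing hard", ["Positive", "Negative", "Neutral"], k=2))
```

Sending such a prompt to a completion model and reading the word it writes after "Sentiment:" is the final step the endpoint performed for you.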
Conclusion
The OpenAI GPT-3 API lets developers quickly build sentiment analysis tools that automatically label the sentiment of a given text. Beyond classification, the GPT-3 API can also perform other tasks such as question answering and search.
The full code is available on Google Colab: https://colab.research.google.com/drive/1svU2R5aIqjKf_7tQZrAi2Vs7vso-yqvS?usp=sharing