Breaking the Token Limit: How to Work with Large Amounts of Text in ChatGPT

Do The Thing
3 min read · Jan 27, 2023


Have you ever wanted to use ChatGPT to help you write/review/proofread a large body of text, but were limited by the maximum number of tokens allowed? In this article, I’m going to show you how I used the OpenAI API and Python to overcome the token limit.

The inspiration for this solution came when I wanted to scan through the transcript of a YouTube video for a project I was working on, but I quickly found out that ChatGPT couldn’t handle the word count, which was over 50,000 words. ChatGPT’s limit is around 4,000 tokens, which works out to roughly 3,000 words on average. However, I found a way to work around this limitation.
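If you want to check whether your text fits under the limit before sending anything to the API, you can count tokens locally with OpenAI’s tiktoken library. Here’s a minimal sketch, assuming the p50k_base encoding used by text-davinci-003 and an example file name (transcript.txt):

import tiktoken

# Load the encoding used by text-davinci-003 (p50k_base)
encoding = tiktoken.get_encoding("p50k_base")

# Example file name — replace with your own transcript
with open("transcript.txt") as f:
    script = f.read()

num_tokens = len(encoding.encode(script))
print(f"{num_tokens} tokens, ~{len(script.split())} words")
# Anything well past ~4,000 tokens won't fit in a single request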

To overcome this limitation, I used a technique called “batch processing.” I broke down the script into smaller chunks of text, and then used the OpenAI API to process each batch separately. I set the batch size to 250 words, while also giving the AI 500 words of context (250 before and 250 after). I also set max_tokens to 1000 so that GPT-3 wouldn’t randomly cut off sentences, which was an issue I ran into during early testing.

Here’s an example of the code I used:

import openai

openai.api_key = "your api key here"

# Your large text body here
script = "paste your text here"

# Set the batch size (in words)
batch_size = 250

# Split the script into words (a rough stand-in for tokens)
script_tokens = script.split(" ")

for i in range(0, len(script_tokens), batch_size):
    # Up to 250 words of context before the current batch
    if i < batch_size:
        before_context = ""
    else:
        before_context = " ".join(script_tokens[i-batch_size:i])

    # The batch of text we actually want GPT-3 to edit
    text_to_edit = " ".join(script_tokens[i:i+batch_size])

    # Up to 250 words of context after the current batch
    if i + batch_size*2 >= len(script_tokens):
        after_context = ""
    else:
        after_context = " ".join(script_tokens[i+batch_size:i+batch_size*2])

    prompt = (
        f"Please proofread, rewrite, and improve the following text inside the brackets "
        f"(in the context that it is a YouTube script for a narrated video), "
        f"considering the context given before and after it: "
        f"before:\"{before_context}\" text to edit:[{text_to_edit}] after:\"{after_context}\""
    )

    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        temperature=0.9,
        max_tokens=1000,
        top_p=1,
        frequency_penalty=0.25,
        presence_penalty=0
    )

    # Print the response from the GPT-3 API
    print(response["choices"][0]["text"])
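Printing each batch is fine for a quick check, but if you want the full edited script back in one piece, you could collect the responses and write them to a file at the end. A small variation on the loop above might look like this (the output file name is just an example):

edited_chunks = []  # collect each batch's output instead of printing it

# inside the for loop, replace the print() call with:
#     edited_chunks.append(response["choices"][0]["text"].strip())

# after the loop finishes, stitch the edited batches back together
with open("edited_transcript.txt", "w") as f:
    f.write("\n\n".join(edited_chunks))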

It’s important to note that this method has its limitations: GPT-3 never sees the context of the entire story, only the small window of text we feed it before and after the target batch.

Also, I want to mention that this script cost around 9 dollars to run when I used it on a 50,000-word video transcript. I did use a paid account to accomplish this, but OpenAI gives you $18 of free credit when you sign up.
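If you want a rough idea of what a run will cost before kicking it off, you can estimate it from the word count. The numbers below are assumptions, not exact figures: text-davinci-003 was priced around $0.02 per 1,000 tokens at the time, and a word is roughly 1.3 tokens; actual cost also depends on how long the rewritten output ends up being.

words = 50_000
tokens_per_word = 1.3          # rough rule of thumb, not exact
price_per_1k_tokens = 0.02     # assumed text-davinci-003 pricing at the time

# Each batch sends ~750 words (250 to edit plus 500 of context), so the
# script is billed roughly three times over as prompt tokens, plus the
# completions that come back for each batch.
prompt_tokens = words * 3 * tokens_per_word
completion_tokens = words * tokens_per_word
estimated_cost = (prompt_tokens + completion_tokens) / 1000 * price_per_1k_tokens
print(f"~${estimated_cost:.2f}")  # a ballpark figure only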
