RAG – A Quick Example | Jesse Liberty

In the previous blog post, we imported a few Python modules and configured our AI key, using Colab.

In this blog post we’ll use Retrieval-Augmented Generation (RAG) to extend an LLM that we’ll get from OpenAI. I’ll use a number of features from the libraries we imported with only a cursory explanation and will come back to them in upcoming blog posts to examine them in more depth. But I want to get to RAG right away because it is rapidly becoming central to AI and because it is cool.

LLMs are incredibly expensive to create and train, and it isn’t feasible to train them on everything. Besides that, much data is proprietary. It may be that you want an LLM that handles (to use the canonical case) your HR policies. Clearly no commercial LLM knows about those policies, nor should they. And equally clearly, you’re not going to train an LLM from scratch. What you want to do is to combine your own corpus of data (HR policy papers, etc.) with an existing LLM, and that is exactly what RAG is for.

In this simple example, we’re going to take a scene or two from Romeo and Juliet and feed it to gpt-4o-mini, one of many LLMs available for use at minimal cost (we’ll get into how cost is computed in an upcoming post).

The first thing we’ll do after configuring the OPENAI_API_KEY is use a TextLoader to import the text file with the scenes from Romeo and Juliet.


To do that, we’ll use the TextLoader from langchain_community.document_loaders (again, we’ll examine this and the other referenced modules in upcoming blog posts). We do this in three steps:

Import TextLoader
Point the TextLoader to our file
Load the file

from langchain_community.document_loaders import TextLoader
loader = TextLoader("RomeoAndJuliet.txt", encoding="utf-8")
docs = loader.load()
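If you’re curious what docs holds: TextLoader reads the whole file into a single LangChain Document, whose page_content is the text and whose metadata records the source path. The sketch below imitates that shape in plain Python so it runs without LangChain. The SimpleDocument class and load_text function are hypothetical stand-ins, not LangChain’s API.

```python
import os
import tempfile
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text(path, encoding="utf-8"):
    """Read a whole text file into a one-element list of documents,
    mirroring the shape of what TextLoader.load() returns."""
    with open(path, encoding=encoding) as f:
        text = f.read()
    return [SimpleDocument(page_content=text, metadata={"source": path})]


# Write a tiny sample file so the example is self-contained
sample = "JULIET.\nO swear not by the moon, th'inconstant moon,\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as f:
    f.write(sample)
    path = f.name

docs = load_text(path)
print(len(docs))                           # 1
print(docs[0].metadata["source"] == path)  # True
os.remove(path)
```

So after loading, the whole play excerpt is still one big document; splitting it up is the next step.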

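Next, we need to divide the text into chunks small enough to embed and retrieve individually. We do that with a RecursiveCharacterTextSplitter from LangChain. We’ll measure length with the cl100k_base encoding (the tiktoken tokenizer used by OpenAI models such as gpt-4 and text-embedding-ada-002), and we’ll set chunk_size to 1000, that is, 1,000 of those mysterious tokens (word pieces) that text is divided into. We also set chunk_overlap to 200, so that up to 200 tokens at the end of one chunk are repeated at the start of the next; that way a sentence or idea that straddles a chunk boundary isn’t cut off from its context.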

from langchain.text_splitter import RecursiveCharacterTextSplitter 
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name='cl100k_base',
    chunk_size=1000,
    chunk_overlap=200
)
chunks = text_splitter.split_documents(docs)

In this particular example, we get six chunks:

len(chunks)
6
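The roles of chunk_size and chunk_overlap are easiest to see with a simplified splitter. This sketch is not how RecursiveCharacterTextSplitter works internally (that class first tries to split on paragraph, line, and word boundaries); it is a plain fixed-size sliding window, and it uses made-up word tokens as a stand-in for cl100k_base tokens. Each new chunk starts chunk_size minus chunk_overlap tokens after the previous one, so the tail of one chunk reappears at the head of the next.

```python
def sliding_window_chunks(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of chunk_size tokens, where
    consecutive windows share chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once this window reaches the end of the text
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Made-up "tokens"; a real tokenizer would split the text differently
tokens = [f"w{i}" for i in range(2600)]
chunks = sliding_window_chunks(tokens, chunk_size=1000, chunk_overlap=200)
print(len(chunks))                          # 3
print(chunks[0][-200:] == chunks[1][:200])  # True
```

With a 1,000-token window and a 200-token overlap, each additional 800 or so tokens of text adds roughly one more chunk.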

By now you may be getting annoyed that so much is going by without explanation. As promised, however, all will be clarified in the next blog post. In fact, we’ll go back through this line by line and explain what each step is doing. But for now, let’s continue…

We need an embedding model, which we’ll use to create our vector store (the place where we hold on to our chunks). As an aside, the vector store also holds metadata about each chunk, along with vectors, which are numerical embeddings of the chunks. The vectors are the key part of this: each one is a long list of numbers that represents the semantic meaning of a chunk, and the vectors can be used (and will be used below) in a similarity search.

from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
embedding_model = OpenAIEmbeddings(model="text-embedding-ada-002")
vectorstore = Chroma.from_documents(
    chunks,
    embedding_model,
    collection_name="RomeoAndJuliet"
)
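To make “similarity search” a little more concrete: the query is embedded the same way as the chunks, and chunks whose vectors point in nearly the same direction as the query vector are treated as most relevant. Cosine similarity is one common way to measure that (Chroma lets you configure which distance function it uses). The three-dimensional vectors below are invented for illustration; real text-embedding-ada-002 vectors have 1,536 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up embeddings; real ones come from the embedding model
query = [0.9, 0.1, 0.0]
balcony_scene = [0.8, 0.2, 0.1]
street_brawl = [0.0, 0.3, 0.9]

print(round(cosine_similarity(query, balcony_scene), 3))
print(round(cosine_similarity(query, street_brawl), 3))
```

The balcony-scene vector scores far higher, so that chunk would be retrieved first.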

Now that we have the vector store, we need a way to conduct the search, for which we need a retriever. When we instantiate it, we’ll tell the retriever to use a similarity search.

retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 10}
)

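search_kwargs holds additional keyword arguments for the search. In this case, k tells the retriever to return the 10 most similar chunks. Because our text produced only six chunks, the retriever will simply return all six, so every chunk ends up in the context.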
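Conceptually, a similarity retriever scores every stored chunk against the query and keeps the k best. The toy store below illustrates that idea; it is not Chroma’s implementation (Chroma uses an approximate nearest-neighbor index), and the top_k function, the chunk texts, and the vectors are all hypothetical. Note that asking for more results than there are chunks just returns every chunk.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(store, query_vector, k):
    """Return the texts of the k stored chunks most similar to query_vector.
    If k exceeds the number of chunks, every chunk is returned."""
    ranked = sorted(store,
                    key=lambda item: cosine_similarity(query_vector, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]


# Six (text, made-up vector) pairs standing in for our six chunks
store = [(f"chunk {i}", [1.0, float(i), 0.5]) for i in range(6)]

results = top_k(store, query_vector=[1.0, 0.0, 0.5], k=10)
print(len(results))  # 6
print(results[0])    # chunk 0
```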

Now! We are ready to create our user message and our retrieval query. A retrieval query is the search text you send to your vector store to fetch relevant chunks, so it should be phrased to resemble the text you hope to find. The user message is the natural-language question you want the LLM to answer.

userMessage = "Give me every line having the word swear in it" 
retrievalQuery = "the play 'Romeo and Juliet'"

We can now extract the relevant chunks, iterate through them and create a long string of the resulting context chunks.

relevantChunks = retriever.invoke(retrievalQuery)
contextChunks = [d.page_content for d in relevantChunks]
contextString = ". ".join(contextChunks)

We need to give the LLM context to work from. One great way to do that is to assign a role to the LLM (e.g., “you are a human resources assistant”). In this case, we’ll make it a play reviewer. This is also a good place to provide explicit directions on how you want the LLM to respond.

qna_system_message = """
You are a play reviewer using the RAG to combine the text of the play with your knowledge of plays in general.
You will review RomeoAndJuliet.txt and provide appropriate answers from the context.
The user input will have the context required by you and will begin with the token: ###Context.
The user questions will begin with the token: ###Question.
Please answer only using the context provided and do not mention anything about the context in your answer.
If the answer is not found in the context, respond "I don't know."
"""

We just need a way to tell the LLM how the context and question will appear, for which we create a template.

qna_user_message_template = """
###Context
{context}

###Question
{question}
"""

Let’s create the final userQuery by combining the context with the user message we created above.

userQuery = qna_user_message_template.format(
    context=contextString,
    question=userMessage
)

Finally, we’re ready to create the prompt that we’ll feed to the LLM.

prompt = f"""
[INST]{qna_system_message}

{userQuery}
[/INST]
"""

Next, we instantiate our LLM, filling in some parameters that, again, we’ll review in an upcoming blog post.

from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
    model="gpt-4o-mini",                      
    temperature=0,                
    max_tokens=10000,                 
    top_p=0.95,
    frequency_penalty=1.2,
    stop_sequences=['INST']
)

And we are now, at last, ready to feed our prompt to the LLM, which will draw on the context we retrieved from the Romeo and Juliet text. Remember that we asked it for every line containing the word “swear.”

response = llm.invoke(prompt)
response.content

ROMEO.  
O, then, dear saint, let lips do what hands do:  
They pray, grant thou, lest faith turn to despair.  

JULIET.  
Saints do not move, though grant for prayers’ sake.  

ROMEO.  
Then move not while my prayer’s effect I take.  
Thus from my lips, by thine my sin is purg’d.

JULIET.  
O swear not by the moon, th’inconstant moon,  
That monthly changes in her circled orb,  
Lest that thy love prove likewise variable.

ROMEO.  
What shall I swear by?

JULIET.   
Do not swear at all.   
Or if thou wilt, swear by thy gracious self,
Which is the god of my idolatry,
And I’ll believe thee.

ROMEO.
If my heart’s dear love,—

One thing to note is that the LLM interpreted “swear” liberally: several of the speeches it returned, including the first three, don’t contain the word at all. For example, Romeo’s “They pray” is close to “swear” in spirit, but it isn’t a match.
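If you really do want only the lines containing “swear,” that part doesn’t need an LLM: a plain text search is exact and free. A minimal sketch, run on a few lines copied from the output above rather than on the full file. The \b word boundaries match “swear” only as a whole word, so forms like “swears” or “sworn” would need a broader pattern.

```python
import re

text = """ROMEO.
They pray, grant thou, lest faith turn to despair.
JULIET.
O swear not by the moon, th’inconstant moon,
ROMEO.
What shall I swear by?
JULIET.
Do not swear at all.
Or if thou wilt, swear by thy gracious self,
"""

# Case-insensitive whole-word match for "swear"
pattern = re.compile(r"\bswear\b", re.IGNORECASE)
matches = [line for line in text.splitlines() if pattern.search(line)]
for line in matches:
    print(line)
```

Only the four lines that actually contain the word are printed; “They pray” is not among them.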

We’ll want to ask more questions, so let’s create a function that takes a user message, retrieves the chunks relevant to it, builds the prompt, and invokes the LLM with that prompt.

def UseRag(userMessage):
    """
    Args:
        userMessage: The user's question. It is also used as the
            retrieval query against the vector store.
    Returns:
        The LLM's answer as a string, or an error message.
    """
    # Retrieve the chunks most similar to the question
    chunks = retriever.invoke(userMessage)
    contextContent = [d.page_content for d in chunks]
    contextString = ". ".join(contextContent)

    # Build the prompt the same way we did above
    userQuery = qna_user_message_template.format(
        context=contextString,
        question=userMessage
    )
    prompt = f"[INST]{qna_system_message}\n\n{userQuery}\n[/INST]"

    # Query the LLM; on failure, return the error text instead
    try:
        response = llm.invoke(prompt)
        return response.content
    except Exception as e:
        return f'Sorry, I encountered the following error: \n {e}'
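Because UseRag depends on the live retriever and llm objects, it can only be exercised with an API key. One way to make the retrieve-then-generate flow testable is to pass those two steps in as functions. In the sketch below, use_rag, fake_retrieve, and fake_generate are hypothetical names; the stubs stand in for retriever.invoke and llm.invoke(...).content, so no network call is made.

```python
def use_rag(question, retrieve, generate, system_message, user_template):
    """Retrieve context for a question, build the prompt, and generate an answer.

    retrieve: function mapping a query string to a list of text chunks
    generate: function mapping a prompt string to an answer string
    """
    context = ". ".join(retrieve(question))
    user_query = user_template.format(context=context, question=question)
    prompt = f"[INST]{system_message}\n\n{user_query}\n[/INST]"
    try:
        return generate(prompt)
    except Exception as e:
        return f"Sorry, I encountered the following error: \n {e}"


def fake_retrieve(query):
    # Stand-in for retriever.invoke: always returns the same two chunks
    return ["JULIET. Do not swear at all",
            "Or if thou wilt, swear by thy gracious self"]


def fake_generate(prompt):
    # Stand-in for the LLM: answers only if the prompt mentions Verona
    return "Verona" if "Verona" in prompt else "I don't know."


answer = use_rag(
    "What town does Romeo live in?",
    fake_retrieve,
    fake_generate,
    system_message="Answer only from the context.",
    user_template="###Context\n{context}\n\n###Question\n{question}\n",
)
print(answer)  # I don't know.
```

Swapping in a retriever whose chunks mention Verona would change the answer, which is exactly the grounding behavior the real pipeline relies on.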

To show that our answers come from the retrieved text, let’s ask a question whose answer isn’t in our excerpt but would be known by anyone (anything?) familiar with the play.

print(UseRag("What town does Romeo live in?"))

I don't know.

Finally, let’s have a bit of fun:

UseRag("Write a 10 line poem in the style of 'Romeo and Juliet'")

In shadows deep where whispered secrets lie,  
Two hearts entwined beneath the moonlit sky.  
A glance exchanged, a spark ignites the night,  
Forbidden love that dances out of sight.  

O sweet Juliet, with beauty rare and bright,  
Your name a curse yet brings my soul delight.  
Though feuding kin may seek to tear apart,  
Our love shall bloom within each beating heart.  

For in this world of strife and bitter woe,  
Together we shall rise; our passion's glow.

I think that is actually pretty good.

OK, that was a lot, and it went by fast. I look forward to going back through it, line by line, and exploring what each line is doing.