Prerequisites:
1. You should have a valid AWS account with Amazon SageMaker access.
2. Your AWS region should have SageMaker Studio available.
Introduction
In any machine learning use case, results need to be factual, based on evidence, and accurate. We expect the same from generative AI models. However, generative AI models and large language models often suffer from hallucination: the model makes up a response that is not grounded in any relevant information from its training data. A hallucinated response may read as well thought out even though it is completely nonsensical. This makes it risky to rely on such a model in business-critical applications for any given domain.
There are ways to overcome hallucination. Prompt engineering, retrieval-augmented generation, and domain adaptation of a model through fine-tuning are three popular approaches. These approaches help you either interact with the model more precisely to avoid ambiguity, provide the model with proper context to work with, or teach the model a new domain. Retrieval-augmented generation (RAG) aids a generative AI model with factual documents and information as context, reducing the possibility of hallucination due to lack of knowledge.
But what if you cannot provide all the required information within the token limit of a model? Or what if you need to search for the relevant documents first before identifying the relevant response?
Retrieval-augmented generation is a design pattern for question-answering systems in which factual documents and information are supplied to a generative AI model as additional context. The information can be retrieved from enterprise search systems, local databases, or even public search engines.
In this lab we show you how to build a domain-specific search application that allows users to ask domain questions against a generative AI model augmented with factual information from domain text.
Key Components
LLM (Large Language Model): Mistral 7B Instruct, available through Amazon SageMaker. This model is used to understand the document chunks and provide an answer in a human-friendly manner.
Embeddings Model: GPT-J 6B, available through Amazon SageMaker. This model is used to generate a numerical representation of the textual documents.
Vector Store: FAISS, available through LangChain. In this notebook we use this in-memory vector store to hold both the embeddings and the documents. In an enterprise context this could be replaced with a persistent store such as Amazon OpenSearch Service, Amazon RDS for PostgreSQL with pgvector, ChromaDB, Pinecone, or Weaviate.
Index: VectorIndex. The index compares the query embedding against the document embeddings to find the most relevant documents (see the sketch after this list).
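To make these components concrete, here is a minimal sketch of how the embeddings model, vector store, and index fit together in LangChain. It assumes the langchain and faiss-cpu packages are installed and that a GPT-J embedding endpoint is already deployed; the endpoint name, region, and request/response shapes are assumptions you should adapt to your own setup.

```python
import json

from langchain.docstore.document import Document
from langchain.embeddings import SagemakerEndpointEmbeddings
from langchain.embeddings.sagemaker_endpoint import EmbeddingsContentHandler
from langchain.vectorstores import FAISS


class GPTJContentHandler(EmbeddingsContentHandler):
    """Serialize requests to / responses from the GPT-J embedding endpoint."""

    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, inputs, model_kwargs):
        # Assumed payload shape for the JumpStart GPT-J embedding container.
        return json.dumps({"text_inputs": inputs, **model_kwargs}).encode("utf-8")

    def transform_output(self, output):
        # Assumed response shape: {"embedding": [[...], ...]}, one vector per input.
        return json.loads(output.read())["embedding"]


embeddings = SagemakerEndpointEmbeddings(
    endpoint_name="<your-gptj-embedding-endpoint>",  # placeholder; use your own
    region_name="us-east-1",                         # placeholder region
    content_handler=GPTJContentHandler(),
)

# Embed a few document chunks and build the in-memory FAISS index.
docs = [
    Document(page_content="Amazon SageMaker is a managed machine learning service."),
    Document(page_content="FAISS performs fast similarity search over dense vectors."),
]
vectorstore = FAISS.from_documents(docs, embeddings)

# The index compares the query embedding with the document embeddings.
hits = vectorstore.similarity_search("What is SageMaker?", k=1)
print(hits[0].page_content)
```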
Step 1: Deploying the Embedding and LLM Models via SageMaker JumpStart
Deploying the GPT-J 6B Embedding Model via JumpStart
- If it is not already open, go to Amazon SageMaker Studio and click the button Open Studio.
- From the top, change the region to one where you have SageMaker Studio enabled.
- Select the user profile and click Open Studio.
- Once Studio opens, click on the Studio Classic icon at the top left.
- Next, click on the Run icon to activate Studio Classic. This will take a few minutes.
- Next, click on the Open icon to open Studio Classic. A new tab will open.
- Inside the Studio Classic tab, navigate to the Home tab and click on JumpStart.
- Search for GPT-J and click on GPT-J 6B Embedding FP16.
- Under Deployment Configuration, select ml.g5.2xlarge for the SageMaker hosting instance.
- Leave the rest as default and click Deploy.
- A deployment window will open. Deployment takes about 5-10 minutes. Once deployed, the endpoint status will show as "In Service". You can continue to the next step while waiting for the JumpStart deployment to complete.
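If you prefer to script this deployment rather than click through the Studio UI, a minimal sketch with the SageMaker Python SDK follows. The model_id below is an assumption based on JumpStart catalog naming and may differ across SDK versions; verify it against the JumpStart model card before running.

```python
# Optional scripted alternative to the UI walkthrough above.
from sagemaker.jumpstart.model import JumpStartModel

embedding_model = JumpStartModel(
    model_id="huggingface-textembedding-gpt-j-6b-fp16",  # assumed JumpStart id
)
embedding_predictor = embedding_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # same instance type as in the UI steps
)
print(embedding_predictor.endpoint_name)  # note this name for later steps
```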
Deploying the Mistral 7B Instruct Large Language Model (LLM) via JumpStart
- Click on the SageMaker JumpStart tab.
- Search for instruct and click on Mistral 7B Instruct.
- Under Deployment Configuration, select ml.g5.2xlarge for the SageMaker hosting instance.
- Leave the rest as default and click Deploy.
- A deployment window will open. Deployment takes about 5-10 minutes. Once deployed, the endpoint status will show as "In Service".
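As with the embedding model, you can script this deployment instead. Again, the model_id is an assumption based on JumpStart catalog naming; confirm it in the JumpStart model card.

```python
# Optional scripted alternative for the Mistral deployment.
from sagemaker.jumpstart.model import JumpStartModel

llm_model = JumpStartModel(
    model_id="huggingface-llm-mistral-7b-instruct",  # assumed JumpStart id
)
llm_predictor = llm_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)
print(llm_predictor.endpoint_name)  # note this name for later steps
```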
You will now have your GPT-J embedding endpoint and your Mistral 7B LLM endpoint ready. Please note both endpoint names and keep them handy for the upcoming task.
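Before moving on, you can optionally smoke-test both endpoints with boto3. The endpoint names below are placeholders, and the request/response payload shapes are assumptions based on the JumpStart GPT-J embedding container and the Hugging Face TGI Mistral container; they may vary between JumpStart versions.

```python
import json

import boto3

smr = boto3.client("sagemaker-runtime")

# Embedding endpoint: returns one vector per input text.
emb_resp = smr.invoke_endpoint(
    EndpointName="<your-gptj-embedding-endpoint>",  # placeholder
    ContentType="application/json",
    Body=json.dumps({"text_inputs": ["Hello, world"]}),
)
print(json.loads(emb_resp["Body"].read())["embedding"][0][:5])

# LLM endpoint: returns generated text for a prompt.
llm_resp = smr.invoke_endpoint(
    EndpointName="<your-mistral-7b-endpoint>",  # placeholder
    ContentType="application/json",
    Body=json.dumps({
        "inputs": "<s>[INST] What is retrieval augmented generation? [/INST]",
        "parameters": {"max_new_tokens": 128},
    }),
)
print(json.loads(llm_resp["Body"].read()))
```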
Step 2: Use SageMaker Studio to Run the RAG Application
In the next blog, we will see how to use SageMaker Studio to run RAG-enabled question answering.