Building a RAG Application using Amazon SageMaker JumpStart

Created: Jun 19, 2024
Tags: SageMaker, RAG, Embedding, LLM, Text
A guide to start your RAG application journey using Amazon SageMaker JumpStart.

💡 Prerequisites: 1. You should have a valid AWS account with SageMaker access. 2. Your AWS Region should have SageMaker Studio enabled.

Introduction

In any machine learning use case, results need to be factual, evidence-based, and accurate. We expect the same from generative AI models. However, generative AI models and large language models often suffer from hallucination: the model makes up a response that is not grounded in any relevant information from its training data. A hallucinated response can read like a well-thought-out answer while being completely nonsensical, which makes it dangerous to rely on the model in business-critical applications for any given domain.
There are ways to overcome hallucination. Prompt engineering, retrieval augmented generation, and domain adaptation of a model through fine-tuning are three popular approaches. These approaches help you either interact with the model more precisely to avoid ambiguity, provide the model with proper context to work with, or teach the model a new domain. Retrieval augmented generation (RAG) aids a generative AI model with factual documents and information as context, reducing the possibility of hallucination due to lack of knowledge.
But what if you cannot provide all the required information within the token limit of the model? Or what if you need to search for the relevant documents before you can identify the relevant response?
Retrieval augmented generation is a design pattern for question answering systems that supplies a generative AI model with factual documents and information as additional context. The information can be retrieved from enterprise search systems, local databases, or even public search engines.
In this lab we show you how to build a domain-specific search application that allows users to ask domain questions against a generative AI model augmented with factual information from domain text.

Key Components

LLM (Large Language Model): Mistral 7B Instruct, available through Amazon SageMaker. This model will be used to understand the retrieved document chunks and provide an answer in a human-friendly manner.
Embeddings Model: GPT-J 6B, available through Amazon SageMaker. This model will be used to generate a numerical representation of the textual documents.
Vector Store: FAISS, available through LangChain. In this notebook we use this in-memory vector store to hold both the embeddings and the documents. In an enterprise context it could be replaced with a persistent store such as Amazon OpenSearch, RDS for PostgreSQL with pgvector, ChromaDB, Pinecone, or Weaviate.
Index: The vector index compares the embedding of the input query against the document embeddings to find the most relevant documents; a minimal sketch of this flow follows below.
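To make the flow concrete, here is a minimal sketch, assuming LangChain's SageMaker integrations, of how these components fit together: a content handler adapts the embedding endpoint's JSON payload, FAISS indexes a few document chunks, and a similarity search retrieves the closest match. The endpoint name is hypothetical, and the {"text_inputs"} / {"embedding"} request and response shapes are assumptions based on the JumpStart GPT-J 6B embedding model; verify them against your deployed model's example notebook.

```python
import json
from typing import Dict, List

from langchain_community.embeddings import SagemakerEndpointEmbeddings
from langchain_community.embeddings.sagemaker_endpoint import EmbeddingsContentHandler
from langchain_community.vectorstores import FAISS


class GPTJContentHandler(EmbeddingsContentHandler):
    """Translate between LangChain and the assumed JumpStart GPT-J payload format."""

    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, inputs: List[str], model_kwargs: Dict) -> bytes:
        # Assumed request shape: {"text_inputs": ["...", ...]}
        return json.dumps({"text_inputs": inputs, **model_kwargs}).encode("utf-8")

    def transform_output(self, output: bytes) -> List[List[float]]:
        # Assumed response shape: {"embedding": [[...], ...]}
        return json.loads(output.read().decode("utf-8"))["embedding"]


embeddings = SagemakerEndpointEmbeddings(
    endpoint_name="jumpstart-gpt-j-6b-embedding",  # hypothetical; use your endpoint
    region_name="us-east-1",                       # the Region you deployed in
    content_handler=GPTJContentHandler(),
)

# Index a few example document chunks in the in-memory FAISS store ...
docs = [
    "Amazon SageMaker hosts machine learning models on managed endpoints.",
    "FAISS performs fast similarity search over dense vectors.",
]
vector_store = FAISS.from_texts(docs, embeddings)

# ... then retrieve the chunk most relevant to a user question.
print(vector_store.similarity_search("What does FAISS do?", k=1))
```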

Step 1: Deploying Embedding and LLM models via SageMaker's Jumpstart

Deploying the GPT-J 6B Embedding Model via JumpStart

  • Change the Region at the top to one where you have SageMaker Studio enabled.
  • Select the user profile and click Open Studio.
  • Once Studio opens, click on the Studio Classic icon on the top left.
  • Next, click on the Run icon to activate Classic Studio. This will take a few minutes.
  • Next, click on the Open icon to open Classic Studio. A new tab will open.
  • Inside the Classic Studio tab, navigate to the Home tab and click on JumpStart.
  • Search for GPT-J and click on GPT-J 6B Embedding FP16.
  • Under Deployment Configuration, select ml.g5.2xlarge for the SageMaker hosting instance.
  • Leave the rest as default and click Deploy.
  • A deployment window will open. Deployment takes about 5-10 minutes. Once deployed, the endpoint will show as "In Service". You can continue to the next step while waiting for the JumpStart deployment to complete.
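Once the endpoint shows "In Service", you can sanity-check it with a quick invocation. Below is a minimal sketch using boto3; the endpoint name is hypothetical (use the one JumpStart assigned to your deployment), and the request shape assumes the JumpStart GPT-J 6B embedding payload format.

```python
# Minimal smoke test for the deployed embedding endpoint (sketch; the
# endpoint name below is hypothetical -- replace it with your own).
import json

import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="jumpstart-dft-hf-textembedding-gpt-j-6b-fp16",
    ContentType="application/json",
    Body=json.dumps({"text_inputs": ["Hello, world"]}),
)
result = json.loads(response["Body"].read())
# GPT-J 6B produces 4096-dimensional embeddings.
print(len(result["embedding"][0]))
```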

Deploying the Mistral 7B Instruct Large Language Model (LLM) via JumpStart

  • Click on the SageMaker JumpStart tab.
  • Search for instruct and click on Mistral 7B Instruct.
  • Under Deployment Configuration, select ml.g5.2xlarge for the SageMaker hosting instance.
  • Leave the rest as default, and click Deploy.
  • A deployment window will open; deployment takes about 5-10 minutes. Once deployed, it will show as "In Service".
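As with the embedding model, you can verify the endpoint with a short invocation once it is "In Service". The sketch below assumes the TGI-style {"inputs": ..., "parameters": ...} payload commonly used by Mistral 7B Instruct on JumpStart; the endpoint name is hypothetical, and the exact response shape may vary by model version.

```python
# Minimal smoke test for the Mistral 7B Instruct endpoint (sketch; the
# endpoint name is hypothetical -- replace it with your own).
import json

import boto3

runtime = boto3.client("sagemaker-runtime")

payload = {
    # Mistral Instruct expects its [INST] ... [/INST] chat template.
    "inputs": "<s>[INST] What is retrieval augmented generation? [/INST]",
    "parameters": {"max_new_tokens": 128, "temperature": 0.2},
}
response = runtime.invoke_endpoint(
    EndpointName="jumpstart-dft-hf-llm-mistral-7b-instruct",
    ContentType="application/json",
    Body=json.dumps(payload),
)
# TGI-style responses are a list of {"generated_text": ...} objects;
# check your model version's example notebook if this differs.
print(json.loads(response["Body"].read())[0]["generated_text"])
```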
You will now have your GPT-J embedding endpoint and your Mistral 7B LLM endpoint ready. Note both endpoint names and keep them handy for the upcoming task.
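If you lose track of the endpoint names, a quick way to recover them is to list the in-service endpoints in your account, for example:

```python
# List all in-service SageMaker endpoints to recover the JumpStart names.
import boto3

sm = boto3.client("sagemaker")
for ep in sm.list_endpoints(StatusEquals="InService")["Endpoints"]:
    print(ep["EndpointName"])
```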

Step 2: Use SageMaker Studio to run the RAG application

In the next blog, we will see how to use SageMaker Studio to run RAG-enabled question answering.