Prerequisites:
1. You should have a valid AWS account with Amazon SageMaker access.
2. Your AWS region should have SageMaker Studio available.
Introduction
In any machine learning use case, results need to be factual, based on evidence, and accurate. We expect the same from generative AI models. However, generative AI models and large language models often suffer from hallucination: the model makes up a response that is not grounded in any relevant information from its training data. A hallucinated response may read as well thought out even though it is completely nonsensical. This makes it risky to rely on such a model in business-critical applications for any given domain.
There are ways to overcome hallucination. Prompt engineering, retrieval-augmented generation, and domain adaptation of a model through fine-tuning are three popular approaches. These approaches help you either interact with the model more precisely to avoid ambiguity, provide the model with proper context to work with, or teach the model a new domain. Retrieval-augmented generation (RAG) aids a generative AI model with factual documents and information as context, reducing the possibility of hallucination due to lack of knowledge.
But what if you cannot provide all the required information within the token limit of a model? Or what if you need to search for the relevant documents first before identifying the relevant response?
Retrieval-augmented generation is a design pattern for question-answering systems in which factual documents and information are supplied to a generative AI model as additional context. The information can be retrieved from enterprise search systems, local databases, or even public search engines.
In this lab we show you how to build a domain-specific search application that allows users to ask domain questions against a generative AI model augmented with factual information from domain text.
Key Components
LLM (Large Language Model): Mistral 7B Instruct, available through Amazon SageMaker. This model is used to understand the document chunks and provide an answer in a human-friendly manner.
Embeddings Model: GPT-J 6B, available through Amazon SageMaker. This model is used to generate a numerical representation of the textual documents.
Vector Store: FAISS, available through LangChain. In this notebook we use this in-memory vector store to hold both the embeddings and the documents. In an enterprise context this could be replaced with a persistent store such as Amazon OpenSearch Service, Amazon RDS for PostgreSQL with pgvector, ChromaDB, Pinecone, or Weaviate.
Index: VectorIndex. The index compares the query embedding against the document embeddings to find the most relevant documents (see the sketch after this list).
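To make these components concrete, here is a minimal sketch of how the embeddings model, vector store, and index fit together in LangChain. It assumes the langchain and faiss-cpu packages are installed and that a GPT-J embedding endpoint is already deployed; the endpoint name, region, and request/response shapes are assumptions you should adapt to your own setup.

```python
import json

from langchain.docstore.document import Document
from langchain.embeddings import SagemakerEndpointEmbeddings
from langchain.embeddings.sagemaker_endpoint import EmbeddingsContentHandler
from langchain.vectorstores import FAISS


class GPTJContentHandler(EmbeddingsContentHandler):
    """Serialize requests to / responses from the GPT-J embedding endpoint."""

    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, inputs, model_kwargs):
        # Assumed payload shape for the JumpStart GPT-J embedding container.
        return json.dumps({"text_inputs": inputs, **model_kwargs}).encode("utf-8")

    def transform_output(self, output):
        # Assumed response shape: {"embedding": [[...], ...]}, one vector per input.
        return json.loads(output.read())["embedding"]


embeddings = SagemakerEndpointEmbeddings(
    endpoint_name="<your-gptj-embedding-endpoint>",  # placeholder; use your own
    region_name="us-east-1",                         # placeholder region
    content_handler=GPTJContentHandler(),
)

# Embed a few document chunks and build the in-memory FAISS index.
docs = [
    Document(page_content="Amazon SageMaker is a managed machine learning service."),
    Document(page_content="FAISS performs fast similarity search over dense vectors."),
]
vectorstore = FAISS.from_documents(docs, embeddings)

# The index compares the query embedding with the document embeddings.
hits = vectorstore.similarity_search("What is SageMaker?", k=1)
print(hits[0].page_content)
```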
Step 1: Deploying the Embedding and LLM Models via SageMaker JumpStart
Deploying the GPT-J 6B Embedding Model via JumpStart
- If it is not already open, go to Amazon SageMaker Studio and click the button Open Studio.
- From the top, change the region to one where you have SageMaker Studio enabled.
- Select the user profile and click Open Studio.
- Once Studio opens, click on the Studio Classic icon at the top left.
- Next, click on the Run icon to activate Studio Classic. This will take a few minutes.
- Next, click on the Open icon to open Studio Classic. A new tab will open.
- Inside the Studio Classic tab, navigate to the Home tab and click on JumpStart.
- Search for GPT-J and click on GPT-J 6B Embedding FP16.
- Under Deployment Configuration, select ml.g5.2xlarge for the SageMaker hosting instance.
- Leave the rest as default and click Deploy.
- A deployment window will open. Deployment takes about 5-10 minutes. Once deployed, the endpoint status will show as "In Service". You can continue to the next step while waiting for the JumpStart deployment to complete.
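If you prefer to script this deployment rather than click through the Studio UI, a minimal sketch with the SageMaker Python SDK follows. The model_id below is an assumption based on JumpStart catalog naming and may differ across SDK versions; verify it against the JumpStart model card before running.

```python
# Optional scripted alternative to the UI walkthrough above.
from sagemaker.jumpstart.model import JumpStartModel

embedding_model = JumpStartModel(
    model_id="huggingface-textembedding-gpt-j-6b-fp16",  # assumed JumpStart id
)
embedding_predictor = embedding_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # same instance type as in the UI steps
)
print(embedding_predictor.endpoint_name)  # note this name for later steps
```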
Deploying the Mistral 7B Instruct Large Language Model (LLM) via JumpStart
- Click on the SageMaker JumpStart tab.
- Search for instruct and click on Mistral 7B Instruct.
- Under Deployment Configuration, select ml.g5.2xlarge for the SageMaker hosting instance.
- Leave the rest as default and click Deploy.
- A deployment window will open. Deployment takes about 5-10 minutes. Once deployed, the endpoint status will show as "In Service".
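As with the embedding model, you can script this deployment instead. Again, the model_id is an assumption based on JumpStart catalog naming; confirm it in the JumpStart model card.

```python
# Optional scripted alternative for the Mistral deployment.
from sagemaker.jumpstart.model import JumpStartModel

llm_model = JumpStartModel(
    model_id="huggingface-llm-mistral-7b-instruct",  # assumed JumpStart id
)
llm_predictor = llm_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)
print(llm_predictor.endpoint_name)  # note this name for later steps
```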
You will now have your GPT-J embedding endpoint and your Mistral 7B LLM endpoint ready. Please note both endpoint names and keep them handy for the upcoming task.
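Before moving on, you can optionally smoke-test both endpoints with boto3. The endpoint names below are placeholders, and the request/response payload shapes are assumptions based on the JumpStart GPT-J embedding container and the Hugging Face TGI Mistral container; they may vary between JumpStart versions.

```python
import json

import boto3

smr = boto3.client("sagemaker-runtime")

# Embedding endpoint: returns one vector per input text.
emb_resp = smr.invoke_endpoint(
    EndpointName="<your-gptj-embedding-endpoint>",  # placeholder
    ContentType="application/json",
    Body=json.dumps({"text_inputs": ["Hello, world"]}),
)
print(json.loads(emb_resp["Body"].read())["embedding"][0][:5])

# LLM endpoint: returns generated text for a prompt.
llm_resp = smr.invoke_endpoint(
    EndpointName="<your-mistral-7b-endpoint>",  # placeholder
    ContentType="application/json",
    Body=json.dumps({
        "inputs": "<s>[INST] What is retrieval augmented generation? [/INST]",
        "parameters": {"max_new_tokens": 128},
    }),
)
print(json.loads(llm_resp["Body"].read()))
```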
Step 2: Use SageMaker Studio to Run the RAG Application
In the next blog, we will see how to use SageMaker Studio to run RAG-enabled question answering.