
aws-samples/generative-bi-using-rag

Generative BI using RAG on AWS

Chinese documentation | Japanese documentation

Introduction

An NLQ (Natural Language Query) demo using Amazon Bedrock and Amazon OpenSearch with the RAG (Retrieval-Augmented Generation) technique.

Screenshot

User Operation Manual

Project Data Flowchart

Table of Contents

  1. Overview
  2. Prerequisites
  3. Deployment Steps
  4. Deployment Validation
  5. Running the Guidance
  6. Next Steps
  7. Cleanup

Overview

This is a comprehensive framework designed to enable Generative BI capabilities on custom data sources (Amazon RDS/Amazon Redshift) hosted on AWS. It offers the following key features:

  • Text-to-SQL functionality for querying customized data sources using natural language.
  • User-friendly interface for adding, editing, and managing data sources, tables, and column descriptions.
  • Performance enhancement through the integration of historical question-answer ranking and entity recognition.
  • Customizable business information, including entity information, formulas, SQL samples, and analysis ideas for complex business problems.
  • Agent task splitting to handle complex attribution analysis problems.
  • Intuitive question-answering UI that provides insights into the underlying Text-to-SQL mechanism.
  • Simple agent design interface for handling complex queries through a conversational approach.
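In a RAG-style Text-to-SQL flow, stored question-SQL samples (such as the SQL samples and historical question-answer pairs listed above) that resemble the user's question are placed into the prompt sent to the model. The sketch below is a generic illustration under our own assumptions, not the Guidance's code: it ranks samples with a plain token-overlap (Jaccard) score in place of the Amazon OpenSearch retrieval the Guidance uses, and `SqlExample`, `top_k_examples`, and `build_prompt` are hypothetical names.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class SqlExample:
    """Hypothetical record: a past question and the SQL that answered it."""
    question: str
    sql: str


def tokens(text: str) -> set[str]:
    # Lowercase word tokens; a simple stand-in for embedding-based similarity
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_k_examples(question: str, examples: list[SqlExample], k: int = 2) -> list[SqlExample]:
    q = tokens(question)
    # Highest overlap first; sorted() is stable, so ties keep their original order
    ranked = sorted(examples, key=lambda ex: jaccard(q, tokens(ex.question)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, schema: str, examples: list[SqlExample], k: int = 2) -> str:
    # Schema first, then the retrieved few-shot samples, then the new question
    parts = [f"Table schema:\n{schema}\n"]
    for ex in top_k_examples(question, examples, k):
        parts.append(f"Question: {ex.question}\nSQL: {ex.sql}\n")
    parts.append(f"Question: {question}\nSQL:")
    return "\n".join(parts)


if __name__ == "__main__":
    samples = [
        SqlExample("total sales by region", "SELECT region, SUM(sales) FROM orders GROUP BY region"),
        SqlExample("number of users", "SELECT COUNT(*) FROM users"),
    ]
    print(build_prompt("What are total sales by region?", "orders(region, sales, order_date)", samples, k=1))
```

Swapping `jaccard` for a vector-similarity query is where a real retrieval backend would plug in; the prompt-assembly step stays the same.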

Cost

As of May 2024, the cost of running this Guidance with the default settings in the us-west-2 Region is approximately $1,337.80 per month for processing 2,000 requests.

Sample Cost Table

The following table provides a sample cost breakdown for deploying this Guidance with the default parameters in the US East (N. Virginia) Region for one month.

| AWS service | Dimensions | Cost per month (USD) |
|---|---|---|
| Amazon ECS | 0.75 vCPU, 5 GB memory | $804.10 |
| Amazon DynamoDB | 25 provisioned write and read capacity units per month | $14.04 |
| Amazon Bedrock | 2,000 requests per month, each consuming 10,000 input tokens and 1,000 output tokens | $416.00 |
| Amazon OpenSearch Service | 1 domain with m5.large.search | $103.66 |
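The monthly figure in the Cost paragraph is the sum of the table rows, and the Bedrock row scales with token volume. A small sketch to redo that arithmetic for your own request mix; the per-1,000-token prices in `token_cost` are deliberately left as parameters, since they depend on the Bedrock model and Region you choose, and all function names here are our own.

```python
# Monthly cost per service (USD), copied from the sample cost table
MONTHLY_COST_USD = {
    "Amazon ECS": 804.10,
    "Amazon DynamoDB": 14.04,
    "Amazon Bedrock": 416.00,
    "Amazon OpenSearch Service": 103.66,
}

REQUESTS_PER_MONTH = 2_000
INPUT_TOKENS_PER_REQUEST = 10_000
OUTPUT_TOKENS_PER_REQUEST = 1_000


def monthly_total(costs: dict) -> float:
    # Sum of all service rows, rounded to cents
    return round(sum(costs.values()), 2)


def monthly_tokens(requests: int, input_per_request: int, output_per_request: int) -> tuple:
    # Total (input, output) tokens sent to Bedrock per month
    return requests * input_per_request, requests * output_per_request


def token_cost(input_tokens: int, output_tokens: int,
               usd_per_1k_input: float, usd_per_1k_output: float) -> float:
    # Prices are inputs, not constants: look them up for the model you actually use
    return input_tokens / 1000 * usd_per_1k_input + output_tokens / 1000 * usd_per_1k_output


if __name__ == "__main__":
    total = monthly_total(MONTHLY_COST_USD)
    tokens_in, tokens_out = monthly_tokens(
        REQUESTS_PER_MONTH, INPUT_TOKENS_PER_REQUEST, OUTPUT_TOKENS_PER_REQUEST
    )
    print(f"Total: ${total:,.2f}/month, ${total / REQUESTS_PER_MONTH:.4f} per request")
    print(f"Bedrock volume: {tokens_in:,} input tokens, {tokens_out:,} output tokens")
```

With the table's values this prints a total of $1,337.80 per month (about $0.67 per request) and a Bedrock volume of 20,000,000 input and 2,000,000 output tokens.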

Prerequisites

Operating System

The CDK deployment is optimized to be initiated from an Amazon Linux 2023 AMI. Deploying from another operating system may require additional steps.

AWS account requirements

  • VPC
  • IAM role with specific permissions
  • Amazon Bedrock
  • Amazon ECS
  • Amazon DynamoDB
  • Amazon Cognito
  • Amazon OpenSearch Service
  • Amazon Elastic Load Balancing
  • Amazon SageMaker (Optional, if you need customized models to be deployed)
  • AWS Secrets Manager

Supported Regions

us-west-2, us-east-2, us-east-1, ap-south-1, ap-southeast-1, ap-southeast-2, ap-northeast-1, eu-central-1, eu-west-1, eu-west-3, or any other Region where all the services used in this Guidance (notably Amazon Bedrock) are available.

Deployment Steps

1. Prepare CDK Pre-requisites

Follow the instructions in the CDK Workshop to install the CDK Toolkit. Make sure your environment has the permissions required to create the resources.
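Before deploying, a quick local check can catch a missing toolkit. This is a minimal sketch, assuming the CDK Toolkit was installed as the `cdk` command on PATH (as a global npm install provides) alongside Node.js, and reusing the Python 3.8 minimum stated for the password-hashing step; `preflight` is a hypothetical helper, and it does not verify AWS permissions.

```python
import shutil
import sys

# Commands expected on PATH; the CDK Toolkit runs on Node.js
REQUIRED_TOOLS = ("node", "cdk")


def preflight(tools=REQUIRED_TOOLS, min_python=(3, 8)) -> list:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    if sys.version_info < min_python:
        found = sys.version.split()[0]
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required, found {found}")
    return problems


if __name__ == "__main__":
    issues = preflight()
    print("\n".join(issues) if issues else "All checks passed")
```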

2. Set a password for the Streamlit Web UI

By default, the Streamlit Web UI password is empty. To set a password, update application/config_files/stauth_config.yaml.

For example:

credentials:
  usernames:
    jsmith:
      email: jsmith@example.com
      name: John Smith
      password: XXXXXX # To be replaced with hashed password
    rbriggs:
      email: rbriggs@example.com
      name: Rebecca Briggs
      password: XXXXXX # To be replaced with hashed password
cookie:
  expiry_days: 30
  key: random_signature_key # Must be string
  name: random_cookie_name
preauthorized:
  emails:
  - preauthorized@example.com

Replace each 'XXXXXX' placeholder with a hashed password.

Generate the hashes with the Python code below (requires Python 3.8 or later):

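from streamlit_authenticator.utilities.hasher import Hasher
# Hash each plaintext password; paste each result over an 'XXXXXX' placeholder
hashed_passwords = Hasher(['password123']).generate()
print(hashed_passwords)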
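The cookie key value in the same file is used to sign the re-authentication cookie and must be a string; rather than keeping the sample value random_signature_key, you may want a random one. A minimal standard-library sketch (our own suggestion, not part of the Guidance); `make_cookie_key` is a hypothetical helper, and we assume any non-empty string is accepted for that field.

```python
import secrets


def make_cookie_key(num_bytes: int = 32) -> str:
    """Return a URL-safe random string for the cookie 'key' field."""
    # token_urlsafe draws from the OS CSPRNG and base64url-encodes the bytes
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    print(f"cookie:\n  key: {make_cookie_key()}")
```

Paste the printed value over random_signature_key; changing it later invalidates existing login cookies.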

3. Deploy the CDK Stack

For global regions, execute the following commands:

Navigate to the CDK project directory:

cd generative-bi-using-rag/source/resources

Deploy the CDK stack. Change the Region to your own if needed (for example, us-west-2 or us-east-1):

export AWS_DEFAULT_REGION=us-west-2
cdk bootstrap
cdk deploy GenBiMainStack --require-approval never
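If you are unsure whether the Region you exported is one of those named under Supported Regions, a small check can help before running cdk deploy. This sketch only compares AWS_DEFAULT_REGION against that list; since the list is not exhaustive, a miss is a prompt to confirm service availability, not a hard failure. `check_region` is a hypothetical helper.

```python
import os

# Regions named in the "Supported Regions" section; others may work if they offer all required services
LISTED_REGIONS = frozenset({
    "us-west-2", "us-east-2", "us-east-1", "ap-south-1", "ap-southeast-1",
    "ap-southeast-2", "ap-northeast-1", "eu-central-1", "eu-west-1", "eu-west-3",
})


def check_region(env=os.environ):
    """Return a warning string, or None if the Region is one of the listed ones."""
    region = env.get("AWS_DEFAULT_REGION")
    if not region:
        return "AWS_DEFAULT_REGION is not set"
    if region not in LISTED_REGIONS:
        return (f"{region} is not in the listed Regions; confirm Amazon Bedrock "
                "and the other required services are available there")
    return None


if __name__ == "__main__":
    print(check_region() or "Region looks OK")
```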

When the deployment succeeds, you will see output similar to the following:

GenBiMainStack.AOSDomainEndpoint = XXXXX.us-west-2.es.amazonaws.com
GenBiMainStack.APIEndpoint = XXXXX.us-west-2.elb.amazonaws.com
GenBiMainStack.FrontendEndpoint = XXXXX.us-west-2.elb.amazonaws.com
GenBiMainStack.StreamlitEndpoint = XXXXX.us-west-2.elb.amazonaws.com
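To avoid copying these endpoints from the terminal, the CDK CLI can also write stack outputs to a JSON file with `--outputs-file`, for example `cdk deploy GenBiMainStack --require-approval never --outputs-file outputs.json`. The sketch below reads that file, assuming its usual shape of stack name mapped to output names; `parse_stack_outputs` and `load_stack_outputs` are hypothetical helpers.

```python
import json
from pathlib import Path


def parse_stack_outputs(json_text: str, stack: str = "GenBiMainStack") -> dict:
    """Return {output_name: value} for one stack from a CDK outputs file's contents."""
    return json.loads(json_text)[stack]


def load_stack_outputs(path: str, stack: str = "GenBiMainStack") -> dict:
    return parse_stack_outputs(Path(path).read_text(encoding="utf-8"), stack)


if __name__ == "__main__":
    outputs_file = Path("outputs.json")  # written by: cdk deploy ... --outputs-file outputs.json
    if outputs_file.exists():
        for name, value in load_stack_outputs(str(outputs_file)).items():
            print(f"{name}: {value}")
    else:
        print(f"{outputs_file} not found; re-run cdk deploy with --outputs-file {outputs_file}")
```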

Running the Guidance

After the CDK stack is deployed, wait around 40 minutes for the initialization to complete. Then, open the Web UI in your browser: https://your-public-dns
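Rather than guessing when the roughly 40-minute initialization has finished, you can poll the endpoint. A minimal standard-library sketch, assuming the URL comes from an endpoint in the stack outputs and that any HTTP response below 500 means the app is serving; the 60-minute budget, 60-second interval, and the names `is_reachable` and `wait_until_ready` are our own choices.

```python
import sys
import time
import urllib.error
import urllib.request
from typing import Callable


def is_reachable(url: str, timeout: float = 10.0) -> bool:
    """True if the server returns any HTTP response below 500."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        # e.g. 401/403 from a login page still means the app is serving requests
        return err.code < 500
    except (urllib.error.URLError, OSError):
        # DNS not resolvable yet, connection refused, timeout, TLS errors
        return False


def wait_until_ready(
    url: str,
    max_wait_s: float = 3600.0,
    interval_s: float = 60.0,
    check: Callable[[str], bool] = is_reachable,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check(url)` until it succeeds or the next attempt would pass the deadline."""
    deadline = time.monotonic() + max_wait_s
    while True:
        if check(url):
            return True
        if time.monotonic() + interval_s > deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        print("ready" if wait_until_ready(sys.argv[1]) else "timed out")
    else:
        print(f"usage: python {sys.argv[0]} <endpoint-url>")
```

The `check` and `sleep` parameters exist so the loop can be exercised without a network or real waiting.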

Cleanup

  • Delete the CDK stack:
cdk destroy GenBiMainStack