Horizon Takeoff

Horizon Takeoff is a Python library that simplifies the cloud deployment of LLMs with TitanML's Takeoff Server on AWS. Deployment is driven by an interactive Terminal User Interface (TUI) that streamlines the configuration of your cloud environment. To gain a deeper understanding of the features offered by the Takeoff Server, refer to TitanML's documentation.

With Horizon-Takeoff, you have the flexibility to choose between two distinct workflows:

1. Terminal User Interface (TUI): This approach guides you through a step-by-step process within the terminal. It automatically saves your cloud environment settings in a YAML file and handles cloud orchestration tasks such as pushing the Takeoff Server image to AWS's Elastic Container Registry (ECR), launching the instance, and configuring the Takeoff Server for LLM inference.

2. Python API: Alternatively, you can manually create the YAML config file according to your specific requirements and execute the orchestration and instance launch in Python. Further details can be found in the YAML Configuration section.

Requirements

1. AWS CLI installed and configured on your local machine.
2. Docker installed.
3. Own an AWS account with the following configurations:

  • Have an instance profile role with access to AmazonEC2ContainerRegistryReadOnly. This allows the instance to pull Docker images from ECR.

  • Own a security group allowing inbound traffic on port 8000 (required for the Takeoff Server Community edition) and port 3000 (required for the Pro edition). This exposes the appropriate Docker endpoint for API calls, depending on your server edition of choice (a sketch for opening these ports follows this list).
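If those ports aren't open yet, you can add the ingress rules programmatically. A minimal sketch with boto3, assuming boto3 is installed, your AWS credentials are configured, and the group ID below is replaced with your own (the ID shown is hypothetical):

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Open inbound TCP on 8000 (Community edition) and 3000 (Pro edition).
ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",  # hypothetical; replace with your security group ID
    IpPermissions=[
        {"IpProtocol": "tcp", "FromPort": 8000, "ToPort": 8000,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 3000, "ToPort": 3000,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    ],
)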

Currently, only EC2 instance deployment with the Community edition server is stable.

Install


pip install horizon-takeoff

TUI Launch


Launch the TUI to configure an EC2 instance with the Community edition of the Takeoff Server:

horizon-takeoff ec2 community

The TUI not only features a clean user interface (powered by the rich library) but also conducts pre-start checks for AWS CLI and Docker installations. Furthermore, it has access to your AWS profile data, allowing it to significantly expedite the configuration of your AWS cloud environment. This includes the ability to list your available AWS keys, security groups, and ARN roles.

[Demo video: tui.mp4]

Staging


After you've finished the TUI workflow, a YAML configuration file is automatically stored in your working directory. This file triggers the staging process of your deployment, and you will receive a notification in the terminal when your instance launches.

Wait a few minutes while the instance downloads the LLM and starts the Docker container running the Takeoff Server. To track progress and access your instance's initialization logs, SSH into your instance:

ssh -i ~/<pem.key> <user>@<public-ipv4-dns>  # e.g. ssh -i ~/aws.pem ubuntu@ec2-<ip>.compute-1.amazonaws.com

In your instance's terminal, run the following command to view the initialization logs and confirm that your container is up and running:

cat /var/log/cloud-init-output.log

If you see the Uvicorn URL endpoint displayed in the logs, your Docker container is operational and you are ready to make API calls to the inference endpoint.
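To sanity-check the server without the Python client, you can also hit the endpoint directly over HTTP. A minimal sketch with requests, assuming the Community edition exposes a /generate route on port 8000 (check TitanML's docs for the exact route and payload):

import requests

# Replace <public-ipv4-dns> with your instance's public IPv4 DNS.
url = "http://<public-ipv4-dns>:8000/generate"
response = requests.post(url, json={"text": "List 3 things to do in London."})
print(response.text)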

Calling the Endpoint


Once you've initialized the EC2Endpoint class, you can effortlessly invoke your LLM in the cloud with just a single line of code.

from horizon import EC2Endpoint

endpoint = EC2Endpoint()
generation = endpoint('List 3 things to do in London.')
print(generation)

You can pass generation arguments to the EC2Endpoint() class to shape the model's output and/or select the server edition and endpoint type (see the sketch after this list):

pro: bool = False,
stream: bool = False,
sampling_topk: int = 1,
sampling_topp: float = 1.0,
sampling_temperature: float = 1.0,
repetition_penalty: int = 1,
no_repeat_ngram_size: int = 0,
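For example, a minimal sketch, assuming the defaults above are accepted as keyword arguments of EC2Endpoint (the values here are illustrative):

from horizon import EC2Endpoint

# Illustrative sampling values; assumes the defaults above are EC2Endpoint kwargs.
endpoint = EC2Endpoint(
    sampling_topk=50,
    sampling_topp=0.9,
    sampling_temperature=0.7,
    no_repeat_ngram_size=3,
)
generation = endpoint('List 3 things to do in London.')
print(generation)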

For more information regarding the available decoding arguments, refer to TitanML's docs.

Deleting Instance


To delete your working instance via the terminal, run:

horizon-del

YAML Configuration


If you prefer to bypass the TUI, you can write the YAML configuration manually. Make sure to add the following EC2-related variables and save them in an ec2_config.yaml file:

EC2:
  ami_id: ami-0c7217cdde317cfec             # Set the ID of the Amazon Machine Image (AMI) to use for EC2 instances.
  ecr_repo_name: takeoff                    # Set the name of the ECR repository. If it doesn't exist it will be created.
  hardware: cpu                             # Set the hardware type: 'cpu' or 'gpu'
  hf_model_name: tiiuae/falcon-7b-instruct  # Set the name of the Hugging Face model to use.
  instance_role_arn: arn:aws:iam::^^^:path  # Set the ARN of the IAM instance profile role.
  instance_type: c5.2xlarge                 # Set the EC2 instance type.
  key_name: aws                             # Set the name of the AWS key pair.
  region_name: us-east-1                    # Set the AWS region name.
  security_group_ids:                       # Set the security group ID(s).
    - sg-0fefe7b366b0c0843
  server_edition: community                 # defaults to "community" ("pro" not available yet)                
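Before launching, it can help to confirm the file parses and contains the expected keys. A minimal sanity-check sketch, assuming PyYAML is installed:

import yaml

# Load the config and verify the EC2 section has every expected key.
with open("ec2_config.yaml") as f:
    config = yaml.safe_load(f)

required = {
    "ami_id", "ecr_repo_name", "hardware", "hf_model_name",
    "instance_role_arn", "instance_type", "key_name",
    "region_name", "security_group_ids", "server_edition",
}
missing = required - set(config["EC2"])
assert not missing, f"Missing keys in ec2_config.yaml: {missing}"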

Launch in Python


Once the YAML file is configured, instantiate the DockerHandler and TitanEC2 classes to handle the Docker image workflow and the instance launch.

Docker

Load the YAML file into the DockerHandler. These commands will pull the Takeoff Docker image, tag it, and push it to ECR:

from horizon import DockerHandler, TitanEC2

docker = DockerHandler("ec2_config.yaml")

docker.pull_takeoff_image()
docker.push_takeoff_image()

Create Instance

Launch the EC2 instance:

titan = TitanEC2("ec2_config.yaml")
instance_id, meta_data = titan.create_instance()
print(meta_data)

When your instance is created, you will get a JSON output of the instance's metadata.
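If you want to block until the instance is actually running before SSHing in, a boto3 waiter works. A minimal sketch, assuming boto3 is installed and your AWS credentials are configured:

import boto3

# Match region_name from ec2_config.yaml.
ec2 = boto3.client("ec2", region_name="us-east-1")
waiter = ec2.get_waiter("instance_running")
waiter.wait(InstanceIds=[instance_id])  # instance_id returned by titan.create_instance()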

Revisit the Staging and Calling the Endpoint sections for API handling.

Delete Instance

Pass your instance ID to the delete_instance method:

titan.delete_instance(instance_id)