Meta Llama 3.1 on Vertex AI: A Step-by-Step Guide
Introduction
Meta Llama 3.1 is the latest open LLM from Meta, released in July 2024. This blog post will guide you through the process of deploying Meta Llama 3.1 405B Instruct FP8 on Google Cloud Vertex AI, running online predictions, and cleaning up resources.
Deploying Meta Llama 3.1 405B Instruct FP8 on Vertex AI
To deploy Meta Llama 3.1 405B Instruct FP8 on Vertex AI, you need to follow these steps:
1. **Requirements for Meta Llama 3.1 Models on Google Cloud**
Meta Llama 3.1 models require careful consideration of your hardware resources. For inference, the memory requirements depend on the model size and the precision of the weights. Here’s a table showing the approximate memory needed for different configurations:
| Model Size | FP16 | FP8 | INT4 |
|---|---|---|---|
| 8B | 16 GB | 8 GB | 4 GB |
| 70B | 140 GB | 70 GB | 35 GB |
| 405B | 810 GB | 405 GB | 203 GB |
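The figures above follow a simple rule of thumb: weight memory ≈ parameter count × bytes per parameter (2 bytes for FP16, 1 for FP8, 0.5 for INT4). A minimal sketch of that estimate (note it covers the weights only; the KV cache and activations add to the real footprint):

```python
def weight_memory_gb(params_billions: float, bits: int) -> float:
    """Approximate weight memory in GB: parameters x bytes per parameter."""
    bytes_per_param = bits / 8
    # 1e9 parameters at 1 byte each is roughly 1 GB
    return params_billions * bytes_per_param

for size in (8, 70, 405):
    print(
        f"{size}B -> "
        f"FP16: {weight_memory_gb(size, 16)} GB, "
        f"FP8: {weight_memory_gb(size, 8)} GB, "
        f"INT4: {weight_memory_gb(size, 4)} GB"
    )
```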
2. **Set up Google Cloud for Vertex AI**
To set up Google Cloud for Vertex AI, you need to initialize the project and location. You can do this by running the following code:
```python
import os

from google.cloud import aiplatform

aiplatform.init(project=os.getenv("PROJECT_ID"), location=os.getenv("LOCATION"))
```
3. **Register the Meta Llama 3.1 405B Model on Vertex AI**
To register the Meta Llama 3.1 405B model on Vertex AI, you need to upload it as a new model resource in the Vertex AI Model Registry. You can do this by running the following code:
```python
from google.cloud import aiplatform

# CONTAINER_URI should point to the serving container used for inference,
# e.g. the Hugging Face DLC for Text Generation Inference (TGI) on Google Cloud.
model = aiplatform.Model.upload(
    display_name="Meta Llama 3.1 405B Instruct FP8",
    model_id="meta-llama-3-1-405b-instruct-fp8",
    serving_container_image_uri=CONTAINER_URI,
)
model.wait()
```
4. **Deploy Meta Llama 3.1 405B on Vertex AI**
To deploy Meta Llama 3.1 405B on Vertex AI, you need to create a new endpoint and deploy the registered model to it. You can do this by running the following code:
```python
from google.cloud import aiplatform

# Create a dedicated endpoint for the model
endpoint = aiplatform.Endpoint.create(
    display_name="llama-3-1-405b-instruct-fp8-endpoint"
)

# Deploy the registered model to the endpoint. The machine configuration
# below (an A3 instance with 8x NVIDIA H100 80GB GPUs, enough for the
# ~405 GB of FP8 weights) is one suitable choice; adjust it to your quota.
deployed_model = model.deploy(
    endpoint=endpoint,
    machine_type="a3-highgpu-8g",
    accelerator_type="NVIDIA_H100_80GB",
    accelerator_count=8,
)
```
5. **Run Online Predictions with Meta Llama 3.1 405B**
To run online predictions with Meta Llama 3.1 405B, you can use the Vertex AI Online Prediction UI or the Google Cloud Client Library. Here's an example of how to do it programmatically:
```python
import os

from google.cloud import aiplatform

# ENDPOINT_ID is the numeric ID of the endpoint the model was deployed to
endpoint = aiplatform.Endpoint(
    f"projects/{os.getenv('PROJECT_ID')}/locations/{os.getenv('LOCATION')}/endpoints/{ENDPOINT_ID}"
)

# `inputs` is the prompt string to send to the model
output = endpoint.predict(
    instances=[
        {
            "inputs": inputs,
            "parameters": {
                "max_new_tokens": 128,
                "do_sample": True,
                "top_p": 0.95,
                "temperature": 0.7,
            },
        },
    ],
)
print(output.predictions[0])
```
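Once you are done, delete the resources to avoid ongoing charges for the endpoint and its GPUs. A minimal cleanup sketch, assuming the endpoint and model IDs from the previous steps:

```python
import os

from google.cloud import aiplatform

# Undeploy every model from the endpoint, then delete the endpoint itself
endpoint = aiplatform.Endpoint(
    f"projects/{os.getenv('PROJECT_ID')}/locations/{os.getenv('LOCATION')}/endpoints/{ENDPOINT_ID}"
)
endpoint.undeploy_all()
endpoint.delete()

# Remove the model from the Vertex AI Model Registry
model = aiplatform.Model("meta-llama-3-1-405b-instruct-fp8")
model.delete()
```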
Conclusion
That’s it! You have successfully deployed Meta Llama 3.1 405B Instruct FP8 on Google Cloud Vertex AI, run online predictions, and cleaned up resources. Thanks to the Hugging Face DLCs for Text Generation Inference (TGI) and Google Cloud Vertex AI, deploying a high-performance text generation container for serving Large Language Models (LLMs) has never been easier.
Frequently Asked Questions
Q1: What is Meta Llama 3.1?
Meta Llama 3.1 is the latest open LLM from Meta, released in July 2024.
Q2: What are the requirements for Meta Llama 3.1 models on Google Cloud?
Meta Llama 3.1 models require careful consideration of your hardware resources. For inference, the memory requirements depend on the model size and the precision of the weights.
Q3: How do I set up Google Cloud for Vertex AI?
To set up Google Cloud for Vertex AI, you need to initialize the project and location. You can do this by running the following code:
```python
import os

from google.cloud import aiplatform

aiplatform.init(project=os.getenv("PROJECT_ID"), location=os.getenv("LOCATION"))
```
Q4: How do I register the Meta Llama 3.1 405B model on Vertex AI?
To register the Meta Llama 3.1 405B model on Vertex AI, you need to upload it as a new model resource in the Vertex AI Model Registry. You can do this by running the following code:
```python
from google.cloud import aiplatform

# CONTAINER_URI should point to the serving container used for inference
model = aiplatform.Model.upload(
    display_name="Meta Llama 3.1 405B Instruct FP8",
    model_id="meta-llama-3-1-405b-instruct-fp8",
    serving_container_image_uri=CONTAINER_URI,
)
model.wait()
```
Q5: How do I deploy Meta Llama 3.1 405B on Vertex AI?
To deploy Meta Llama 3.1 405B on Vertex AI, you need to create a new endpoint and deploy the registered model to it. You can do this by running the following code:
```python
from google.cloud import aiplatform

# Create an endpoint and deploy the registered model to it. The machine
# configuration (an A3 instance with 8x NVIDIA H100 80GB GPUs) is one
# suitable choice for the FP8 weights; adjust it to your quota.
endpoint = aiplatform.Endpoint.create(
    display_name="llama-3-1-405b-instruct-fp8-endpoint"
)
deployed_model = model.deploy(
    endpoint=endpoint,
    machine_type="a3-highgpu-8g",
    accelerator_type="NVIDIA_H100_80GB",
    accelerator_count=8,
)
```