Friday, September 20, 2024


Inference and Fine-tuning ProtST on Intel Gaudi 2 Accelerator

Protein language models (PLMs) have emerged as powerful tools for predicting and designing protein structure and function. At the International Conference on Machine Learning 2023 (ICML), MILA and Intel Labs released ProtST, a pioneering multi-modal language model for protein design based on text prompts. In this blog post, we demonstrate how easily ProtST inference and fine-tuning can be deployed on the Intel Gaudi 2 accelerator.

Inference with ProtST

A natural inference task for ProtST is predicting a protein's subcellular location from its amino acid sequence. Common subcellular locations include the nucleus, cell membrane, cytoplasm, and mitochondria, among others, as described in greater detail in the dataset.
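As a concrete illustration, subcellular localization reduces to multi-class classification: the model scores each candidate location, and the highest-scoring one is the prediction. A minimal sketch with made-up logits (the location list here is a shortened, illustrative subset, not the dataset's full label set):

```python
# Subcellular localization as multi-class classification.
# LOCATIONS is a shortened, illustrative subset; the logits are
# made-up numbers standing in for a real model's output.
LOCATIONS = ["nucleus", "cell membrane", "cytoplasm", "mitochondria", "other"]

def predict_location(logits):
    """Return the location with the largest logit (argmax)."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return LOCATIONS[best]

print(predict_location([0.1, 2.3, 0.7, -1.2, 0.0]))  # cell membrane
```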

We compare ProtST's inference performance on an NVIDIA A100 80GB PCIe GPU and an Intel Gaudi 2 accelerator using the test split of the ProtST-SubcellularLocalization dataset. This test set contains 2772 amino acid sequences, with sequence lengths ranging from 79 to 1999 amino acids.

You can reproduce our experiment using this script, where we run the model in full bfloat16 precision with batch size 1. We get an identical accuracy of 0.44 on the NVIDIA A100 and the Intel Gaudi 2, with Gaudi 2 delivering 1.76x faster inference than the A100. The wall time for a single A100 and a single Gaudi 2 is shown in the figure below.
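The shape of such a batch-size-1 evaluation loop can be sketched in plain Python. The `predict` callable below is a stand-in for the real ProtST forward pass (which would run in bfloat16 on the accelerator), and all sequences and labels are toy data:

```python
import time

def evaluate(sequences, labels, predict):
    """Batch-size-1 evaluation loop: returns (accuracy, wall time).
    `predict` stands in for the real ProtST forward pass, which would
    run in bfloat16 on the accelerator."""
    correct = 0
    start = time.perf_counter()
    for seq, label in zip(sequences, labels):
        if predict(seq) == label:
            correct += 1
    wall = time.perf_counter() - start
    return correct / len(sequences), wall

# Toy stand-in model: long sequences -> "cell membrane", else "nucleus".
seqs = ["MKT" * 30, "MKV" * 700]   # lengths 90 and 2100
labels = ["nucleus", "cell membrane"]
acc, wall = evaluate(seqs, labels,
                     lambda s: "cell membrane" if len(s) > 1000 else "nucleus")
print(f"accuracy={acc:.2f}")  # accuracy=1.00
# A cross-device speedup such as 1.76x is simply wall_a100 / wall_gaudi2.
```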

Fine-tuning ProtST

Fine-tuning the ProtST model on downstream tasks is an easy and established way to improve modeling accuracy. In this experiment, we specialize the model for binary localization, a simpler version of subcellular localization with binary labels indicating whether a protein is membrane-bound or soluble.

You can reproduce our experiment using this script. Here, we fine-tune the ProtST-ESM1b-for-sequential-classification model in bfloat16 precision on the ProtST-BinaryLocalization dataset. The table below shows model accuracy on the test split with different training hardware setups; the accuracies closely match the results published in the paper (around 92.5%).
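For readers who want to see what such a fine-tuning setup looks like in code, here is a minimal configuration sketch built on `optimum-habana`'s `GaudiTrainer`. The checkpoint and dataset names follow the ones used above; the hyperparameters, the `gaudi_config_name`, and the use of `trust_remote_code` are illustrative assumptions rather than the exact settings of the experiment.

```python
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification
from optimum.habana import GaudiTrainer, GaudiTrainingArguments

# Checkpoint and dataset names follow the post; trust_remote_code is an
# assumption about how the custom ProtST classification head is packaged.
model = AutoModelForSequenceClassification.from_pretrained(
    "mila-intel/protst-esm1b-for-sequential-classification",
    trust_remote_code=True,
)
dataset = load_dataset("mila-intel/ProtST-BinaryLocalization")
# Tokenization of the amino acid sequences is omitted for brevity.

args = GaudiTrainingArguments(
    output_dir="./protst-binary-localization",
    use_habana=True,      # run on Gaudi HPUs
    use_lazy_mode=True,   # Gaudi lazy execution mode
    bf16=True,            # bfloat16 training, as in the experiment
    per_device_train_batch_size=32,  # illustrative value
    num_train_epochs=3,              # illustrative value
    gaudi_config_name="Habana/bert-base-uncased",  # placeholder Gaudi config
)

trainer = GaudiTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
trainer.train()
```

This is a configuration sketch for Gaudi hardware, not a runnable reproduction of the experiment; the provided script remains the authoritative reference.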


Conclusion

In this blog post, we have demonstrated the ease of deploying ProtST inference and fine-tuning on Gaudi 2 using Optimum for Intel Gaudi accelerators. Our results show competitive performance against the A100, with a 1.76x speedup for inference and a 2.92x speedup for fine-tuning.

Frequently Asked Questions

Q1: What is ProtST?

ProtST is a pioneering multi-modal language model for protein design based on text prompts.

Q2: What is Gaudi 2 accelerator?

Gaudi 2 is Intel's second-generation AI accelerator, designed to speed up deep learning training and inference.

Q3: Can I fine-tune ProtST on my own data?

Yes. You can fine-tune ProtST on your own data by adapting the provided fine-tuning script.

Q4: How can I deploy ProtST on Intel Gaudi 2 accelerator?

You can deploy ProtST on Intel Gaudi 2 accelerator by using Optimum for Intel Gaudi Accelerators and following the instructions provided.
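Assuming a machine with the Habana software stack already installed, setup typically starts by installing the `optimum-habana` package; the commands below are the standard PyPI and source installs, and your environment may require a specific version:

```shell
# Install Optimum for Intel Gaudi (the optimum-habana package) from PyPI.
pip install optimum-habana
# Or, for the latest development version:
pip install git+https://github.com/huggingface/optimum-habana.git
```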

Q5: What are the results of fine-tuning ProtST on Intel Gaudi 2 accelerator?

Fine-tuning on Gaudi 2 reaches accuracy closely matching the published results (around 92.5%), with a 2.92x speedup over the A100; inference on Gaudi 2 is 1.76x faster.
