Thursday, September 19, 2024

XetHub Integrates with Hugging Face, Enhancing AI Model Development and Deployment


Introduction

We are thrilled to announce that Hugging Face has acquired XetHub, a Seattle-based company that has been working on enabling software engineering best practices for AI development. XetHub’s mission is to enable Git to scale to TB repositories and enable teams to explore, understand, and work together on large evolving datasets and models.







We are super excited to officially announce that Hugging Face acquired XetHub 🔥

XetHub is a Seattle-based company founded by Yucheng Low, Ajit Banerjee, and Rajat Arya, who previously worked at Apple, where they built and scaled Apple’s internal ML infrastructure. XetHub’s mission is to enable software engineering best practices for AI development. XetHub has developed technologies to enable Git to scale to TB repositories and enable teams to explore, understand, and work together on large evolving datasets and models. They were soon joined by a talented team of 12. You should give them a follow at their new org page: hf.co/xet-team



Our common goal at HF

The XetHub team will help us unlock the next 5 years of growth of HF datasets and models by switching to our own, better version of LFS as storage backend for the Hub’s repos.

– Julien Chaumond, HF CTO

Back in 2020 when we built the first version of the HF Hub, we decided to build it on top of Git LFS because it was decently well-known and it was a reasonable choice to bootstrap the Hub’s usage.

We knew back then, however, that we would want to switch to our own, more optimized storage and versioning backend at some point. Git LFS – even though it stands for Large File Storage – was just never meant for the type of large files we handle in AI, which are not just large, but very, very large 😃



Example future use cases 🔥 – what this will enable on the Hub

Let’s say you have a 10GB Parquet file. You add a single row. Today you need to re-upload 10GB. With the chunked files and deduplication from XetHub, you will only need to re-upload the few chunks containing the new row.

Another example for GGUF model files: let’s say @bartowski wants to update one single metadata value in the GGUF header for a Llama 3.1 405B repo. Well, in the future bartowski will only need to re-upload a single chunk of a few kilobytes, making the process way more efficient 🔥
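To make the idea of "chunked files and deduplication" concrete, here is a minimal Python sketch. It is not XetHub's implementation: it assumes content-defined chunking with a simple gear-style rolling hash (the announcement does not say which scheme is used), and the constants and the names `split_chunks` and `bytes_to_upload` are hypothetical, chosen only for illustration.

```python
import hashlib
import random
from typing import List

# Illustrative parameters (assumptions, not values from the announcement)
MIN_CHUNK = 256          # never cut a chunk shorter than this
MAX_CHUNK = 8192         # always cut once a chunk reaches this size
MASK = (1 << 11) - 1     # cut when the low 11 bits of the rolling hash are zero

# Fixed table of 256 pseudo-random 32-bit values, one per byte value
_table_rng = random.Random(0)
GEAR = [_table_rng.getrandbits(32) for _ in range(256)]


def split_chunks(data: bytes) -> List[bytes]:
    """Split data at positions chosen by its content, not by fixed offsets."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Gear-style rolling hash: older bytes shift out of the 32-bit state
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder
    return chunks


def digest(chunk: bytes) -> str:
    """Identify a chunk by the SHA-256 of its contents."""
    return hashlib.sha256(chunk).hexdigest()


def bytes_to_upload(old: bytes, new: bytes) -> int:
    """Count bytes in chunks of `new` that the store does not already hold."""
    known = {digest(c) for c in split_chunks(old)}
    return sum(len(c) for c in split_chunks(new) if digest(c) not in known)
```

In this sketch, appending bytes to a file leaves every earlier chunk untouched, so `bytes_to_upload(old, old + extra)` is at most the trailing chunk plus the new bytes. Because the cut points depend on content rather than offsets, a small edit in the middle of a file would typically change only the chunks around it, while the chunks before it stay identical.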

As the field moves to trillion-parameter models in the coming months (thanks Maxime Labonne for the new BigLlama-3.1-1T 🤯) our hope is that this new tech will unlock new scale both in the community and inside enterprise companies.

Finally, with large datasets and large models come challenges with collaboration. How do teams work together on large data, models and code? How do users understand how their data and models are evolving? We will be working to find better solutions to answer these questions.



Fun current stats on Hub repos 🤯🤯

  • number of repos: 1.3M models, 450k datasets, 680k spaces
  • total cumulative size: 12 PB stored in LFS (280M files) / 7.3 TB stored in git (non-LFS)
  • Hub’s daily number of requests: 1B
  • daily CloudFront bandwidth: 6 PB 🤯



A personal word from @ylow

I have been part of the AI/ML world for over 15 years, and have seen how deep learning has slowly taken over vision, speech, text and, increasingly, every data domain.

What I have severely underestimated is the power of data. What seemed like impossible tasks just a few years ago (like image generation) turned out to be possible with orders of magnitude more data, and a model with the capacity to absorb it. In hindsight, this is an ML history lesson that has repeated itself many times.

I have been working in the data domain ever since my PhD. First in a startup (GraphLab/Dato/Turi), where I made structured data and ML algorithms scale on a single machine. Then, after it was acquired by Apple, I worked to scale AI data management to >100PB, supporting 10s of internal teams who shipped 100s of features annually. In 2021, together with my co-founders and supported by Madrona and other angel investors, I started XetHub to bring our learnings of achieving collaboration at scale to the world.

XetHub’s goal is to enable ML teams to operate like software teams, by scaling Git file storage to TBs, seamlessly enabling experimentation and reproducibility, and providing the visualization capabilities to understand how datasets and models evolve.

I, along with the entire XetHub team, am very excited to join Hugging Face and continue this mission to make AI collaboration and development easier – by integrating XetHub technology into the Hub – and to release these features to the largest ML community in the world!



Finally, our Infrastructure team is hiring 👯

If you like those subjects and you want to build and scale the collaboration platform for the open source AI movement, get in touch!

Conclusion

In conclusion, we are excited to announce the acquisition of XetHub and the integration of their technology into Hugging Face. This will enable our community to collaborate more efficiently and scale to new heights.

Frequently Asked Questions

Q1: What is XetHub?

XetHub is a Seattle-based company that has been working on enabling software engineering best practices for AI development.

Q2: What is the mission of XetHub?

The mission of XetHub is to enable Git to scale to TB repositories and enable teams to explore, understand, and work together on large evolving datasets and models.

Q3: How does XetHub’s technology differ from existing solutions?

XetHub’s technology differs from existing solutions in that it is specifically designed for AI development and collaboration, and provides features such as chunked files and deduplication to enable efficient collaboration and experimentation.

Q4: How will XetHub’s technology be integrated into Hugging Face?

XetHub’s technology will be integrated into Hugging Face’s Hub, enabling the community to collaborate more efficiently and scale to new heights.

Q5: What does the acquisition of XetHub mean for Hugging Face?

The acquisition of XetHub means that Hugging Face will gain access to XetHub’s technology and expertise, and will be able to provide its community with new features and capabilities to enable more efficient and scalable AI development and collaboration.
