IBM Research Helps Extend PyTorch To Enable Open-source Cloud-native Machine Learning

Foundation models have the potential to change the way organizations build and train artificial intelligence (AI) with machine learning (ML).

A major challenge for building foundation models is that, until now, they have generally required specific types of networking and infrastructure hardware to run efficiently. There is also limited support for developers who want to build a foundation model on a fully open-source stack. IBM Research is trying to help solve this challenge in several ways.

“Our question was: can we train foundation models, but in such a way that we do it on basic hardware? And make it more accessible instead of just being in the hands of a select few researchers,” Raghu Ganti, principal research associate at IBM, told VentureBeat.

To that end, IBM announced today that it has developed and contributed code to the open-source PyTorch machine learning project to make the technology work more efficiently with standard Ethernet-based networks. IBM has also built an open-source operator that helps optimize PyTorch deployment on the Red Hat OpenShift platform, which is based on the open-source Kubernetes container orchestration project.

To infinity and beyond: how IBM helped expand PyTorch

To date, many foundation models have been trained on systems that use InfiniBand networking, which is typically found only in high-performance computing (HPC) environments.

While GPUs are the foundation of AI, training large models also requires powerful networking technology to connect many GPUs together. Ganti explained that it is possible to train large models without InfiniBand networks, but doing so is inefficient in a number of ways.

For example, he said that with standard PyTorch, training an 11-billion-parameter model over an Ethernet-based network achieved only 20% GPU utilization. IBM worked alongside the PyTorch community to improve that efficiency.

“This is a very complex problem and there are a lot of knobs to tune,” said Ganti.

The knobs in question govern how efficiently the GPUs and the network are used. Ganti said the goal is to keep both the network and the GPUs busy at the same time, which speeds up the overall training process.
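Why keeping both busy at once matters can be seen with a deliberately simplified model. The sketch below is illustrative only: it assumes each training step needs a fixed amount of GPU compute and a fixed amount of network communication, and it does not describe how PyTorch or IBM's contribution actually schedules work. The function names `step_time_ms` and `gpu_utilization` and the per-step costs are hypothetical.

```python
# Toy model (an illustrative assumption, not PyTorch's actual scheduler):
# each training step needs `compute_ms` of GPU work and `comm_ms` of network
# transfer (for example, exchanging model shards or gradients between nodes).


def step_time_ms(compute_ms: float, comm_ms: float, overlap: bool) -> float:
    """Wall-clock time of one training step under the toy model."""
    if overlap:
        # Communication runs while the GPU computes, so the slower of the
        # two determines how long the step takes.
        return max(compute_ms, comm_ms)
    # Without overlap the GPU sits idle while it waits for the network,
    # so the two costs add up.
    return compute_ms + comm_ms


def gpu_utilization(compute_ms: float, comm_ms: float, overlap: bool) -> float:
    """Fraction of the step during which the GPU is doing useful work."""
    return compute_ms / step_time_ms(compute_ms, comm_ms, overlap)


if __name__ == "__main__":
    compute, comm = 300.0, 200.0  # hypothetical per-step costs in milliseconds
    print(f"sequential: {gpu_utilization(compute, comm, overlap=False):.0%}")
    print(f"overlapped: {gpu_utilization(compute, comm, overlap=True):.0%}")
```

With these hypothetical costs, the script prints 60% for the sequential case and 100% for the overlapped one. The model also shows a limit: when communication takes longer than compute, overlap alone caps utilization at compute time divided by communication time, so in this model the network transfers themselves also have to get faster or smaller.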

The code that optimizes PyTorch for Ethernet networks was merged into PyTorch 1.13, which became generally available on October 28.

“We were able to go from 20% GPU usage all the way to 90%, and that’s a 4.5x improvement in terms of training speeds,” said Ganti.
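Ganti's figure follows from simple arithmetic if training throughput scales linearly with GPU utilization on otherwise identical hardware, model and batch size; that linear relationship is an assumption made here for illustration. A minimal sketch, where `training_speedup` is a hypothetical helper:

```python
def training_speedup(old_util_pct: float, new_util_pct: float) -> float:
    """Speedup from raising GPU utilization.

    Assumes the same GPUs, model and batch size, so that training throughput
    is proportional to GPU utilization (an illustrative simplification).
    """
    if not (0 < old_util_pct <= 100 and 0 < new_util_pct <= 100):
        raise ValueError("utilization must be a percentage in (0, 100]")
    return new_util_pct / old_util_pct


if __name__ == "__main__":
    speedup = training_speedup(20, 90)  # 90 / 20
    print(f"{speedup:.1f}x faster")
    # Under the same assumption, a hypothetical 45-day run shrinks to 10 days.
    print(f"{45 / speedup:.0f} days")
```

Real training runs rarely scale perfectly, so the result is best read as an upper bound implied by the utilization numbers rather than a measured wall-clock gain.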

Shifting PyTorch into high gear for faster training

In addition to the code improvements in PyTorch, IBM has also been working on the open-source Red Hat OpenShift Kubernetes platform to support foundation model development.

Ganti said part of that work is making sure that the maximum bandwidth the Ethernet network can provide is actually available at the pod level in OpenShift.

Using Kubernetes to train foundation models is not a new idea. OpenAI, the organization behind some of the most widely used models, including GPT-3 and DALL-E, has publicly discussed how it uses Kubernetes. What is new, according to IBM, is that the technology for doing this is available as open source. IBM has open-sourced a Kubernetes operator that provides the configuration organizations need to scale a cluster for large model training.

With the PyTorch Foundation, more open-source innovation is now possible

Until September, PyTorch was an open-source project managed by Meta. That changed on September 12, when the PyTorch Foundation was announced as a new organizing body led by the Linux Foundation.

Ganti said IBM’s effort to contribute code to PyTorch started before the new PyTorch Foundation was announced. He explained that while Meta administered the project, IBM could not commit code directly; instead, its contributions had to be committed by Meta staff members who had commit access.

Ganti expects PyTorch to become more collaborative and open under the leadership of the Linux Foundation. “I think [the PyTorch Foundation] will improve open-source collaboration,” said Ganti.