In this article, we explore how Intelligence Processing Units (IPUs) and graphics processing units (GPUs) drive the rapid evolution of generative AI. You’ll learn how generative AI works, how IPUs and GPUs support its development, and what matters when choosing AI infrastructure, and you’ll see generative AI projects by Gcore.
Generative AI, or GenAI, is artificial intelligence that can generate content in response to users’ prompts. The content types generated include text, images, audio, video, and code. The goal is for the generated content to be human-like, suitable for practical use, and to correspond with the prompt as much as possible. GenAI is trained by learning patterns and structures from input data and then utilizing that knowledge to generate new and unique outputs.
Here are a few examples of the GenAI tools with which you may be familiar:
- ChatGPT is an AI chatbot that can communicate with humans and write high-quality text and code. It has been taught using vast quantities of data available on the internet.
- DALL-E 2 is an AI image generator that can create images from text descriptions. DALL-E 2 has been trained on a large set of images and text, producing images that look lifelike and attractive.
- Whisper is a speech-to-text AI system that can identify, translate, and transcribe 57 languages (a number that continues to grow). It has been trained on 680,000 hours of multilingual data. This is a GenAI example in which accuracy is more important than creativity.
GenAI has potential applications in various fields. According to a 2023 McKinsey survey across industries, marketing and sales, product and service development, and service operations are the most commonly reported uses of GenAI.
The table below shows examples of different Generative AI tools: chatbots, text-to-image generators, text-to-video generators, speech-to-text generators, and text-to-code generators. Some of them are already mature whereas others are still in beta testing (as marked on the table) but look promising.
- DALL-E 2 (beta)
- Pika Labs (beta)
- Imagen Video (beta)
- Google Cloud Speech-to-Text
- Conformer Speech Model technology
These GenAI tools require specialized AI infrastructure, such as servers with IPU and GPU modules, for training and operation. We will discuss IPUs and GPUs later. First, let’s understand how GenAI works at a high level.
A GenAI system learns structures and patterns from a given dataset of similar content, such as massive amounts of text, photos, or music; for example, ChatGPT was trained on 570 GB of data from books, websites, research articles, and other forms of content available on the internet. According to ChatGPT itself, this is the equivalent of approximately 389,120 full-length eBooks in ePub format! Using that knowledge, the GenAI system then creates new and unique results. Here is a simplified illustration of this process:
Let’s look at two key phases of how GenAI works: training GenAI on real data and generating new data.
To learn patterns and structures, GenAI systems utilize different types of machine learning and deep learning techniques, most commonly neural networks. A neural network is an algorithm, loosely inspired by the human brain, that consists of interconnected nodes; it learns to process information by adjusting the weights of the connections between those nodes. Two popular types of neural networks for generative tasks are GANs and VAEs.
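To make the idea of "changing the weights" concrete, here is a minimal sketch of a single artificial neuron that learns the rule y = 2x + 1 by gradient descent. The data, the learning rate, and the function name `train_neuron` are illustrative assumptions; real GenAI models have billions of weights and are trained with dedicated frameworks.

```python
def train_neuron(samples, learning_rate=0.05, epochs=2000):
    """Fit y = weight * x + bias to (x, y) samples with gradient descent."""
    weight, bias = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in samples:
            error = (weight * x + bias) - y          # prediction error for this sample
            grad_w += 2 * error * x / len(samples)   # d(mean squared error)/d(weight)
            grad_b += 2 * error / len(samples)       # d(mean squared error)/d(bias)
        # Nudge each parameter against its gradient to reduce the error.
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias


# Hypothetical training data generated from the rule y = 2x + 1.
data = [(x, 2 * x + 1) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]
weight, bias = train_neuron(data)
print(f"learned weight={weight:.3f}, bias={bias:.3f}")
```

The neuron starts with both parameters at zero and ends up close to the rule that produced the data. Training a large network follows the same principle, repeated across many layers and far more data.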
Generative adversarial networks (GANs) are a popular type of neural network used for GenAI training, particularly for image generation. NVIDIA’s StyleGAN, for example, uses a GAN to generate photorealistic faces.
GANs operate by setting two neural networks against one another:
- The generator produces new data based on the given real data set.
- The discriminator determines whether the newly generated data is genuine or artificially generated, i.e., fake.
The generator tries to fool the discriminator. The ultimate goal is to generate data that the discriminator can’t distinguish from real data.
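The tug-of-war between the two networks can be expressed through their loss functions. The sketch below uses the standard binary cross-entropy form of the GAN losses. The probability values are hypothetical discriminator outputs chosen for illustration, not results from a trained model.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Low when real samples score near 1 and generated samples score near 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake):
    """Non-saturating generator loss: low when generated samples score near 1."""
    return -math.log(d_fake)


# d_real and d_fake are the discriminator's estimated probabilities that a real
# sample and a generated sample, respectively, are genuine (hypothetical values).
early = {"d_real": 0.95, "d_fake": 0.05}  # early training: fakes are easy to spot
late = {"d_real": 0.50, "d_fake": 0.50}   # late training: the discriminator can only guess

for name, scores in (("early", early), ("late", late)):
    print(name,
          round(discriminator_loss(**scores), 3),
          round(generator_loss(scores["d_fake"]), 3))
```

As the generator improves, its loss falls while the discriminator’s loss rises. When the discriminator outputs 0.5 for everything, it can no longer tell generated data from real data, which is the goal described above.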
Variational autoencoders (VAEs) are another well-known type of neural network used for image, text, music, and other content generation. The image generator Stable Diffusion, for example, uses a VAE to compress images into a latent space and to decode generated latents back into images.
VAEs consist of two neural networks:
- The encoder receives training data, such as a photo, and maps it to a latent space. Latent space is a lower dimensional representation of the data that captures the essential features of the input data.
- The decoder analyzes the latent space and generates a new data sample, e.g., a photo imitation.
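The encoder and decoder roles can be sketched as follows. This is a toy under loud assumptions: `encode` and `decode` are hand-written placeholders standing in for trained neural networks, and the latent space has a single dimension. The sampling step and the KL term, however, follow the standard VAE formulation.

```python
import math
import random


def encode(pixel_values):
    """Toy encoder: compress the input into a 1-D latent mean and log-variance.

    A real encoder is a trained neural network; these formulas are placeholders.
    """
    mean = sum(pixel_values) / len(pixel_values)
    log_var = math.log(0.01)  # fixed, small uncertainty (sigma = 0.1)
    return mean, log_var


def sample_latent(mean, log_var, rng):
    """Reparameterization trick: z = mean + sigma * epsilon, epsilon ~ N(0, 1)."""
    sigma = math.exp(0.5 * log_var)
    return mean + sigma * rng.gauss(0.0, 1.0)


def decode(z, size):
    """Toy decoder: expand the latent value back into `size` numbers."""
    return [z] * size


def kl_to_standard_normal(mean, log_var):
    """KL divergence between N(mean, sigma^2) and N(0, 1), the VAE regularizer."""
    return 0.5 * (mean ** 2 + math.exp(log_var) - 1.0 - log_var)


rng = random.Random(0)
photo = [0.2, 0.4, 0.6, 0.8]           # hypothetical 4-pixel "photo"
mean, log_var = encode(photo)
z = sample_latent(mean, log_var, rng)  # a nearby point in latent space
imitation = decode(z, len(photo))
print(round(mean, 3), round(z, 3), [round(v, 3) for v in imitation])
```

Because the latent value is sampled rather than fixed, decoding the same input twice gives slightly different outputs.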
Here are the basic differences between VAEs and GANs:
- VAEs are probabilistic models, which means the data they generate tends to be more diverse than the data generated by GANs.
- VAEs are easier to train but don’t generally produce as high-quality images as GANs. GANs can be more difficult to work with but produce better photo-realistic images.
- VAEs work better for signal processing use cases, such as anomaly detection for predictive maintenance or security analytics applications, while GANs are better at generating multimedia.
To get more efficient AI models, developers often train them using combinations of different neural networks. The entire training process can take anywhere from minutes to months, depending on your goals, dataset, and resources.
Once a generative AI tool has completed its training, it can generate new data; this stage is called inference. A user enters a prompt to generate the content, such as an image, a video, or a text. The GenAI system produces new data according to the user’s prompt.
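The split between training and inference can be illustrated with a deliberately tiny stand-in for a generative model: a bigram text generator. It is not how ChatGPT or other large models work internally, and the corpus and function names are hypothetical. It only shows the two phases, learning patterns from data and then extending a user’s prompt.

```python
import random
from collections import defaultdict


def train_bigram_model(corpus):
    """Training: record which word follows which in the corpus."""
    next_words = defaultdict(list)
    words = corpus.split()
    for current, following in zip(words, words[1:]):
        next_words[current].append(following)
    return next_words


def generate(model, prompt, max_new_words=8, seed=0):
    """Inference: extend a non-empty prompt one word at a time using learned patterns."""
    rng = random.Random(seed)
    output = prompt.split()
    for _ in range(max_new_words):
        candidates = model.get(output[-1])
        if not candidates:  # no known continuation for the last word
            break
        output.append(rng.choice(candidates))
    return " ".join(output)


corpus = "the kangaroo hops over the fence and the kangaroo rests in the shade"
model = train_bigram_model(corpus)
print(generate(model, "the kangaroo"))
```

The model is built once and can then answer any number of prompts without retraining, which is why inference is usually far cheaper than training.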
For the most relevant results, it is ideal to train generative AI systems with a focus on a particular area. As a crude example, if you want a GenAI system to produce high-quality images of kangaroos, it’s best to train the system on images of kangaroos rather than on all existing animals. That’s why gathering relevant data to train AI models is one of the key challenges. This requires the tight collaboration of subject matter experts and data scientists.
There are two primary options when it comes to how you develop a generative AI system. You can utilize a prebuilt AI model and fine-tune it to your needs, or embark on the ambitious journey of training an AI model from the ground up. Regardless of your approach, access to AI infrastructure—IPU and GPU servers—is indispensable. There are two main reasons for this:
- GPU and IPU architectures are adapted for AI workloads
- GPUs and IPUs are available in the cloud
Intelligence Processing Units (IPUs) and graphics processing units (GPUs) are specialized hardware designed to accelerate the training and inference of AI models, including GenAI models. Their main advantage is that each IPU or GPU module has thousands of cores that process data simultaneously. This makes them ideal for parallel computing, which is essential in AI training.
As a result, IPUs and GPUs are usually better deep learning accelerators than CPUs, which suit sequential tasks rather than massively parallel processing. While a server CPU can have up to 128 cores, a single Graphcore IPU processor, for example, has 1,472 cores.
Here are the basic differences between GPUs and IPUs:
- GPUs were initially designed for graphics processing, but their efficient parallel computation capabilities also make them well-suited for AI workloads. GPUs are an ideal choice for both training ML models and running inference. There are several AI-focused GPU hardware vendors on the market, but the clear leader is NVIDIA.
- IPUs are a newer type of hardware designed specifically for AI workloads. They are even more efficient than GPUs at performing parallel computations. IPUs are ideal for training and deploying the most sophisticated AI applications, like large language models (LLMs). Graphcore is the developer and sole vendor of IPUs, but some providers, like Gcore, offer Graphcore IPUs in the cloud.
Typically, even enterprise-level AI developers don’t buy physical IPU/GPU servers because they are extremely expensive, costing up to $270,000. Instead, developers rent virtual and bare metal IPU/GPU instances from cloud providers on a per-minute or per-hour basis. This is also more convenient because AI training is an iterative process. When you need to run the next training iteration, you rent a server or virtual machine and pay only for the time you actually use it. The same applies to deploying a trained GenAI system for user access: You’ll need the parallel processing capabilities of IPUs/GPUs for better inference speed when generating new data, so you have to either buy or rent this infrastructure.
When choosing AI infrastructure, you should consider which type of AI accelerator better suits your needs in terms of performance and cost.
GPUs are usually an easier way to train models because many prebuilt frameworks are adapted for GPUs, including PyTorch, TensorFlow, and PaddlePaddle. NVIDIA also offers CUDA for its GPUs; this is a parallel computing platform and programming model that integrates with languages such as C and C++, and that the major AI frameworks build on. As a result, GPUs are more suitable if you don’t have deep knowledge of AI training and fine-tuning, and want to get results faster using prebuilt AI models.
IPUs are better than GPUs for complex AI training tasks because they were designed specifically for AI workloads rather than for graphics rendering, which was the original purpose of GPUs. However, because IPUs are newer, they support fewer prebuilt AI frameworks out of the box than GPUs. When you are trying to perform a novel AI training task for which no prebuilt framework exists, you need to adapt an AI framework or AI model, or even write code from scratch, to run it. All of this requires technical expertise. That said, Graphcore is actively developing SDKs and instructions to ease the use of its hardware.
Graphcore’s IPUs also support packing, a technique that combines several short input sequences into one fixed-length sequence to cut wasted padding, which significantly reduces the time required to pre-train, fine-tune, and run inference with LLMs. Below is an example of how IPUs outperform GPUs in inference for a language model based on the BERT architecture when using packing.
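Separately from any benchmark, the idea behind packing itself can be sketched in a few lines. The example below uses a simple greedy first-fit-decreasing strategy, which is an assumption for illustration and not Graphcore’s implementation. The sequence lengths and the 512-token limit are hypothetical. A production setup also needs attention masks so that sequences sharing a row do not attend to each other.

```python
def pack_sequences(lengths, max_len):
    """Greedy first-fit-decreasing packing of sequence lengths into fixed-size rows."""
    packs = []  # each pack lists the sequence lengths that share one input row
    for length in sorted(lengths, reverse=True):
        if length > max_len:
            raise ValueError(f"sequence of {length} tokens exceeds max_len={max_len}")
        for pack in packs:
            if sum(pack) + length <= max_len:
                pack.append(length)
                break
        else:  # no existing pack has room, so open a new one
            packs.append([length])
    return packs


def padding_fraction(rows, max_len):
    """Share of token slots wasted on padding across all rows."""
    used = sum(sum(row) for row in rows)
    return 1.0 - used / (len(rows) * max_len)


# Hypothetical token counts for eight inputs and a 512-token model limit.
lengths = [500, 120, 60, 300, 200, 30, 450, 90]
unpacked = [[n] for n in lengths]  # one sequence per row, the rest is padding
packed = pack_sequences(lengths, max_len=512)

print(len(unpacked), "rows ->", len(packed), "rows")
print(f"padding: {padding_fraction(unpacked, 512):.0%} -> {padding_fraction(packed, 512):.0%}")
```

Fewer rows with less padding means the accelerator spends more of its compute on real tokens, which is where the time savings come from.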
Cost-effectiveness is another important consideration when choosing an AI infrastructure. Look for benchmarks that compare AI accelerators in terms of performance per dollar/euro. This can help you to identify efficient choices by finding the right balance between price and compute power, and could save you a lot of money if you plan a long-term project.
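A performance-per-dollar comparison can be as simple as dividing benchmark throughput by the rental price. The sketch below ranks accelerators that way. The names and figures are placeholders invented for illustration, not measurements of any real hardware.

```python
def rank_by_performance_per_dollar(accelerators):
    """Sort accelerators by throughput obtained per unit of hourly rental cost."""
    return sorted(
        accelerators,
        key=lambda a: a["samples_per_second"] / a["price_per_hour"],
        reverse=True,
    )


# Placeholder figures for illustration only; substitute published benchmark
# results and your provider's actual prices.
candidates = [
    {"name": "accelerator-a", "samples_per_second": 1200.0, "price_per_hour": 3.0},
    {"name": "accelerator-b", "samples_per_second": 2000.0, "price_per_hour": 8.0},
    {"name": "accelerator-c", "samples_per_second": 900.0, "price_per_hour": 2.0},
]

for acc in rank_by_performance_per_dollar(candidates):
    ratio = acc["samples_per_second"] / acc["price_per_hour"]
    print(f'{acc["name"]}: {ratio:.0f} samples/s per unit of hourly cost')
```

Note that the fastest option in this made-up data ranks last: raw throughput and cost efficiency are different questions, and a benchmark only helps if it resembles your own workload.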
Understanding the potential costs of renting AI infrastructure helps you to plan your budget correctly. Research the prices of cloud providers and calculate how much a specific server with a particular configuration will cost you per minute, hour, day, and so on. For more accurate calculations, you need to know the approximate time you’ll need to spend on training. This requires some mathematical effort, especially if you’re developing a GenAI model from scratch. To estimate the training time, you can count the number of operations needed or look at the GPU time.
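As a rough sketch of the operation-counting approach, the function below uses a commonly cited approximation for transformer models: training compute is about 6 × parameters × training tokens. Every input value is a hypothetical placeholder, and the `utilization` factor is an assumption reflecting that hardware rarely sustains its peak throughput.

```python
def estimate_training(params, tokens, peak_flops, utilization, devices, price_per_device_hour):
    """Estimate wall-clock hours and rental cost for one training run.

    Rule of thumb for transformer models:
    training FLOPs ~= 6 * parameters * training tokens.
    """
    total_flops = 6 * params * tokens
    sustained_flops = peak_flops * utilization * devices  # realistic, not peak, throughput
    hours = total_flops / sustained_flops / 3600
    cost = hours * devices * price_per_device_hour
    return hours, cost


# All inputs below are hypothetical placeholders.
hours, cost = estimate_training(
    params=1e9,                 # 1 billion model parameters
    tokens=20e9,                # 20 billion training tokens
    peak_flops=300e12,          # 300 TFLOPS peak per device
    utilization=0.4,            # fraction of peak actually sustained
    devices=8,
    price_per_device_hour=3.0,  # in your provider's currency
)
print(f"~{hours:.1f} hours, ~{cost:.0f} currency units")
```

Treat the result as an order-of-magnitude estimate. Fine-tuning a prebuilt model usually involves far fewer tokens than training from scratch, so the same formula yields a much smaller figure.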
Gcore’s GenAI projects offer powerful examples of the fine-tuning approach to AI training, using IPU infrastructure.
Our speech-to-text project is an example of fine-tuning an existing GenAI model, Whisper, when it doesn’t support a specific language. The base version of Whisper didn’t support Luxembourgish, so our developers had to train the model further so that it could learn this language. A GenAI tool for any local or rare language not supported by existing models could be created in the same way.
AI Image Generator is a generative AI tool that is free for all users registered on the Gcore Platform. It takes your text prompts and creates images of different styles. To develop the Image Generator, we used the prebuilt Openjourney GenAI model. We fine-tuned it using datasets for specific areas, such as gaming, to extend its capabilities and generate a wider range of images. Like our speech-to-text service, the Image Generator is powered by Gcore’s AI IPU infrastructure.
The AI Image Generator is an example of how GenAI models like Openjourney can be customized to generate data with the style and context you need. The main problem with a pretrained model is that it is typically trained on large datasets and may lack accuracy when you need more specific results, like a highly specific stylization. If the prebuilt model doesn’t produce content that matches your expectations, you can collect a more relevant dataset and train your model to get more accurate results, which is what we did at Gcore. This approach can save significant time and resources, as it doesn’t require training the model from scratch.
Here’s what’s in the works for Gcore AI:
- Custom AI model tuning will help to develop AI models for different purposes and projects. A customer can provide their dataset to train a model for their specific goal. For example, you’ll be able to generate graphics and illustrations according to the company’s guidelines, which can reduce the burden on designers.
- AI models marketplace will provide ready-made AI models and frameworks in Gcore Cloud, similar to how our Cloud App Marketplace provides prebuilt cloud applications. Customers will be able to deploy these AI models on Virtual Instances or Bare Metal servers with GPU and IPU modules and either use these models as they are or fine-tune them for specific use cases.
IPUs and GPUs are fundamental to parallel processing, neural network training, and inference. This makes such infrastructure essential for generative AI development. However, GenAI developers need to have a clear understanding of their training goals. This will allow them to utilize the AI infrastructure properly, achieving maximum efficiency and best use of resources.