Bare metal GPUs vs. Virtual GPUs

3 underestimated security risks of AI workloads and how to overcome them

Artificial intelligence workloads introduce a fundamentally different security landscape for engineering and security teams. Unlike traditional applications, AI systems must protect not just endpoints and networks, but also training data pipelines, feature stores, model repositories, and inference APIs. Each phase of the AI life cycle presents distinct attack vectors that adversaries can exploit to corrupt model behavior, extract proprietary logic, or manipulate downstream outputs.In this article, we uncover three security vulnerabilities of AI workloads and explain how developers and MLOps teams can overcome them. We also look at how investing in your AI security can save time and money, explore the challenges that lie ahead for AI security, and offer a simplified way to protect your AI workloads with Gcore.Risk #1: data poisoningData poisoning is a targeted attack on the integrity of AI systems, where malicious actors subtly inject corrupted or manipulated data into training pipelines. The result is a model that behaves unpredictably, generates biased or false outputs, or embeds hidden logic that can be triggered post-deployment. This can undermine business-critical applications—from fraud detection and medical diagnostics to content moderation and autonomous decision-making.For developers, the stakes are high: poisoned models are hard to detect once deployed, and even small perturbations in training data can have system-wide consequences. Luckily, you can take a few steps to mitigate against data poisoning and then implement zero-trust AI to further protect your workloads.Mitigation and hardeningRestrict dataset access using IAM, RBAC, or identity-aware proxies.Store all datasets in versioned, signed, and hashed formats.Validate datasets with automated schema checks, label distribution scans, and statistical outlier detection before training.Track data provenance with metadata logs and checksums.Block training runs if datasets fail predefined data quality gates.Integrate data validation scripts into CI/CD pipelines pre-training.Enforce zero-trust access policies for data ingestion services.Solution integration: zero-trust AIImplement continuous authentication and authorization for each component interacting with data (e.g., preprocessing scripts, training jobs).Enable real-time threat detection during training using runtime security tools.Automate incident response triggers for unexpected file access or data source changes.Risk #2: adversarial attacksAdversarial attacks manipulate model inputs in subtle ways that trick AI systems into making incorrect or dangerous decisions. These perturbations—often imperceptible to humans—can cause models to misclassify images, misinterpret speech, or misread sensor data. In high-stakes environments like facial recognition, autonomous vehicles, or fraud detection, these failures can result in security breaches, legal liabilities, or physical harm.For developers, the threat is real: even state-of-the-art models can be easily fooled without adversarial hardening. The good news? You can make your models more robust by combining defensive training techniques, input sanitization, and secure API practices. While encrypted inference doesn’t directly block adversarial manipulation, it ensures that sensitive inference data stays protected even if attackers attempt to probe the system.Mitigation and hardeningUse adversarial training frameworks like CleverHans or IBM ART to expose models to perturbed inputs during training.Apply input sanitization layers (e.g., JPEG re-encoding, blurring, or noise filters) before data reaches the model.Implement rate limiting and authentication on inference APIs to block automated adversarial probing.Use model ensembles or randomized smoothing to improve resilience to small input perturbations.Log and analyze input-output patterns to detect high-variance or abnormal responses.Test models regularly against known attack vectors using robustness evaluation tools.Solution integration: encrypted inferenceWhile encryption doesn't prevent adversarial inputs, it does mean that input data and model responses remain confidential and protected from observation or tampering during inference.Run inference in trusted environments like Intel SGX or AWS Nitro Enclaves to protect model and data integrity.Use homomorphic encryption or SMPC to process encrypted data without exposing sensitive input.Ensure that all intermediate and output data is encrypted at rest and in transit.Deploy access policies that restrict inference to verified users and approved applications.Risk #3: model leakage of intellectual assetsModel leakage—or model extraction—happens when an attacker interacts with a deployed model in ways that allow them to reverse-engineer its structure, logic, or parameters. Once leaked, a model can be cloned, monetized, or used to bypass the very defenses it was meant to enforce. For businesses, this means losing competitive IP, compromising user privacy, or enabling downstream attacks.For developers and MLOps teams, the challenge is securing deployed models in a way that balances performance and privacy. If you're exposing inference APIs, you’re exposing potential entry points—but with the right controls and architecture, you can drastically reduce the risk of model theft.Mitigation and hardeningEnforce rate limits and usage quotas on all inference endpoints.Monitor for suspicious or repeated queries that indicate model extraction attempts.Implement model watermarking or fingerprinting to trace unauthorized model use.Obfuscate models before deployment using quantization, pruning, or graph rewriting.Disable or tightly control any model export functionality in your platform.Sign and verify inference requests and responses to ensure authenticity.Integrate security checks into CI/CD pipelines to detect risky configurations—such as public model endpoints, export-enabled containers, or missing inference authentication—before they reach production.Solution integration: native security integrationIntegrate model validation, packaging, and signing into CI/CD pipelines.Serve models from encrypted containers or TEEs, with minimal runtime exposure.Use container and image scanning tools to catch misconfigurations before deployment.Centralize monitoring and protection with tools like Gcore WAAP for real-time anomaly detection and automated response.How investing in AI security can save your business moneyFrom a financial point of view, the use of AI and machine learning in cybersecurity can lead to massive cost savings. Organizations that utilize AI and automation in cybersecurity have saved an average of $2.22 million per data breach compared to organizations that do not have these protections in place. This is because the necessity for manual oversight is reduced, lowering the total cost of ownership, and averting costly security breaches. The initial investment in advanced security technologies yields returns through decreased downtime, fewer false positives, and an enhanced overall security posture.Challenges aheadWhile securing the AI lifecycle is essential, it’s still difficult to balance robust security with a positive user experience. Rigid scrutiny can add additional latency or false positives that can stop operations, but AI-powered security can avoid such incidents.Another concern organizations must contend with is how to maintain current AI models. With threats changing so rapidly, today's newest model could easily become outdated by tomorrow’s. Solutions must have an ongoing learning ability so that security detection parameters can be revised.Operational maturity is also a concern, especially for companies that operate in multiple geographies. Well-thought-out strategies and sound governance processes must accompany the integration of complex AI/ML tools with existing infrastructure, but automation still offers the most benefits by reducing the overhead on security teams and helping ensure consistent deployment of security policies.Get ahead of AI security with GcoreAI workloads introduce new and often overlooked security risks that can compromise data integrity, model behavior, and intellectual property. By implementing practices like zero-trust architecture, encrypted inference, and native security integration, developers can build more resilient and trustworthy AI systems. As threats evolve, staying ahead means embedding security at every phase of the AI lifecycle.Gcore helps teams apply these principles at scale, offering native support for zero-trust AI, encrypted inference, and intelligent API protection. As an experienced AI and security solutions provider, our DDoS Protection and AI-enabled WAAP solutions integrate natively with Everywhere Inference and GPU Cloud across 210+ global points of presence. That means low latency, high performance, and proven, robust security, no matter where your customers are located.Talk with our AI security experts and secure your workloads today

What are virtual machines?

A virtual machine (VM), also called a virtual instance, is a software-based version of a physical computer. Instead of running directly on hardware, a VM operates inside a program that emulates a complete computer system, including a processor, memory, storage, and network connections. This allows multiple VMs to run on a single physical machine, each with its own operating system and applications, as if they were independent computers.VMS are useful because they provide flexibility, isolation, and scalability. Since each VM is self-contained, it can run different operating systems (like Windows, Linux, or macOS) on the same hardware without affecting other VMs or the host machine. This makes them ideal for testing software, running legacy applications, or efficiently using server resources in data centers. Because VMs exist as software, they can be easily copied, moved, or backed up, making them a powerful tool for both individuals and businesses.Read on to learn about types of VMs, their benefits, common use cases, and how to choose the right VM provider for your needs.How do VMs work?A virtual machine (VM) runs inside a program called a hypervisor, which acts as an intermediary between the VM and the actual computer hardware. Every time a VM needs to perform an action—such as running software, accessing storage, or using the processor—the hypervisor intercepts these requests and decides how to allocate resources like CPU power, memory, and disk space. You can think of a hypervisor as an operating system for VMs, managing multiple virtual machines on a single physical computer. Popular hypervisors like VirtualBox and VMware enable users to run multiple operating systems simultaneously while providing strong isolation.Modern hypervisors optimize performance by giving VMs direct access to certain hardware components when possible, reducing the need for constant intervention. However, some level of overhead remains because the hypervisor still needs to manage and coordinate resources efficiently. This means that while VMs can leverage most of the system’s hardware, they can’t use 100% of it, as some processing power is always reserved for managing virtualization itself. This small trade-off is often worth it, as hypervisors keep each VM isolated and secure, preventing one VM from interfering with another.VM layersFigure 1 illustrates the layers of a system virtual machine setup. The layer model can vary depending on the hypervisor. Some hypervisors include a built-in host operating system, while modern hardware offers native virtualization support. Many hypervisors can also manage multiple physical machines and VMs efficiently.VM snapshots are an essential feature in cloud computing, allowing users to quickly restore a virtual machine to a previous state.Figure 1: Layers of system virtual machinesHypervisors that emulate hardware architectures different from what the guest OS expects have a bigger overhead, as they can’t relay commands directly to the hardware without first translating them.VM snapshotsVM snapshots are an essential feature in cloud computing, allowing users to quickly restore a virtual machine to a previous state. The hypervisor can save the complete state of the VM and restore it at a later time to skip the boot process of the guest OS. The hypervisor can also move these snapshots between different physical machines, making the software running in the VM completely independent from the underlying hardware.What are the benefits of using VMs?Virtual machines offer benefits including resource efficiency, isolation, simplified operations, easy migration, faster deployment, cost savings, and security. Let’s look at these one by one.Multiple VMs can run on a single physical machine, making sharing resources between various guest operating systems easier. This is especially important when each guest OS needs to be isolated from the others, such as when they belong to different customers of a cloud service provider. Sharing resources through VMs makes running a server cheaper because you don’t have to buy or rent a whole physical machine, but only parts of it.Since VMs abstract the underlying hardware, they also improve resilience. If the physical machine fails, the hypervisor can perform a quick recovery by moving the snapshots to another machine without changing the guest OS installations to minimize downtime. This abstraction also allows operations teams to focus their deployment efforts on a standardized VM instead of considering different physical implementations.Migrations become easier with snapshots as you can simply move them to a faster machine without modifying the software running inside the VM.Faster deployments are possible because starting a VM is just a software execution instead of setting up a physical server in a data center. While you had to buy a server or rent it for months, with fast deployments, you can now rent a machine for hours, minutes, or even seconds, which allows for quite some savings.Modern CPUs have built-in virtualization features that enable easy resource sharing and enforce the isolation at the hardware layer. This prevents the services of one VM from accessing the resources of the others, improving security compared to running multiple apps inside one OS.Common use cases for VMsVMs have a range of use cases. Let’s look at the most popular ones.Cloud computingThe most popular use case is cloud computing, where VMs allow the secure sharing of the cloud provider’s resources, enabling their customers to rent only the resources they need for the period their workload will run.Software development and testingSoftware development often requires specific tools and libraries that aren’t available on a production machine, so having a development VM with all these tools preinstalled can be helpful. An example is cloud IDEs, which look and feel like regular IDEs but run on a cloud VM. A developer can have one for each project with the required dev tools installed.VMs also allow a developer to set up a machine for software testing that looks exactly like the production environment. Here, the opposite of the development VM is required; it should not have any development tools installed because they would also be missing from production.Cross-platform developmentA special case of the software development use case is cross-platform development. When you implement an app for Android or iOS, for example, you usually don’t do this on a mobile device but on your computer. With VMs, developers can simulate different hardware environments, enabling cross-platform testing without requiring physical devices.Legacy system supportIf the hardware your application requires is no longer in production, a VM might be the only way to keep running your software without reimplementing it. This is similar to the cross-platform development use case, as the VM emulates different hardware, but the difference is that the hardware no longer exists.How to choose the right VM providerTo find the right provider for your workload, the most important factor to assess is your own workload requirements. Ask the following questions and compare the answers to what providers offer.Is your workload compute or I/O-bound?Many workloads, like web servers, are I/O-bound. They don’t make complex calculations but rather simply load data and send it over the network. If you need a VM for an I/O-bound workload, you care more about disk and memory size, as well as network speed.However, compute-heavy workloads, such as AI inference or Kubernetes clusters, require careful resource allocation. If you’re evaluating whether to run Kubernetes on bare metal or VMs, check out our white paper on Bare Metal vs. VM-based Kubernetes Clusters for an in-depth comparison.If your workload is compute-bound instead, you need a high-performance CPU or a GPU and loads of memory. An AI inference engine, for example, only sends a bit of text to a client, but it does many calculations to generate this text.How long will your workload run?Web servers usually run indefinitely, but some workloads only run a few hours or minutes. If you’re doing AI training, you don’t want to pay for your huge VM cluster 24/7 if it only runs a few hours or days a week. In such cases, looking for a provider that allows renting your desired VM type hourly on a pay-as-you-go model might be worthwhile.Certain cloud providers offer cost-effective spot instances, which provide lower prices for non-critical workloads that can tolerate interruptions. These cheap VMs can get shut down at any time with minimal notice, but if your calculations aren’t time-critical, you might save quite a bit of money here.How does your workload scale?Scaling in the cloud is usually done horizontally. That is, by adding more VMs and distributing the work between them. Workloads can have different requirements for when and how fast they must be added and removed.In the AI training example, you might know in advance that one training takes more resources than the other, so you can provision enough VMs when starting. However, a web server workload might change its requirements constantly. Hence, you need a load balancer that automatically scales the instances up and down depending on the number of clients that want to access your service.Do you handle sensitive data?You might have to comply with specific laws and regulations depending on your jurisdiction(s) and industry. This means you must check whether the cloud provider also complies. How secure are their data centers? Where are they located? Do they support encryption in transit, at rest, and in process?What are your reliability requirements?Reliability is a question of costs and, again, of compliance. You might get into financial or regulatory troubles if your workload can’t run. Cloud providers often boast about their guaranteed uptimes, but remember that 99% uptime a year still means over three days of potential downtime. Check your needs and then seek a provider that can meet them reliably.Do you need customer support?If your organization doesn’t have the know-how for operating VMs in the cloud, you might need technical support from the provider. Most cloud providers are self-service, offering you a GUI and an API to manage resources. If your business lacks the resources to operate VMs, seek out a provider that can manage VMs on your behalf.SummaryVMs are a core technology for cloud computing and software development alike. They enable efficient resource sharing, improve security with hardware-enforced guest isolation, and simplify migration and disaster recovery. Choosing the right VM provider starts with understanding your workload requirements, from resource allocation to security and scalability.Maximize cloud efficiency with Gcore Virtual Machines—engineered for high performance, seamless scalability, and enterprise-grade security at competitive pricing. Whether you need to run workloads at scale or deploy applications in seconds, our VMs provide enterprise-grade security, built-in resilience, and optimized resource allocation, all powered by cutting-edge infrastructure. With global reach, fast provisioning, egress traffic included, and pay-as-you-go pricing, you get the scalability and reliability your business needs without overspending. Start your journey with Gcore VMs today and experience cloud computing that’s built for speed, security, and savings.Discover Gcore VMs

	Bare metal GPU	Virtual GPU
	Bare metal GPU	Passthrough GPU	vGPU
Benefits	Dedicated GPU resources High performance for demanding AI workloads	Lower cost Simple scalability Suitable for occasional or variable workloads	Lowest cost Simple scalability Suitable for occasional or variable workloads
Drawbacks	High cost compared to virtual GPUs Less flexible and scalable than virtual GPUs	Low performance Not suitable for demanding AI workloads	Lowest performance Not suitable for demanding AI workloads

How to Choose Between Bare Metal GPUs and Virtual GPUs for AI Workloads

The Difference Between Bare Metal and Virtual GPUs

Should You Use Bare Metal or Virtual GPUs?

Choose Gcore for Best-in-Class AI GPUs

Related articles

3 underestimated security risks of AI workloads and how to overcome them

Securing AI from the ground up: defense across the lifecycle

How AI is reshaping the future of interactive streaming

What are virtual machines?

How to deploy DeepSeek 70B with Ollama and a Web UI on Gcore Everywhere Inference

What is AI inference and how does it work?

Subscribe to our newsletter