Shall I Use ADD or COPY in the Dockerfile—What’s the Difference?

By Gcore

Sooner or later, every developer and every team gets confused about COPY and ADD in the Dockerfile. When I get this question, I usually start with the technical background:

Both ADD and COPY copy files and directories from the host machine into a Docker image. The difference is that ADD can also extract local tar archives into the image and download files from URLs. The best practice is to use COPY.

So COPY equals ADD minus the unpacking and URL-fetching features. COPY is the preferred instruction unless you are unpacking a local tar archive into a Docker image and you are certain that the archive has the right format.

A little background explains why. Read on…

Why COPY is preferred

The core purpose of ADD and COPY is to let Dockerfile developers copy files and directories from the host machine into the Docker image during image build.

Extracting archives and downloading files from the internet are common use cases, so these features are built into ADD.

The extraction feature is described in the official documentation as follows:

If <src> is a local tar archive in a recognized compression format (identity, gzip, bzip2 or xz) then it is unpacked as a directory.

The following note on the same page further explains the behavior:

Note: Whether a file is identified as a recognized compression format is done solely based on the contents of the file, not the name of the file. For example, if an empty file happens to end with .tar.gz this will not be recognized as a compressed file, and will not generate any kind of decompression error message, rather the file will simply be copied to the destination.

This means that your final outcome depends on the contents of the file you intend to copy, and you don’t get warnings if something goes wrong. This may make your build pipeline unpredictable.
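To make the "contents, not name" point concrete, here is a minimal shell sketch. It is only an illustration and not Docker's actual detection code: `is_gzip` is a hypothetical helper that checks for the two gzip magic bytes and ignores the other formats (bzip2, xz, plain tar) that ADD also recognizes.

```shell
# Illustrative helper (not Docker's code): report whether a file starts with
# the gzip magic bytes 0x1f 0x8b, regardless of what the file is called.
is_gzip() {
  magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
  [ "$magic" = "1f8b" ]
}

# An empty file with an archive-like name: the name matches, the contents do not.
demo_dir=$(mktemp -d)
: > "$demo_dir/empty.tar.gz"
if is_gzip "$demo_dir/empty.tar.gz"; then
  echo "empty.tar.gz: gzip data"
else
  echo "empty.tar.gz: not gzip data, despite the name"
fi
```

A file like this is exactly the case the note describes: ADD would copy it as is, with no error.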

To make life more reliable, we have the COPY instruction, which is “the same as ADD, but without the tar and URL handling”. COPY does one thing and it does it well.

The best practice

Docker best practices recommend using COPY whenever you don’t need ADD’s extraction functionality, because COPY is more transparent.

In real-life projects COPY is sufficient in most scenarios, mainly because we rarely add tarballs to our applications’ source code. The main use case for tarballs, and thus for ADD, is creating a base image from a tar archive. This doesn’t happen very often, but when it does, ADD is preferred.

For all other use cases:

We prefer COPY for copying files from the host machine into a Docker image.
We use RUN with curl or wget to fetch files from URLs. ADD does not unpack files from the web anyway, so we are better off avoiding it for downloads.
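The split described above could be sketched as follows. The snippet writes two small Dockerfiles with shell heredocs. The file names `Dockerfile.app` and `Dockerfile.base`, the `app/` directory, and `rootfs.tar.xz` are placeholders chosen for illustration, not names from a real project.

```shell
# Typical application image: COPY for local files.
# (app/ and /opt/app/ are placeholder paths.)
cat > Dockerfile.app <<'EOF'
FROM alpine:3.10
COPY app/ /opt/app/
EOF

# Base image built from a root filesystem tarball: the case where ADD fits,
# because it unpacks the local archive into the image.
# (rootfs.tar.xz is a placeholder file name.)
cat > Dockerfile.base <<'EOF'
FROM scratch
ADD rootfs.tar.xz /
EOF
```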

Let’s see how you can accomplish unpacking and URL fetching.

Unpacking local archives

ADD unpacks archives from the host machine, but it does not unpack files fetched from URLs. To unpack an archive, use ADD in its default form: ADD <src>... <dest>. Check out this sample Dockerfile:

FROM alpine:3.10
ADD bigfile.tar.xz /tmp/

When you build the image Docker will unpack the archive.

docker build -t yourname/alpine-bigfile .
Sending build context to Docker daemon  4.096kB
Step 1/2 : FROM alpine:3.10
 ---> 4d90542f0623
Step 2/2 : ADD bigfile.tar.xz /tmp/
 ---> 32cfa3eb41f7
Successfully built 32cfa3eb41f7
Successfully tagged yourname/alpine-bigfile:latest

Because the ADD syntax is exactly the same whether you copy a file or unpack an archive, this can get tricky. As mentioned earlier, if Docker does not recognize the archive format during the build, it copies the archive as is into the Docker image without warning. You can mitigate the risk by adding a check to your build pipeline.
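One possible shape for such a check is sketched below. `check_archive` is a hypothetical helper, not part of Docker. It assumes a tar implementation that detects compression automatically when listing an archive (GNU tar and bsdtar do), and it only shows that tar can read the file, which is not guaranteed to match Docker's own detection in every case.

```shell
#!/bin/sh
# Hypothetical pre-build check: succeed only if the file is non-empty and
# tar can list its contents.
check_archive() {
  archive="$1"
  if [ ! -s "$archive" ]; then
    echo "error: $archive is missing or empty" >&2
    return 1
  fi
  if ! tar -tf "$archive" >/dev/null 2>&1; then
    echo "error: $archive is not a readable tar archive" >&2
    return 1
  fi
  echo "ok: $archive"
}
```

A pipeline step could then run `check_archive bigfile.tar.xz && docker build -t yourname/alpine-bigfile .` so that the build only starts when the archive is readable.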

Our archive in the example was recognized by Docker, so the file is unpacked in our image:

docker run --rm -ti yourname/alpine-bigfile /bin/ash
/ # ls -al /tmp
total 12
drwxrwxrwt    1 root     root          4096 Jan 31 09:49 .
drwxr-xr-x    1 root     root          4096 Jan 31 09:50 ..
-rw-r--r--    1 501      dialout         29 Jan 31 09:46 bigfile

If you need a way to share your image as an archive, check out our article “How to Transfer/Move a Docker Image to Another System?”

Downloading and unpacking archives from a URL

For downloading and unpacking archives from the internet, curl or wget are the better options, because you get the result you want in a single image layer. With ADD you’d fetch the archive in one layer and then unpack it with RUN in another, which is less efficient.

You can write a Dockerfile that curls an archive and unpacks it as shown below.

FROM alpine:3.10
RUN apk add --no-cache curl && \
  curl -SL https://github.com/yikaus/docker-alpine-base/raw/master/rootfs.tar.xz | tar -xJC /tmp

This takes one image layer and you have full control over the process.

One more thing

One more noteworthy difference between ADD and COPY is that COPY has the --from=<name|index> flag that lets you copy files from a previous build stage in a multi-stage build. ADD does not have this option.
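A minimal multi-stage sketch of this flag is shown below, again written to disk with a shell heredoc. The stage name `builder`, the file name `Dockerfile.multistage`, and the paths are placeholders for illustration.

```shell
# Two-stage Dockerfile: the first stage produces a file, the second stage
# copies it over with COPY --from. All names and paths are placeholders.
cat > Dockerfile.multistage <<'EOF'
FROM alpine:3.10 AS builder
RUN echo "built in stage one" > /artifact.txt

FROM alpine:3.10
COPY --from=builder /artifact.txt /opt/artifact.txt
EOF
```

You could build it with `docker build -f Dockerfile.multistage -t yourname/alpine-multistage .`. Only the second stage ends up in the final image.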

This is another reason to use COPY as your preferred option.

