
Reverse Engineer Docker Images into Dockerfiles

  • By Gcore
  • April 8, 2023
  • 11 min read

This article explores how we can reverse engineer Docker images by examining the internals of how Docker images store data, how to use tools to examine the different aspects of the image, and how we can create tools like Dedockify to leverage the Python Docker API to create Dockerfiles from source images.

Introduction

As public Docker registries like Docker Hub and TreeScale grow in popularity, it has become common, outside the most restrictive environments, for admins and developers to casually download images built by unknown parties. Often the convenience simply outweighs the perceived risk. When a Docker image is published, the Dockerfile is sometimes provided as well: directly in the listing, in a Git repository, or through an associated link. Sometimes it is not. And even when the Dockerfile is available, we have few assurances that the published image was actually built from it or is safe to use.

Maybe security vulnerabilities aren’t your concern. Perhaps one of your favorite images is no longer maintained, and you would like to update it to run on the latest version of Ubuntu. Or perhaps a compiler on another distribution has an exclusive feature that produces better-optimized binaries, and you have an uncontrollable compulsion to release a similar image that’s just a little more optimized.

Whatever the reason, if you wish to recover a Dockerfile from an image, there are options. Docker images aren’t a black box. Often, you can retrieve most of the information you need to reconstruct a Dockerfile. In this article, we will explore exactly how to do that by looking inside a Docker image so that we can very closely reconstruct the Dockerfile that built it.

We will reconstruct a Dockerfile from an image using two tools: Dedockify, a custom Python script written for this article, and dive. The basic process flow is as follows.

Using dive

To build some quick, low-effort intuition about how images are composed, we will use dive to introduce a few advanced and possibly unfamiliar Docker concepts. Dive is an image exploration tool that lets you examine each layer of a Docker image.

First, let us create a simple, easy-to-follow Dockerfile to explore for testing purposes.

In an empty directory, enter the following snippet directly into the command line:

cat > Dockerfile << EOF ; touch testfile1 testfile2 testfile3
FROM scratch
COPY testfile1 /
COPY testfile2 /
COPY testfile3 /
EOF

Entering the above and pressing Enter creates a new Dockerfile, along with three zero-byte test files in the same directory.

$ ls
Dockerfile  testfile1  testfile2  testfile3

Now, let’s build an image from this Dockerfile and tag it as example1.

docker build . -t example1

Building the example1 image should produce the following output:

Sending build context to Docker daemon  3.584kB
Step 1/4 : FROM scratch
 --->
Step 2/4 : COPY testfile1 /
 ---> a9cc49948e40
Step 3/4 : COPY testfile2 /
 ---> 84acff3a5554
Step 4/4 : COPY testfile3 /
 ---> 374e0127c1bc
Successfully built 374e0127c1bc
Successfully tagged example1:latest

The following zero-byte example1 image should now be available:

$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
example1            latest              374e0127c1bc        31 seconds ago      0B

Note that since there’s no binary data, this image won’t be functional. We are only using it as a simplified example of how layers can be viewed in Docker images.

The size of the image shows that there is no source image. Instead of a source image, we used scratch, which tells Docker to start from a blank, zero-byte image. We then modified that blank image by copying three zero-byte test files onto it and tagged the result as example1.

Now, let us explore our new image with Dive.

docker run --rm -it \
    -v /var/run/docker.sock:/var/run/docker.sock \
    wagoodman/dive:latest example1

Executing the above command should automatically pull wagoodman/dive from Docker Hub and then open dive’s polished interface.

Unable to find image 'wagoodman/dive:latest' locally
latest: Pulling from wagoodman/dive
89d9c30c1d48: Pull complete
5ac8ae86f99b: Pull complete
f10575f61141: Pull complete
Digest: sha256:2d3be9e9362ecdcb04bf3afdd402a785b877e3bcca3d2fc6e10a83d99ce0955f
Status: Downloaded newer image for wagoodman/dive:latest
Image Source: docker://example-image
Fetching image... (this can take a while for large images)
Analyzing image...
Building cache...

Scroll through the three layers of the image in the list to find the three files in the tree displayed on the right.

We can see the contents on the right change as we scroll through each layer. As each file was copied to a blank Docker scratch image, it was recorded as a new layer.

Notice also that we can see the command that produced each layer. We can also see the hash value of the source file and the file that the layer added.

If we take note of the items in the Command: section, we should see the following:

#(nop) COPY file:e3c862873fa89cbf2870e2afb7f411d5367d37a4aea01f2620f7314d3370edcc in /
#(nop) COPY file:2a949ad55eee33f6191c82c4554fe83e069d84e9d9d8802f5584c34e79e5622c in /
#(nop) COPY file:aa717ff85b39d3ed034eed42bc1186230cfca081010d9dde956468decdf8bf20 in /

Each command provides solid insight into the original command used in the Dockerfile to produce the image. However, the original filename is lost. It appears that the only way to recover this information is to make observations about the changes to the target filesystem, or perhaps to infer based on other details. More on this later.
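To make the structure of these entries concrete, the sketch below splits one of them into its directive, source kind, hash, and destination. It assumes the classic builder’s "#(nop) COPY file:<hash> in <dest>" format shown above. The "dir:" variant, the regular expression, and the names HashedSource and parse_hashed_source are our own illustrative assumptions. They are not part of Docker or Dedockify. A hash cannot be reversed into a filename, so the name still has to be recovered some other way.

```python
import re
from typing import NamedTuple, Optional

# Classic-builder history entry: "#(nop) COPY file:<64 hex chars> in <dest>".
# The "dir:" alternative is an assumption based on typical docker history output.
HASHED_SOURCE = re.compile(
    r"#\(nop\)\s+(?P<directive>COPY|ADD)\s+"
    r"(?P<kind>file|dir):(?P<digest>[0-9a-f]{64})\s+in\s+(?P<dest>\S+)"
)


class HashedSource(NamedTuple):
    directive: str  # "COPY" or "ADD"
    kind: str       # "file" or "dir"
    digest: str     # hash recorded by the builder; the filename is not recoverable from it
    dest: str       # destination path inside the image


def parse_hashed_source(created_by: str) -> Optional[HashedSource]:
    """Return the parts of a hashed COPY/ADD entry, or None if it isn't one."""
    match = HASHED_SOURCE.search(created_by)
    if match is None:
        return None
    return HashedSource(**match.groupdict())


entry = ("/bin/sh -c #(nop) COPY file:e3c862873fa89cbf2870e2afb7f411d5"
         "367d37a4aea01f2620f7314d3370edcc in / ")
print(parse_hashed_source(entry))
```

Entries that aren’t hashed COPY/ADD steps, such as RUN commands or CMD metadata, simply return None. Callers can therefore pass every CREATED BY value through the function.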

docker history

Aside from third-party tools like dive, the tool we have immediately available is docker history. If we use the docker history command on our example1 image, we can view the entries we used in the Dockerfile to create that image.

docker history example1

We should, therefore, get the following result:

IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
374e0127c1bc        25 minutes ago      /bin/sh -c #(nop) COPY file:aa717ff85b39d3ed…   0B
84acff3a5554        25 minutes ago      /bin/sh -c #(nop) COPY file:2a949ad55eee33f6…   0B
a9cc49948e40        25 minutes ago      /bin/sh -c #(nop) COPY file:e3c862873fa89cbf…   0B

Notice that everything in the CREATED BY column is truncated. These entries record the Dockerfile directive behind each layer, formatted as a /bin/sh -c command. The #(nop) marker means the directive was recorded as a no-op rather than actually run in a shell. This information could be useful for recreating our Dockerfile. Although it is truncated here, we can view all of it by adding the --no-trunc option:

$ docker history example1 --no-trunc
IMAGE                                                                     CREATED             CREATED BY                                                                                           SIZE                COMMENT
sha256:374e0127c1bc51bca9330c01a9956be163850162f3c9f3be0340bb142bc57d81   29 minutes ago      /bin/sh -c #(nop) COPY file:aa717ff85b39d3ed034eed42bc1186230cfca081010d9dde956468decdf8bf20 in /    0B
sha256:84acff3a5554aea9a3a98549286347dd466d46db6aa7c2e13bb77f0012490cef   29 minutes ago      /bin/sh -c #(nop) COPY file:2a949ad55eee33f6191c82c4554fe83e069d84e9d9d8802f5584c34e79e5622c in /    0B
sha256:a9cc49948e40d15166b06dab42ea0e388f9905dfdddee7092f9f291d481467fc   29 minutes ago      /bin/sh -c #(nop) COPY file:e3c862873fa89cbf2870e2afb7f411d5367d37a4aea01f2620f7314d3370edcc in /    0B

While this has some useful data, it could be a challenge to parse from the command line. We could also use docker inspect. However, in this article, we will focus on using the Docker Engine API for Python.
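If you do want to stay close to the command line, docker history accepts a --format option with a Go template, which makes its output much easier to split. The sketch below assumes the docker CLI is on your PATH. It requests the layer ID and the CREATED BY field, separated by a literal tab, and returns the pairs oldest first. The helper names are hypothetical. Also note that pulled images may report <missing> as the ID of intermediate layers.

```python
import subprocess
from typing import List, Tuple

# Go template for `docker history --format`; the "\t" here is a real tab
# character, which the template copies verbatim into the output.
HISTORY_FORMAT = "{{.ID}}\t{{.CreatedBy}}"


def parse_history_output(text: str) -> List[Tuple[str, str]]:
    """Turn `docker history --no-trunc --format ...` output into
    (layer_id, created_by) pairs, oldest layer first."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        layer_id, _, created_by = line.partition("\t")  # split on the first tab only
        rows.append((layer_id, created_by))
    rows.reverse()  # docker history lists the newest layer first
    return rows


def docker_history(image: str) -> List[Tuple[str, str]]:
    """Run the docker CLI (assumed to be installed) and parse its output."""
    result = subprocess.run(
        ["docker", "history", "--no-trunc", "--format", HISTORY_FORMAT, image],
        capture_output=True, text=True, check=True,
    )
    return parse_history_output(result.stdout)
```

For example, docker_history("example1") would yield the three COPY entries shown above in the order they appear in the Dockerfile.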

Using Docker Engine API for Python

Docker provides a Python library for the Docker Engine API, the Docker SDK for Python, which allows full control of Docker from within Python. In the following example, we recover the same information we obtained with docker history by running this Python 3 code:

#!/usr/bin/python3
import docker

# Low-level client that talks to the local Docker daemon over its Unix socket
cli = docker.APIClient(base_url='unix://var/run/docker.sock')
print(cli.history('example1'))

This should result in output much like the following:

[{'Comment': '', 'Created': 1583008507, 'CreatedBy': '/bin/sh -c #(nop) COPY file:aa717ff85b39d3ed034eed42bc1186230cfca081010d9dde956468decdf8bf20 in / ', 'Id': 'sha256:374e0127c1bc51bca9330c01a9956be163850162f3c9f3be0340bb142bc57d81', 'Size': 0, 'Tags': ['example:latest']}, {'Comment': '', 'Created': 1583008507, 'CreatedBy': '/bin/sh -c #(nop) COPY file:2a949ad55eee33f6191c82c4554fe83e069d84e9d9d8802f5584c34e79e5622c in / ', 'Id': 'sha256:84acff3a5554aea9a3a98549286347dd466d46db6aa7c2e13bb77f0012490cef', 'Size': 0, 'Tags': None}, {'Comment': '', 'Created': 1583008507, 'CreatedBy': '/bin/sh -c #(nop) COPY file:e3c862873fa89cbf2870e2afb7f411d5367d37a4aea01f2620f7314d3370edcc in / ', 'Id': 'sha256:a9cc49948e40d15166b06dab42ea0e388f9905dfdddee7092f9f291d481467fc', 'Size': 0, 'Tags': None}]

Looking at the output, we can see that reconstructing much of the Dockerfile is just a matter of parsing the relevant data and reversing the order of the entries. But, as we saw earlier, some of the COPY directives contain hashed entries. These hashes stand in for source files that came from outside the image, and the original filenames aren’t directly recoverable from them.

However, just as we saw in dive, we can infer these names by searching for the changes made in each layer. Sometimes we can also infer a name when the original COPY directive used the target filename as its destination. In other cases, the filenames may not matter, and we can use arbitrary ones. In still other cases, though harder to assess, we can infer filenames that are referenced elsewhere in the image, such as in supporting scripts or configuration files. In any case, searching for all changes between layers is the most reliable approach.
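The destination-based inference described above can be sketched as follows. This is an illustrative helper, not part of Dedockify. It assumes the classic "#(nop) COPY file:<hash> in <dest>" format and treats the last path component of the destination as a candidate filename. It returns None when the destination is obviously a directory.

```python
import posixpath
import re
from typing import Optional

# Destination of a classic-builder "COPY/ADD file:<hash> in <dest>" entry.
DEST = re.compile(r"#\(nop\)\s+(?:COPY|ADD)\s+file:[0-9a-f]{64}\s+in\s+(?P<dest>\S+)")


def candidate_filename(created_by: str) -> Optional[str]:
    """Guess a source filename from a COPY destination.

    Returns None for non-matching entries and for destinations that are
    clearly directories ("/", ".", or a trailing slash). Anything else is
    only a guess: the destination may still be an existing directory.
    """
    match = DEST.search(created_by)
    if match is None:
        return None
    dest = match.group("dest")
    if dest in ("/", ".") or dest.endswith("/"):
        return None
    return posixpath.basename(dest)
```

Treat the result as a hint only. In the example2 image built later, COPY testfile1 /testdir1 is recorded with the destination /testdir1, which is a directory, so this helper would wrongly suggest testdir1. That is why checking each layer’s changes remains the more reliable method.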

Dedockify

Let’s take this a few steps further. To help reverse engineer this image into a Dockerfile, we need to parse everything and reformat it into a readable form. For the purposes of this article, the following Python 3 code has been made available in the Dedockify repository on GitHub. Thanks go to LanikSJ for all prior work.

from sys import argv
import docker


class ImageNotFound(Exception):
    pass


class MainObj:
    def __init__(self):
        super(MainObj, self).__init__()
        self.commands = []
        self.cli = docker.APIClient(base_url='unix://var/run/docker.sock')
        self._get_image(argv[-1])
        # History is returned newest layer first
        self.hist = self.cli.history(self.img['RepoTags'][0])
        self._parse_history()
        # Reverse into Dockerfile order (oldest instruction first)
        self.commands.reverse()
        self._print_commands()

    def _print_commands(self):
        for i in self.commands:
            print(i)

    def _get_image(self, img_hash):
        # Find the local image whose ID contains the hash given on the command line
        images = self.cli.images()
        for i in images:
            if img_hash in i['Id']:
                self.img = i
                return
        raise ImageNotFound("Image {} not found\n".format(img_hash))

    def _insert_step(self, step):
        # "#(nop)" marks directives that were not run in a shell (COPY, ADD, CMD, ...);
        # anything else came from a RUN instruction
        if "#(nop)" in step:
            to_add = step.split("#(nop) ")[1]
        else:
            to_add = ("RUN {}".format(step))
        # Put each && on its own continuation line for readability
        to_add = to_add.replace("&&", "\\\n    &&")
        self.commands.append(to_add.strip(' '))

    def _parse_history(self, rec=False):
        first_tag = False
        actual_tag = False
        for i in self.hist:
            if i['Tags']:
                actual_tag = i['Tags'][0]
                # Stop at the first tagged ancestor and treat it as the base image
                if first_tag and not rec:
                    break
                first_tag = True
            self._insert_step(i['CreatedBy'])
        if not rec:
            self.commands.append("FROM {}".format(actual_tag))


if __name__ == "__main__":
    MainObj()

Initial Dockerfile Generation

If you’ve made it this far, then you should have two images: wagoodman/dive and our custom example1 image.

$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
example1            latest              374e0127c1bc        42 minutes ago      0B
wagoodman/dive      latest              4d9ce0be7689        2 weeks ago         83.6MB

Running this code against our example1 image will finally produce the following:

$ python3 dedockify.py 374e0127c1bc
FROM example1:latest
COPY file:e3c862873fa89cbf2870e2afb7f411d5367d37a4aea01f2620f7314d3370edcc in /
COPY file:2a949ad55eee33f6191c82c4554fe83e069d84e9d9d8802f5584c34e79e5622c in /
COPY file:aa717ff85b39d3ed034eed42bc1186230cfca081010d9dde956468decdf8bf20 in /

We’ve extracted nearly the same information that we observed when we explored the image with dive earlier. Notice that the FROM directive shows example1:latest instead of scratch. No layer below the top one carries a tag, so the script falls back to the image’s own tag. Its assumption about the base image is therefore technically incorrect in this case.

As a comparison, let us do the same thing with our wagoodman/dive image.

$ python3 dedockify.py 4d9ce0be7689
FROM wagoodman/dive:latest
ADD file:fe1f09249227e2da2089afb4d07e16cbf832eeb804120074acd2b8192876cd28 in /
CMD ["/bin/sh"]
ARG DOCKER_CLI_VERSION=
RUN |1 DOCKER_CLI_VERSION=19.03.1 /bin/sh -c wget -O- https://download.docker.com/linux/static/stable/x86_64/docker-${DOCKER_CLI_VERSION}.tgz |     tar -xzf - docker/docker --strip-component=1 \
    &&     mv docker /usr/local/bin
COPY file:8385774b036879eb290175cc42a388877142f8abf1342382c4d0496b6a659034 in /usr/local/bin/
ENTRYPOINT ["/usr/local/bin/dive"]

This shows a lot more diversity than our example1 image. Notice the ADD directive right after the FROM directive. Our code is making the wrong assumption again: no ancestor layer carries a tag, so FROM shows the image’s own tag. We also don’t know what the ADD directive is adding, so we can’t be sure what the base image is. The ADD directive could have been used to extract a local tar file into the root directory. It’s possible that this is how another base image was loaded.
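One way to avoid presenting a guess as fact is to look for a tagged ancestor explicitly and say so when there isn’t one. The functions below are a hypothetical variation on Dedockify’s _parse_history logic. They operate on the list returned by APIClient.history(), newest entry first. They assume an ancestor’s tag only appears when that base image is present locally. A None result therefore means "unknown", which could be scratch, an ADD of a root filesystem tarball, or a base image that simply isn’t on this machine.

```python
from typing import Any, Dict, List, Optional


def find_base_tag(history: List[Dict[str, Any]]) -> Optional[str]:
    """Return the first tag found below the top history entry, or None.

    `history` is newest entry first; the first entry is the image itself,
    so it is skipped.
    """
    for entry in history[1:]:
        tags = entry.get("Tags")
        if tags:
            return tags[0]
    return None


def from_line(history: List[Dict[str, Any]]) -> str:
    """Build a FROM instruction, flagging it when the base can't be determined."""
    base = find_base_tag(history)
    if base is None:
        # Dockerfile comments must start the line, so the note goes above FROM.
        return "# TODO: base image unknown (scratch? ADD of a rootfs?), verify\nFROM scratch"
    return "FROM {}".format(base)
```

For example1 this emits the flagged FROM scratch. For an image built on a locally present ubuntu:latest, it emits FROM ubuntu:latest.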

Dedockify Limitation Testing

Let’s experiment by creating an example Dockerfile where we explicitly define the base image. As we did earlier, in an empty directory, run the following snippet directly from the command line.

cat > Dockerfile << EOF ; touch testfile1 testfile2 testfile3
FROM ubuntu:latest
RUN mkdir testdir1
COPY testfile1 /testdir1
RUN mkdir testdir2
COPY testfile2 /testdir2
RUN mkdir testdir3
COPY testfile3 /testdir3
EOF

Now, build an image and tag it as example2. This creates an image similar to the previous one, except that it uses ubuntu:latest as the base image instead of scratch.

$ docker build . -t example2
Sending build context to Docker daemon  3.584kB
Step 1/7 : FROM ubuntu:latest
 ---> 72300a873c2c
Step 2/7 : RUN mkdir testdir1
 ---> Using cache
 ---> 4110037ae26d
Step 3/7 : COPY testfile1 /testdir1
 ---> Using cache
 ---> e4adf6dc5677
Step 4/7 : RUN mkdir testdir2
 ---> Using cache
 ---> 22d301b39a57
Step 5/7 : COPY testfile2 /testdir2
 ---> Using cache
 ---> f60e5f378e13
Step 6/7 : RUN mkdir testdir3
 ---> Using cache
 ---> cec486378382
Step 7/7 : COPY testfile3 /testdir3
 ---> Using cache
 ---> 05651f084d67
Successfully built 05651f084d67
Successfully tagged example2:latest

Since we now have a slightly more complex Dockerfile to reconstruct, and we have the exact Dockerfile we used to generate this image, we can make a comparison. Let us generate the output from our Python script.

$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
example2            latest              05651f084d67        2 minutes ago       64.2MB
example1            latest              374e0127c1bc        1 hour ago          0B
ubuntu              latest              72300a873c2c        9 days ago          64.2MB
wagoodman/dive      latest              4d9ce0be7689        3 weeks ago         83.6MB

$ python3 dedockify.py 05651f084d67
FROM ubuntu:latest
RUN /bin/sh -c mkdir testdir1
COPY file:cc4f6e89a1bc3e3c361a1c6de5acc64d3bac297f0b99aa75af737981a19bc9d6 in /testdir1
RUN /bin/sh -c mkdir testdir2
COPY file:a04cdcdf5fd077a994fe5427a04f6b9a52288af02dad44bb1f8025ecf209b339 in /testdir2
RUN /bin/sh -c mkdir testdir3
COPY file:2ed8ccde7cd97bc95ca15f0ec24ec447484a8761fa901df6032742e8f1a2a191 in /testdir3

This correlates well with the original Dockerfile. There’s no ADD directive this time, and the FROM directive is correct. We should be able to reconstruct a Dockerfile with reasonable accuracy as long as the original names its base image explicitly. That means it avoids both scratch and using the ADD directive to create a base image from a tar file. We still don’t know the names of the original files that were copied, however.
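To make this comparison less manual, a small sketch using Python’s standard difflib module can diff the two Dockerfiles. It first normalizes away the /bin/sh -c wrapper that docker history records for RUN steps. The function names are illustrative. After normalization, any remaining differences should be the hashed COPY sources whose filenames are still unknown.

```python
import difflib
import re
from typing import List

# docker history records RUN steps with the shell wrapper, e.g.
# "RUN /bin/sh -c mkdir testdir1"; strip it so they compare equal.
SHELL_WRAPPER = re.compile(r"^RUN /bin/sh -c ")


def normalize(text: str) -> List[str]:
    """Trim whitespace, drop blank lines, and remove the recorded shell wrapper."""
    lines = (line.strip() for line in text.splitlines())
    return [SHELL_WRAPPER.sub("RUN ", line) for line in lines if line]


def compare_dockerfiles(original: str, reconstructed: str) -> List[str]:
    """Return unified-diff lines between two Dockerfiles after normalization."""
    return list(difflib.unified_diff(
        normalize(original),
        normalize(reconstructed),
        fromfile="original",
        tofile="reconstructed",
        lineterm="",
    ))
```

Printing the returned lines for the example2 pair should show only the three COPY lines differing, because the RUN lines match once the wrapper is removed.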

Blind Freestyle Dockerfile Reconstruction

Now, let us try reverse engineering a Docker image the proper way, using the tools we’ve already discussed. The image we will use, example3, was built from a modified version of our earlier Dockerfile and made functional by adding a small binary. The assembly source code is available in the Dedockify GitHub repository. Since this image is so small, we won’t need to build or pull it. Instead, we can paste the entire image straight into our Docker environment with the snippet below.

uudecode << EOF | zcat | docker loadbegin-base64 600 -H4sICMicXV4AA2V4YW1wbGUzLnRhcgDtXVtvG8cVVnp56UN/QJ/YDQokgETNzJkrgTykjgsbDSzDURMkshDM5YzFhiJVkkpiCELzH/pP+tYfkf/UsxRNXdxIspe7lqv5IJF7PTM7Z87MmY9nZxgL2qG1DkN2nkXmtTecQwYrMMfsBHgXgFuVeXLCI5c26sxdNHQsie2Nm8GYZEYp+l7g6vdim4MWBrgyBjaY4MbIjZ66hezGOJ7N/ZSyMp1M5tddd9P5qw/3noA11f+XD5998XjnybVpcMa0lNfoH67oHxiIjV4nhXjP9c/7701WC1pAY/v/+2wyvimNG+zfgLpi/0KbYv+d4KQapmpQNa0G1WZ15Kc4npMsr7JUjGGMyLUzPEXJc1RCWqFkTtoEL5TgEaLSKlmOiXHnUSlKjYsQSFacop9jnTHuDNtinP52GRss/r6pL5iM5344xum3tJWHL6rBSfVoMpuP/SHSXXTFZ5NDuuB8/28znJ5tfTqf+3jwxTwNx9Ug+9EMLxybHM9fP4jT6erg7vzlanvnCMeX5Sz2dsYRV0cejr+vBuPj0WizenCYXm0+PvQvlhn7cjI6PsTZqzNfTabfDccvPhsuc/twPJ++PJoM66I9u2Jn/Ofj4Wgl6nMfcLS8/XSzmtBmNRqOj3+sTm+h/8b2P/IvcdqvbeiX07je/uXr/h9oZor9dwF/dHQbF74R3sz/F1RfuKbLi//fAWr993846C/+J0f/aCONRR9/g/4vbXNQShb7LygoKGgTTDkP1tiEUvkgJajgnTZRA0pAplFqZ1m02hqXjc4+aaVTztx4pzywfvPxH7X1V/0/oZgu7X8XOKn8NB4M5xjnx9N6ROIPk5ZnI6y7P67aq55+uvvok+3j2XR7NIl+tD0Lw/Hgwv5q9/zEYuNslz6q/f85MJsd0CBVD4SHEDhGkM4LDIqHQN4JjU5s5NqiJguRAr3nKpgICoRhyLNiCgHQOLxhfLdN7teVMd7e4uD2AY5Gkzpv14/2VuPgegyvDA3aOATr5cJkJUs0tMsRacytE6MsGZnp/piCEMaiijHmJJOIihvNbh5WX0zh/7kqkA5od3t2QI+yFenjw4/Gk6OPe7Wqnuw++/rpzuMnu7295xdU9bzar29/X6rPyenpRZZFMMG2GGwxsStgoPhAQF8KrZ1grqZb0iR+R5Xie5zOhpPxgpbpM+hrOnUwnM0nU1LY3sm1AnnfOXCgDDPfnDM834aX9XOclXZvK/aWJf3VzrO/fvb4WW97jjPS95RXp5vXyxd9Qd0MSCnsLeQ/2Hn6dS8PRzgAIbLLTiBEJ9Ej8hRZEmCURW6CcTaD8+h18BhtdsrJIFDGmBigS6w3HPfqTNbCeO8W2SQjVBK4lW9RDOIW8hUZv+PG8XdWDOI2xWA4mYGkKvYWxQC3kO+YdMYxod5ZMcDNxQB9Zpkg38iJNymG2uxvFi005+RJwRsVQAAIkmURmaF+gwUXvAZjApistWaSJyWFoAaKOS1DFJrKR0omQSSNVvO6ABaNz20e/mITc0MOe9c1vJsVHh7NX3674CKrwXx6jKf7l6jQzar233Ld8lXzl0d1E724eFY3bsOcvx2mWd14LtttriUXQXJAZFJn5MLriMHJTA6yT44zHZjHQJ2qZM5watCpBeeevFdMdANJXUkC7ZMxjCq7TI6cbW+zUkEpqj/RMrBeWInCUuFqj3QZTxays9RrBNT+XBJDh2QwwQkfqXcAqjZK5RA5SNIJzywKSTKTMGiYjkoknwJjzgVg1vBwLilYbbwhFzszLYTTwqNgRiMEZhMYK7zztZ9gmHUZap8/0WOROCXRenUhT1Q8IlhltBbOihhRZhOtoucgwVRolD3DkqprS6bHZzlbmxMNOBJGRYVxLsmDcSGZJBkTjkdlEtW3IKj4nA0oUJHDYgP5MZQ3qjHKZC
rDqJSylvwZhAtPF5PRMfJEt4I0kjvms/QqU4VVnp6O6jD3PJpaujKcLqJqaNF5TlrR5lxS5jQ2IfVRv22MAhbInjAaiNIpzkSGCAacVmAj0IPWDVoKPFCjQMXi0VX7p7eh4N8phI70kElxeg7vhJIKZYpaqOy9J9Vpck3Qkx2I2qOMIUrSQ46GKrzkqHRb8R+cF/63CzTWfyvxH8LI8vtPJyjxH/cbje1/DfEfWvDX4j8YL/bfBZbxH02rQYnZ6DBmY51obP/txH8oUX7/7QSv+LU2g0DezP/nVF+EBij+fxdY6b/FIJC6PN4s/kOqWv/F/gsKCgrag2JgEIxFl3RyIWQB2bjkDIiYFeTIUuLJy6SNNUG4wKPl3GeXEgSfWuL/BNTxv6X/bx+N9d/O+18gVen/u0Dh/+43Gtt/K+9/gWDF/jvBkv9rWg0uvf/lBSZQEqIiYUoB50E7iU4JFD4ESbcok5Kq6SWbIWjFpbFSCOmUYaxwiR1yiY3tvxX+T3JR3v/sBK8Cy+4O/yfO+L/y/lcnWOn/rvF/hf8vKCgoaBXGQgwYvEKWwNmY0euoAs/cks/mFGMgBI+BfD7jaJtrHaMnhx1ZMAF9a/M/6dL/d4HG+m9r/qfS/3eCwv/dbzS2/7bmfyr23wmW/F/TanCR/7PKBZAyeI7SuGDqd0Y9ahVYZj6nJFFqpZS2UieBDAC05HRxNBKE06nwfx3yf43tv6X5n3iJ/+0Er96ovHP8X/H/O8FK/3eN/yu//xUUFBS0iqbOekv8nzSs9P9doLH+23n/V4ky/1cnKPzf/UZj+2+H/wNd7L8TLPm/NXB2K/5vDbGEhf/riv9rbP/t8H/Ayvt/naDE/xX9L/S/mrxv/Wnc7P9f1P+C/5OyxP8WFBQUtIqmizW1Ff9X+v9u0Fj/Lc3/Z4r/3wkK/3e/0dj+W4r/K/P/dIMl/7eGNRtX/N8aYgkL/9cV/9fY/lua/0+xYv9doMT/Ff0v9L9atWL9abwV/1fi/wsKCgpaRdPJetqK/6vX/y39f/torP+W4v94+f2vExT+736jsf23FP+niv13glf8X/M5+1b8X1lL5H3i/5rafzv8n9Bl/a9OcPfW/1jyf8X/7wQr/a/WMF5/Gm/F/xX+v6CgoKBVJPK8tUkQIUjIXCXMBoJj0oPO3BtujJc86SyRWaFFDOhQ8QxKJ8uya2n9D6bK+h+doLH+25n/j/My/0cnKPzf/UZj+2/O/wmQr63/a0r8TzdY8n9Nq8FF/o+xoB1a6zBk51lkXpMADhmswByzE+BdAG5V5skJj1zaSKm4aOhYEpf4P8EE22KwxcSugIHiAwF9KbR2grlLRGCdsKrfXuUQrJe25hYlS5SRHDFpoxNDACOzDxhToCpmUcUYc5JJRMWNZjfzihdTuGsc41719NPdR59sH8+m26NJ9KPtWRiOBxf2V7vnJxYbZ7v0Ue0vScq9qt7dnh3Qo2xF+vjwo/Hk6ONezUc+2X329dOdx092e3vPq21/dLR9gKPR5Hm1X9/+iticHXih9EB4CIFjBOm8wKB4CIxMnRkbubaokRoGgd5zFUwEBcIw5FkxRbpC4/AGfrRO/gpHunchT3WGfoEtPTk93azSJH5Hqv4ep7PhZLxgm/sM+vqsItx9WnmdKr/MS9919S3Jbj+NB8M5xvnxtM61P0y6tst3uJ72+4bG/X9z/l+9Pv4DXd7/6Qa1sbWdxhuP/znXqsT/doJVY9tiGrX9X6t/Y67w/9SPlPF/J/jnw8//8sFq78Fvf7Xx640/0Hf9/xs68h9OG//+/c8//am38cHGv37+6VFdWTZ7P0ymo/TH3727jBcUFBQUNMKhHw8zzub9W/B4b4sb/H8O0lzyBemIEuX9n06wd1I9WFIdFVPOgzU2oVQ+SAkqeKdN1IASaKiPUrsFr2aNy0Znn8hNTzlz453ywBZ1iIbfz/BosutfzOrRPP7oD49GCIORryNM6iH95/WIcXGyaaTg+e
CTUl1fJBsJW9+yuCRsfXPskbD1LdhRl9na3v4lYU1530vC1sdLVPun+8VP/SVMyVZnw/lkOsRZW2nc9PsvB7ja/pNBlPa/C5ysWuia61420mv4Pej0tNhcQUFBwV3GfwHMszUXAMIAAA======EOF

Running the snippet above from the command line loads example3:latest. Now, let us try to recreate the Dockerfile.

$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
example3            latest              059a3878de45        5 minutes ago       63B

$ python3 dedockify.py 059a3878de45
FROM example3:latest
WORKDIR /testdir1
COPY file:322f9f92e3c94eaee1dc0d23758e17b798f39aea6baec8f9594b2e4ccd03e9d0 in testfile1
WORKDIR /testdir2
COPY file:322f9f92e3c94eaee1dc0d23758e17b798f39aea6baec8f9594b2e4ccd03e9d0 in testfile2
WORKDIR /testdir3
COPY file:322f9f92e3c94eaee1dc0d23758e17b798f39aea6baec8f9594b2e4ccd03e9d0 in testfile3
WORKDIR /app
COPY file:b33b40f2c07ced0b9ba6377b37f666041d542205e0964bc26dc0440432d6e861 in hello
ENTRYPOINT ["/app/hello"]

This gives us a base Dockerfile to work from. The FROM directive shows example3:latest, the image’s own name, which means no tagged base layer was found. Together with the image’s 63-byte size, this suggests it was built from scratch. Now we need to see which files were copied into /testdir1, /testdir2, /testdir3, and /app. Let us open this image in dive to recover the missing data.

docker run --rm -it \
    -v /var/run/docker.sock:/var/run/docker.sock \
    wagoodman/dive:latest example3:latest

If you scroll down to the last layer, you’ll see all of the missing data populate the tree on the right. Each directory received a zero-byte file: testfile1, testfile2, and testfile3, respectively. In the last layer, a 63-byte file called hello was copied to the /app directory.
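Dive reads this per-layer information from the image itself. A similar listing can be sketched in Python from an archive written by docker save, for example docker save -o example3.tar example3:latest. The sketch assumes the archive contains a top-level manifest.json whose first entry lists the layer tarballs in build order, as the classic docker save layout does. Verify this against your Docker version. The function name is hypothetical. Files whose names start with .wh. are whiteouts, which record files deleted from lower layers.

```python
import json
import tarfile
from typing import Any, Dict, List


def files_per_layer(saved_image: str) -> List[Dict[str, Any]]:
    """List the files each layer of a `docker save` archive adds, changes, or deletes.

    Assumes a top-level manifest.json whose first entry has a "Layers" list
    of layer tarballs in build order (oldest first).
    """
    result = []
    with tarfile.open(saved_image) as archive:
        manifest = json.load(archive.extractfile("manifest.json"))
        for layer_path in manifest[0]["Layers"]:
            added, deleted = [], []
            # "r:*" lets tarfile handle plain or compressed layer tarballs.
            with tarfile.open(fileobj=archive.extractfile(layer_path), mode="r:*") as layer:
                for member in layer.getmembers():
                    basename = member.name.rsplit("/", 1)[-1]
                    if basename.startswith(".wh."):
                        # Whiteout entry: a file removed from a lower layer.
                        deleted.append(member.name)
                    elif member.isfile():
                        added.append((member.name, member.size))
            result.append({"layer": layer_path, "added": added, "deleted": deleted})
    return result
```

For example3, the last entry’s added list should contain app/hello with a size of 63 bytes, mirroring what dive displays.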

Now, let us recover those files! There doesn’t appear to be a way to copy the files directly from the image, so we will need to create a container first.

$ docker run -td example3:latest
6fdca182a128df7a76e618931c85a67e14a73adc69ad23782bc9a5dc29420a27

Now, let us copy the files we need from the container to the host, using the paths and filenames we recovered with dive:

/testdir1/testfile1
/testdir2/testfile2
/testdir3/testfile3
/app/hello

We might first check to see if our container is still running.

$ docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
6fdca182a128        example3:latest     "/app/hello"        2 minutes ago       Up 2 minutes                            wizardly_lamport

If a container isn’t running for some reason, that’s fine. We can verify its status to see that it’s stopped.

$ docker container ls -a

We can also check the logs.

$ docker logs 6fdca182a128
Hello, world!

It appears to be running a persistent Hello, world! program. In fact, the program wasn’t designed to be persistent: in Docker version 19.03.6, a bug may be preventing the application from terminating normally. This is acceptable for now. The container can be running or stopped, and the application doesn’t need to persist for us to recover the data we need. The container only has to exist and to have been created from the image we are extracting data from.

docker cp 6fdca182a128:/testdir1/testfile1 .
docker cp 6fdca182a128:/testdir2/testfile2 .
docker cp 6fdca182a128:/testdir3/testfile3 .
docker cp 6fdca182a128:/app/hello .
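The same extraction can be scripted with the Docker SDK for Python. This sketch assumes the SDK is installed (pip install docker) and that docker.from_env() can reach the daemon. It creates a container but never starts it, because copying files out only requires the container to exist. Container.get_archive() returns the requested path as a tar stream plus a stat dictionary, and the helper unpacks that stream locally. The function names copy_out and recover_files are hypothetical.

```python
import io
import tarfile
from typing import Iterable


def copy_out(container, path: str, dest_dir: str = ".") -> None:
    """Copy `path` out of a container, much like `docker cp <id>:<path> <dest_dir>`."""
    # get_archive() yields the path as tar data (an iterable of byte chunks)
    # together with a stat dict describing it.
    chunks, _stat = container.get_archive(path)
    with tarfile.open(fileobj=io.BytesIO(b"".join(chunks))) as tar:
        if hasattr(tarfile, "data_filter"):  # Python versions with extraction filters
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)


def recover_files(image: str, paths: Iterable[str], dest_dir: str = ".") -> None:
    """Create (but never start) a container from `image` and copy `paths` out of it."""
    import docker  # Docker SDK for Python; imported here so copy_out has no dependency

    container = docker.from_env().containers.create(image)
    try:
        for path in paths:
            copy_out(container, path, dest_dir)
    finally:
        container.remove()
```

For example, recover_files("example3:latest", ["/testdir1/testfile1", "/testdir2/testfile2", "/testdir3/testfile3", "/app/hello"]) would reproduce the four docker cp commands above and then remove the temporary container.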

By running the recovered executable to verify its behavior, we should see the following:

$ ./hello
Hello, world!

We can now update the Dockerfile we generated earlier with all the new details. This means changing the FROM directive to scratch and adding the filenames we discovered while exploring with dive.

FROM scratch
WORKDIR /testdir1
COPY testfile1 .
WORKDIR /testdir2
COPY testfile2 .
WORKDIR /testdir3
COPY testfile3 .
WORKDIR /app
COPY hello .
ENTRYPOINT ["/app/hello"]

Again, with all of the files in a shared directory, we’re ready to build our reverse engineered Dockerfile.

$ docker build . -t example3:recovered
Sending build context to Docker daemon  4.608kB
Step 1/10 : FROM scratch
 --->
Step 2/10 : WORKDIR /testdir1
 ---> Running in 5e8e47505ca6
Removing intermediate container 5e8e47505ca6
 ---> d30a2f002626
Step 3/10 : COPY testfile1 .
 ---> 4ac46077a588
Step 4/10 : WORKDIR /testdir2
 ---> Running in 8c48189da985
Removing intermediate container 8c48189da985
 ---> 7c7d90bc2219
Step 5/10 : COPY testfile2 .
 ---> 5b40d33100e1
Step 6/10 : WORKDIR /testdir3
 ---> Running in 4ccd634a04db
Removing intermediate container 4ccd634a04db
 ---> f89fdda8f059
Step 7/10 : COPY testfile3 .
 ---> 9542f614200d
Step 8/10 : WORKDIR /app
 ---> Running in 7614b0fdba42
Removing intermediate container 7614b0fdba42
 ---> 6d686935a791
Step 9/10 : COPY hello .
 ---> cd4baca758dd
Step 10/10 : ENTRYPOINT ["/app/hello"]
 ---> Running in 28a1ca58b27f
Removing intermediate container 28a1ca58b27f
 ---> 35dfd9240a2e
Successfully built 35dfd9240a2e
Successfully tagged example3:recovered

$ docker run --name recovered -dt example3:recovered
0f696bf500267a996339b522cf584e010434103fe82497df2c1fa58a9c548f20
$ docker logs recovered
Hello, world!

Now, for further verification, let’s check the layers with dive again.

docker run --rm -it \
    -v /var/run/docker.sock:/var/run/docker.sock \
    wagoodman/dive:latest example3:recovered

This image shows the same files as the original. Compared side by side, the two images match: they contain the same files with the same sizes, and both function in exactly the same way.

Here is the original Dockerfile used to generate the original example3 image.

FROM alpine:3.9.2
RUN apk add --no-cache nasm
WORKDIR /app
COPY hello.s /app/hello.s
RUN touch testfile && nasm -f bin -o hello hello.s && chmod +x hello

FROM scratch
WORKDIR /testdir1
COPY --from=0 /app/testfile testfile1
WORKDIR /testdir2
COPY --from=0 /app/testfile testfile2
WORKDIR /testdir3
COPY --from=0 /app/testfile testfile3
WORKDIR /app
COPY --from=0 /app/hello hello
ENTRYPOINT ["/app/hello"]

We can see that, while we weren’t able to reconstruct it perfectly, we were able to reconstruct it approximately. A multi-stage Dockerfile like this one cannot be fully reconstructed from its final image, because the earlier build stages aren’t recorded in it. Our only option is to reconstruct the Dockerfile for the image we actually have. If we had images from the earlier build stages, we could reproduce a Dockerfile for each of those, but in this case, all we had was the final build. Regardless, we have still reproduced a useful Dockerfile from a Docker image.
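The original Dockerfile also explains a detail in the Dedockify output for example3. All three testfile COPY steps were recorded with the same hash (322f9f92…), and each of them copied the same /app/testfile from the first stage. Grouping the recorded hashes can surface repeated sources like this automatically. The helper below is hypothetical and assumes the classic-builder "COPY file:<hash> in <dest>" format. Identical hashes suggest, but do not prove, identical source files.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Classic-builder COPY/ADD entry: "COPY file:<64 hex chars> in <dest>"
HASHED = re.compile(
    r"(?:COPY|ADD)\s+(?:file|dir):(?P<digest>[0-9a-f]{64})\s+in\s+(?P<dest>\S+)"
)


def group_by_source_hash(steps: Iterable[str]) -> Dict[str, List[str]]:
    """Map each recorded source hash to the destinations it was copied to.

    A hash with several destinations hints that multiple COPY steps used
    the same source file.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for step in steps:
        match = HASHED.search(step)
        if match:
            groups[match.group("digest")].append(match.group("dest"))
    return dict(groups)
```

Run over the example3 output, this would map the 322f9f92… hash to testfile1, testfile2, and testfile3, pointing straight at the shared source.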

Future Work

By taking an approach similar to dive’s, we should be able to update the Dedockify source code to traverse each layer automatically and recover all useful file information. The program could also be updated to copy files out of a container automatically, store them locally, and make the corresponding updates to the Dockerfile. Finally, it could be updated to infer whether the base layer is an empty scratch image or something else. With some additional changes to the recovered Dockerfile syntax, Dedockify could potentially automate the reverse engineering of a Docker image into a functional Dockerfile in most cases.


Virtual machines (VMs) offer the flexibility, scalability, and cost-efficiency that businesses need to optimize workloads. However, choosing the wrong setup can lead to poor performance, wasted resources, and unnecessary costs.

In this guide, we’ll walk you through the essential factors to consider when selecting the best virtual machine configuration for your specific workload needs.

#1 Understand your workload requirements

The first step in choosing the right virtual machine configuration is understanding the nature of your workload. Workloads can range from light, everyday tasks to resource-intensive applications. When making your decision, consider the following:

• Compute-intensive workloads: Applications like video rendering, scientific simulations, and data analysis require a higher number of CPU cores. Opt for VMs with more vCPUs for smoother performance.
• Memory-intensive workloads: Databases, big data analytics, and high-performance computing (HPC) jobs often need more RAM. Choose a VM configuration that provides enough memory to avoid memory bottlenecks.
• Storage-intensive workloads: Some workloads rely heavily on storage, such as file servers or applications that perform frequent read/write operations. For these, prioritize VM configurations that offer high-speed storage options, such as SSDs or NVMe.
• I/O-intensive workloads: Applications that require frequent network or disk I/O, such as cloud services and distributed applications, benefit from VMs with high-bandwidth and low-latency network interfaces.

#2 Consider VM size and scalability

Once you understand your workload’s requirements, the next step is to choose the right VM size. VM sizes are typically categorized by the amount of CPU, memory, and storage they offer.

• Start with a baseline: Select a VM configuration that offers a balanced ratio of CPU, RAM, and storage based on your workload type.
• Scalability: Choose a VM size that allows you to easily scale up or down as your needs change. Many cloud providers offer auto-scaling, which adds or removes VM instances based on real-time demand and provides flexibility and cost savings.
• Overprovisioning vs. underprovisioning: Avoid overprovisioning (allocating excessive resources) unless your workload demands peak capacity at all times, because it leads to unnecessary costs. Underprovisioning, on the other hand, hurts performance, so finding the right balance is essential.

#3 Evaluate CPU and memory considerations

The central processing unit (CPU) and memory (RAM) are the heart of a virtual machine, and how both are configured plays a significant role in performance. Workloads that need high processing power will benefit from VMs with multiple CPU cores; examples include video encoding, machine learning, and simulations. However, be mindful of CPU architecture: look for VMs that offer recent processor generations (e.g., Intel Xeon, AMD EPYC) for better performance per core.

It’s also important that the VM has enough memory to avoid paging. Paging occurs when the system uses disk space as virtual memory, and it significantly slows down performance. For memory-heavy applications, consider a configuration with more RAM and support for faster memory types such as DDR4 or DDR5.

#4 Assess storage performance and capacity

Storage performance and capacity can significantly impact the performance of your virtual machine, especially for applications that handle large data volumes. Key considerations include:

• Disk type: For faster read/write operations, opt for solid-state drives (SSDs) over traditional hard disk drives (HDDs). Some cloud providers also offer NVMe storage, which can provide even greater speed for highly demanding workloads.
• Disk size: Choose the right size based on the amount of data you need to store and process. Over-allocating storage space might seem like a safe bet, but it increases costs unnecessarily. You can usually resize disks later, so avoid over-allocating them upfront.
• IOPS and throughput: Some workloads, such as databases, require a high number of input/output operations per second (IOPS). If this is a priority for your workload, make sure your VM configuration includes high-IOPS storage options.

#5 Weigh up your network requirements

When working with cloud-based VMs, network performance is a critical consideration. High-speed, low-latency networking can make a difference for applications such as online gaming, video conferencing, and real-time analytics.

• Bandwidth: Check whether the VM configuration offers the necessary bandwidth for your workload. For applications that handle large data transfers, such as cloud backup or file servers, make sure the network interface provides high throughput.
• Network latency: Low latency is crucial for applications where real-time performance is key (e.g., trading systems, gaming). Choose VMs with low-latency networking options to minimize delays and improve the user experience.
• Network isolation and security: Check whether your VM configuration provides the necessary network isolation and security features, especially when handling sensitive data or operating in multi-tenant environments.

#6 Factor in cost considerations

While it’s essential that your VM has the right configuration, cost is always an important factor to consider. Cloud providers typically charge based on the resources allocated, so optimizing for cost efficiency can significantly impact your budget.

Consider whether a pay-as-you-go or reserved model fits your usage pattern (a reserved model offers discounted rates in exchange for a long-term commitment). The reserved option can provide significant savings if your workload runs continuously. You can also use monitoring tools to track your VM’s performance and resource usage over time. This data will help you make informed decisions about scaling up or down so you’re not paying for unused resources.

#7 Evaluate security features

Security is a primary concern when selecting a VM configuration, especially for workloads handling sensitive data. Consider the following:

• Built-in security: Look for VMs that offer integrated security features such as DDoS protection, web application firewall (WAF), and encryption.
• Compliance: Check that the VM configuration meets industry standards and regulations, such as GDPR, ISO 27001, and PCI DSS.
• Network security: Evaluate the VM’s network isolation capabilities and the availability of cloud firewalls to manage incoming and outgoing traffic.

#8 Consider geographic location

The geographic location of your VM can impact latency and compliance. Choose VM locations that are geographically close to your end users to minimize latency and improve performance. In addition, select VM locations that comply with local data sovereignty laws and regulations.

#9 Assess backup and recovery options

Backup and recovery are critical for maintaining data integrity and availability. Look for VMs that offer automated backup solutions so that data is regularly saved. You should also evaluate disaster recovery capabilities, including the ability to quickly restore data and applications in case of failure.

#10 Test and iterate

Finally, once you’ve chosen a VM configuration, it’s essential to test its performance under real-world conditions. Most cloud providers offer performance monitoring tools that let you assess how well your VM meets your workload requirements.

If you notice any performance bottlenecks, be prepared to adjust the configuration. This could involve increasing CPU cores, adding more memory, or upgrading storage.
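This minimal sketch supports the monitoring and testing steps above. It assumes a Linux VM with Bash and prints a one-off snapshot of CPU cores, load, memory, swap, and root-disk usage, all read from standard sources (nproc, /proc/loadavg, /proc/meminfo, df). The vm_snapshot function is a hypothetical helper, not a Gcore or cloud-provider tool. To see trends over time, you would rely on your provider’s monitoring tools or a dedicated agent.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one-off resource snapshot of a Linux VM.
# Uses only standard Linux interfaces (nproc, /proc, df); not a provider tool.
set -euo pipefail

vm_snapshot() {
  local cores load1 mem_total_kb mem_avail_kb swap_total_kb swap_free_kb
  cores=$(nproc)
  # The first field of /proc/loadavg is the 1-minute load average.
  read -r load1 _ < /proc/loadavg
  # /proc/meminfo reports these values in kB.
  mem_total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  mem_avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
  swap_total_kb=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)
  swap_free_kb=$(awk '/^SwapFree:/ {print $2}' /proc/meminfo)

  echo "CPU cores: ${cores}"
  echo "Load (1 min): ${load1}"
  awk -v t="$mem_total_kb" -v a="$mem_avail_kb" \
    'BEGIN { printf "Memory used: %.1f%% of %.1f GiB\n", (t - a) * 100 / t, t / 1048576 }'
  echo "Swap in use: $(( (swap_total_kb - swap_free_kb) / 1024 )) MiB"
  # -P gives POSIX output; column 5 is "Use%" for the root filesystem.
  echo "Root disk used: $(df -P / | awk 'NR==2 {print $5}')"
}

vm_snapshot
```

Interpreting the numbers is a judgment call. Still, a few patterns usually suggest that more CPU, RAM, or storage may be warranted:

• A 1-minute load that stays above the core count
• Memory use near 100% while swap keeps growing
• A nearly full root disk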
Regular testing and fine-tuning keep your VM optimized.

Choosing a virtual machine that suits your requirements

Selecting the best virtual machine configuration is a key step toward running your workloads efficiently, cost-effectively, and without unnecessary performance bottlenecks. Understand your workload’s needs, weigh factors like CPU, memory, storage, and network performance, and keep monitoring resource usage. Doing so lets you make informed decisions that lead to better outcomes and savings.

Whether you’re running a small application or large-scale enterprise software, the right VM configuration can significantly improve performance and reduce cost. Gcore offers a wide range of virtual machine options designed to meet diverse workload requirements. They provide dedicated vCPUs, high-speed storage, and low-latency networking across 30+ global regions. You can scale compute resources on demand, benefit from free egress traffic, and pay only for the resources in use through flexible pricing models, maximizing the value of your cloud investments.

Contact us to discuss your VM needs

How to get the size of a directory in Linux

Understanding how to check directory size in Linux is critical for managing storage space efficiently. It helps whether you’re assessing how much space a specific folder uses or trying to prevent storage issues.

This comprehensive guide covers commands and tools so you can easily calculate and analyze directory sizes in a Linux environment. We will guide you step-by-step through three methods: du, ncdu, and ls -la. They’re all effective, and each offers different benefits.

What is a Linux directory?

A Linux directory is a special type of file that functions as a container for storing files and subdirectories. It plays a key role in organizing the Linux file system by creating a hierarchical structure. This arrangement simplifies file management, making it easier to locate, access, and organize related files. Directories are fundamental components that help ensure smooth system operations by maintaining order and facilitating seamless file access in Linux environments.

#1 Get Linux directory size using the du command

The du command displays the disk space used by files and directories, so you can easily determine a directory’s size with it. The output can be presented in human-readable formats like kilobytes (KB), megabytes (MB), or gigabytes (GB).

Check the size of a specific directory in Linux

To get the size of a specific directory, open your terminal and type the following command:

du -sh /path/to/directory

In this command, replace /path/to/directory with the actual path of the directory you want to assess. The -s flag stands for “summary” and displays only the total size of the specified directory. The -h flag makes the output human-readable, showing sizes in a more understandable format.

Example: Here, we used the path /home/ubuntu/, where ubuntu is the name of our username directory.
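You can try the du options from this section without touching real data. This minimal sketch builds a throwaway directory with mktemp and fills it with random bytes. It then runs du -sh, a one-level breakdown, and du -ah. The directory names and file sizes are arbitrary examples. The du -d 1 and sort -h options assume GNU coreutils or a similarly capable du and sort.

```shell
#!/usr/bin/env bash
# Demo: measure a throwaway directory tree with du.
set -euo pipefail

demo=$(mktemp -d)
mkdir -p "$demo/logs" "$demo/media"
# Random bytes, so filesystem compression can't shrink what du reports.
head -c 2097152 /dev/urandom > "$demo/media/video.bin"   # 2 MiB
head -c 524288  /dev/urandom > "$demo/logs/app.log"      # 512 KiB

# Total for the whole tree: -s = summary only, -h = human-readable units.
du -sh "$demo"

# One line per immediate subdirectory (plus the total), smallest to largest.
du -h -d 1 "$demo" | sort -h

# -a adds individual files to the listing, not just directories.
du -ah "$demo"

# The same total in KiB, which is handy for scripts and comparisons.
total_kb=$(du -sk "$demo" | cut -f1)
echo "Total: ${total_kb} KiB"

rm -rf "$demo"
```

On a real system, substitute your own path for $demo. Prefix the command with sudo if some subdirectories aren’t readable by your user.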
We used the du command to retrieve an output of 32K for this directory, indicating a size of 32 KB.

Check the size of all directories in Linux

To list the size of a directory and of every subdirectory beneath it, use the following command:

sudo du -h /path/to/directory

By default, du -h reports directories only. Add the -a flag (du -ah) to include individual files as well. You need sudo only if some subdirectories aren’t readable by your user. Without it, du reports permission errors for those subdirectories and leaves them out of the totals.

Example: In this instance, we again used the path /home/ubuntu/, with ubuntu representing our username directory. Running du -h produced a list of each subdirectory within that path along with its size.

#2 Get Linux directory size using ncdu

For a more interactive and feature-rich way to explore directory sizes, consider the ncdu (NCurses Disk Usage) tool. ncdu provides a visual representation of disk usage and lets you navigate through directories, view size details, and identify large files with ease.

For Debian or Ubuntu, install it with this command:

sudo apt-get install ncdu

Once installed, run ncdu followed by the path to the directory you want to analyze:

ncdu /path/to/directory

This launches the ncdu interface, which shows a breakdown of file and subdirectory sizes. Use the arrow keys to navigate and explore various folders, and press q to exit the tool.

Example: To analyze your home directory, simply enter ncdu from that directory and press Enter. After scanning, ncdu lists the directory’s contents sorted by size, largest first.

#3 Get Linux directory size using ls -la

Alternatively, you can use the ls command to list the files and directories within a directory. The options -l and -a modify the default behavior of ls as follows:

• -l (long listing format): Displays detailed information for each file and directory, including file permissions, the number of links, owner, group, file size, the timestamp of the last modification, and the file or directory name.
• -a (all files): Instructs ls to include all files, including hidden files and directories. On Linux, hidden files typically have names beginning with a . (dot).

ls -la lists all files (including hidden ones) in long format. It provides detailed information such as permissions, owner, group, size, and last modification time. This command is especially useful when you want to inspect file attributes or see hidden files and directories. Keep in mind that ls reports sizes per entry. For a subdirectory, the size column shows the directory entry itself (often 4096 bytes), not the total size of its contents, so use du when you need totals.

Example: When you enter the ls -la command and press Enter, each line of the output includes:

• File type and permissions (e.g., drwxr-xr-x):
  ◦ The first character indicates the file type: - for a regular file, d for a directory, l for a symbolic link.
  ◦ The next nine characters are permissions in three groups of three (rwx): r = read, w = write, x = execute. Permissions are shown for three classes of users: owner, group, and others.
• Number of links (e.g., 2): For regular files, this usually indicates the number of hard links. For directories, it typically equals 2 plus the number of subdirectories: the directory’s entry in its parent, its own . entry, and each subdirectory’s .. entry.
• Owner and group (e.g., user group)
• File size in bytes (e.g., 4096 or 1045)
• Modification date and time (e.g., Jan 7 09:34)
• File name (e.g., .bashrc, notes.txt, Documents): Files or directories whose names begin with a dot (.) are hidden (e.g., .bashrc).

Conclusion

That’s it! You can now determine the size of a directory in Linux. Measuring directory sizes is a crucial skill for efficient storage management. You might choose the straightforward du command, the visual advantages of the ncdu tool, or the versatility of ls -la. Either way, this knowledge helps you keep your Linux environment organized and efficient.

Looking to deploy Linux in the cloud? With Gcore Edge Cloud, you can choose from a wide range of pre-configured virtual machines suitable for Linux:

• Affordable shared compute resources starting from €3.2 per month
• Deploy across 50+ cloud regions with dedicated servers for low-latency applications
• Secure apps and data with DDoS protection, WAF, and encryption at no additional cost

Get started today

How to Run Hugging Face Spaces on Gcore Inference at the Edge

Running machine learning models, especially large-scale models like GPT-3 or BERT, requires a lot of computing power and can introduce significant latency. This makes real-time applications resource-intensive and challenging to deliver. Running ML models at the edge is a lightweight approach offering significant advantages for latency, privacy, and resource optimization.

Gcore Inference at the Edge makes it simple to deploy and manage custom models efficiently. You can deploy and scale your favorite Hugging Face models globally in just a few clicks. In this guide, we’ll walk you through how easy it is to harness the power of Gcore’s edge AI infrastructure to deploy a Hugging Face Space model. Whether you’re developing NLP solutions or cutting-edge computer vision applications, deploying at the edge has never been simpler, or more powerful.

Step 1: Log In to the Gcore Customer Portal

Go to gcore.com and log in to the Gcore Customer Portal. If you don’t yet have an account, go ahead and create one; it’s free.

Step 2: Go to Inference at the Edge

In the Gcore Customer Portal, click Inference at the Edge in the left navigation menu. Then click Deploy custom model.

Step 3: Choose a Hugging Face Model

Open huggingface.co and browse the available models. Select the model you want to deploy, then navigate to the corresponding Hugging Face Space for the model. Click Files in the Space and locate the Docker option. Copy the Docker image link and startup command from the Hugging Face Space.

Step 4: Deploy the Model on Gcore

Return to the Gcore Customer Portal deployment page and enter the following details:

• Model image URL: registry.hf.space/ethux-mistral-pixtral-demo:latest
• Startup command: python app.py
• Container port: 7860

Configure the pod as follows:

• GPU-optimized: 1x L40S
• vCPUs: 16
• RAM: 232GiB

For optimal performance, choose any available region for routing placement.
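Before clicking Deploy, it can be worth confirming that the image and startup command copied from the Space actually start. This minimal sketch reuses the Step 4 values (image URL, python app.py, port 7860) in a local docker run. It assumes Docker is installed. The --gpus all flag also requires an NVIDIA GPU with the NVIDIA Container Toolkit, and a model like this may need a large GPU. The run_space_locally function and DRY_RUN switch are hypothetical conveniences. By default, the script only prints the command, so nothing is pulled by accident.

```shell
#!/usr/bin/env bash
# Sketch: start the Hugging Face Space image locally with the Step 4 settings.
# Assumes Docker and, for --gpus all, the NVIDIA Container Toolkit.
set -euo pipefail

IMAGE="registry.hf.space/ethux-mistral-pixtral-demo:latest"
PORT=7860
# Hypothetical switch: print the command by default; DRY_RUN=0 actually runs it.
DRY_RUN="${DRY_RUN:-1}"

run_space_locally() {
  local -a cmd
  # --rm removes the container on exit; -it keeps it attached to your terminal.
  # --platform pins the amd64 image variant (useful on ARM hosts).
  # -p maps host port 7860 to container port 7860.
  cmd=(docker run --rm -it --platform=linux/amd64 --gpus all
       -p "${PORT}:${PORT}" "$IMAGE" python app.py)
  if [[ "$DRY_RUN" == "1" ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

run_space_locally
```

With DRY_RUN=0, the app should become reachable at http://localhost:7860 once the container finishes starting. This is a quick way to check that 7860 is the right container port before entering it in Gcore.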
Name your deployment and click Deploy.

Step 5: Interact with Your Model

Once the model is up and running, you’ll be provided with an endpoint. You can now interact with the model via this endpoint to test and use your deployed model at the edge.

Powerful, Simple AI Deployment with Gcore

Gcore Inference at the Edge is the future of AI deployment, combining the ease of Hugging Face integration with the robust infrastructure needed for real-time, scalable, and global solutions. By leveraging edge computing, you can optimize model performance and simultaneously futureproof your business in a world that increasingly demands fast, secure, and localized AI applications. Deploying models to the edge allows you to capitalize on real-time insights, improve customer experiences, and outpace your competitors. Whether you’re leading a team of developers or spearheading a new AI initiative, Gcore Inference at the Edge offers the tools you need to innovate at the speed of tomorrow.

Explore Gcore Inference at the Edge

10 Common Web Performance Mistakes and How to Overcome Them

Web performance mistakes can carry a high price, resulting in websites with low conversion rates, high bounce rates, and poor sales. In this article, we dig into the top 10 mistakes you should avoid to boost your website performance.

1. Slow or Unreliable Web Host

Your site speed begins with your web host, which provides the server infrastructure and resources for your website. This includes the VMs and other infrastructure where your code and media files reside. Three common host-related problems are as follows:

• Server location: The farther your server is from your users, the slower the site speed and the poorer the experience for your website visitors. (More on this under point 7.)
• Shared hosting: Shared hosting solutions split server resources among multiple websites. During peak times, heavy usage leads to slow load times and spotty connections. Shared VMs can also hurt your website’s performance because of increased network traffic and resource contention.
• VPS hosting: Bandwidth limitations can be a significant issue with VPS hosting. A limited bandwidth package can cause your site speed to decrease during high-traffic periods, resulting in a sluggish user experience.

To correct server and VM hosting issues, choose a provider with servers located closer to your user base. Provision sufficient computational resources, and deliver content through a CDN such as Gcore CDN. Use virtual dedicated servers (VDS/VPS) rather than shared hosting, so that network traffic from other websites doesn’t affect your site’s performance. If you already use a VPS, consider upgrading your hosting plan to increase server resources and improve UX. For enterprises, dedicated servers may be more suitable.

2. Inefficient Code, Libraries, and Frameworks

Poor-quality code and inefficient frameworks can increase the size of web pages, consume too many resources, and slow down page load times. Code quality is often affected by syntax, semantics, and logic errors. Correct these issues by writing clean and simple code.

Errors or inefficiencies introduced by developers, such as excessive API calls or memory overuse, can impact site performance. Prevent these issues by using TypeScript, console.log, or built-in browser debuggers during development. For bugs in code that has already shipped, use logging and debugging tools like the GNU debugger (GDB) or WinDbg to identify and resolve problems.

Improving code quality also involves minimizing the use of large libraries and frameworks. Frontend frameworks like React, Vue, and Angular are popular for accelerating development. However, they often include extensive JavaScript and prebuilt components that can bloat your website’s codebase. To optimize for speed, carefully analyze your use case to determine whether a framework is necessary. If a static page suffices, avoid using a framework altogether. If a framework is needed, select libraries that let you include only the components you need.

3. Unoptimized Code Files and Fonts

Even high-quality code needs optimization before shipping. Unoptimized JavaScript, HTML, and CSS files increase page weight and can require multiple HTTP requests, especially when JavaScript is split across many separately loaded files.

To optimize code, two effective techniques are minification and bundling.

• Minification removes comments, whitespace, line breaks, and other unnecessary characters from your source code, and many minifiers can also strip dead code. It also shortens variable and function names, further decreasing file size. Tools for minification include UglifyJS for JavaScript, cssnano for CSS, and html-minifier for HTML.
• Bundling groups multiple files into one, reducing the number of HTTP requests and speeding up site load times. Popular bundling tools include Rollup, webpack, and Parcel.

File compression using GZIP or Brotli can also reduce the size of HTTP responses before they reach users’ browsers.
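This minimal sketch shows what compression buys on a text asset before you enable it on the server. It generates a hypothetical, repetitive stylesheet and compares its raw size with its gzip -9 output using standard command-line tools. The sketch uses gzip only because the CLI is almost universally available. Actual savings depend on the file. Text formats like CSS, JavaScript, and HTML generally compress far better than already-compressed formats such as JPEG or WebP images.

```shell
#!/usr/bin/env bash
# Sketch: measure how much gzip shrinks a text asset such as a CSS file.
set -euo pipefail

asset=$(mktemp)
# Hypothetical stylesheet: repetitive rules, much like real CSS.
for i in $(seq 1 200); do
  printf '.card-%d { margin: 0 auto; padding: 16px; color: #333333; }\n' "$i"
done > "$asset"

original_bytes=$(wc -c < "$asset")
# -9 = best compression; -c writes to stdout and leaves the file untouched.
gzip_bytes=$(gzip -9 -c "$asset" | wc -c)

echo "original: ${original_bytes} bytes"
echo "gzip -9:  ${gzip_bytes} bytes"
awk -v o="$original_bytes" -v g="$gzip_bytes" \
  'BEGIN { printf "saved:    %.0f%%\n", (1 - g / o) * 100 }'

rm -f "$asset"
```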
Enable your chosen compression technique on your server only after checking that your server provider supports it.

4. Unoptimized Images and Videos

Some websites are slowed down by large media files. Upload only essential media files to your site. Compress or resize images using tools like TinyPNG and Compressor.io. Convert images from JPEG, PNG, and GIF to WebP or AVIF to reduce file size while maintaining quality. This is especially beneficial in industries like e-commerce and travel, where multiple images boost conversion rates. Use dynamic image optimization services like Gcore Image Stack for efficient processing and delivery. For pages with many small images, use CSS sprites to group them, which reduces the number of HTTP requests and speeds up load times.

When adding videos hosted elsewhere, use lite embeds. Standard embed code, like YouTube’s, is heavy and can slow down your pages. Lite embeds initially load only a thumbnail image; the full video player loads when users click the thumbnail, improving page speed.

5. No Lazy Loading

Lazy loading delays the loading of heavy content like images and JavaScript files until the user needs it. In contrast, “eager” loading loads everything at once and slows down site load times. Even with optimized images and code, lazy loading can further enhance site speed through what we’ll call “timing”: controlling when each resource loads.

Image timing uses the HTML loading attribute on an img tag (loading="lazy") or frameworks like Angular or React to load images only when they’re needed. With native lazy loading, the browser defers requesting an image until the user scrolls near it.

JavaScript timing controls when certain code runs. If a script doesn’t need to run until the HTML document has been fully parsed, add the defer attribute to delay its execution. If a script is independent and can run at any time without affecting functionality, add the async attribute. The script then downloads in parallel and runs as soon as it arrives.

6. Heavy or Redundant External Widgets and Plugins

Widgets and plugins are placed in designated frontend and backend locations to extend website functionality. Examples include Google review widgets that publish product reviews on your website and Facebook plugins that connect your website to your Facebook Page. As your website evolves, more plugins are typically installed, and website admins sometimes forget to remove those that are no longer required.

Over time, heavy and unused plugins can consume substantial resources, slowing down your website unnecessarily. Widgets may also contain heavy HTML, CSS, or JavaScript files that hinder web performance.

Remove unnecessary plugins and widgets, particularly those that make cURL calls or HTTP requests, or that generate excessive database queries. Avoid plugins that load heavy scripts and styles or that come from unreliable sources, as they may contain malicious code and degrade website performance.

7. Network Issues

Your server’s physical location significantly impacts site speed for end users. For example, if your server is in the UK and your users are in China, they’ll experience high latency due to the distance and DNS resolution time. The greater the distance between the server and the user, the more network hops are typically required, which increases latency and slows down site load times.

DNS resolution plays a crucial role in this process. Your authoritative DNS provider resolves your domain name to your IP address. If the provider’s server is too far from the user, DNS resolution will be slow, giving visitors a poor first impression.

To optimize content delivery and reduce latency, consider putting a content delivery network (CDN) in front of your origin server. A CDN stores copies of your static assets (e.g., images, JavaScript, CSS, and HTML files) on geographically distributed servers. Users can then access your content from a server closer to their location, significantly improving site speed and performance.

8. No Caching

Without caching, your website has to fetch data from the origin server every time a user makes a request. This increases load time, because every request must travel all the way to the origin server and back.

Caching helps solve this problem by serving pre-saved copies of your website. Copies of your web files are stored on distributed CDN servers, physically closer to website viewers, which results in quicker load times.

An additional type of caching, DNS caching, temporarily stores DNS records in DNS resolvers. This allows for faster domain name resolution and accelerates the initial connection to a website.

9. Excessive Redirects

Website redirects send users from one URL to another, and each redirect adds an extra HTTP request and round trip. These additional requests slow down page loads and increase the load on your servers. To prevent this, use tools like Screaming Frog to scan your website for redirects and reduce them to only those that are absolutely necessary. Additionally, limit each redirect to making no more than one request for a .css file and one for a .js file.

10. Lack of Mobile Optimization

Forgetting to optimize for mobile can harm your website’s performance. Mobile-first websites optimize for speed and UX, and better UX leads to happier customers and increased sales.

Optimizing for mobile starts with understanding the CPU, bandwidth, and memory limitations of mobile devices compared to desktops. Sites with excessively heavy files will load slowly on mobile devices. A few practices go a long way toward optimizing for mobile:

• Writing mobile-first code
• Using mobile devices or emulators for building and testing
• Enhancing UX for various mobile device types, such as those with larger screens or higher capacity

How Can Gcore Help Prevent These Web Performance Mistakes?

If you’re unsure where to start in correcting or preventing web performance mistakes, don’t worry: you don’t have to do it alone. Gcore offers a comprehensive suite of solutions designed to enhance your web performance and deliver the best user experience for your visitors:

• Powerful VMs: Fast web hosting with a wide range of virtual machines.
• Managed DNS: Host your DNS zones and ensure quick DNS resolution with our fast Managed DNS.
• CDN: Accelerate both static and dynamic components of your website for global audiences.

With robust infrastructure from Gcore, you can ensure optimal performance and a seamless experience for all your web visitors. Keep your website infrastructure in one place for a simplified website management experience.

Need help getting started? Contact us for a personalized consultation and discover how Gcore can supercharge your website performance.

Get in touch to boost your website

How to Choose Between Bare Metal GPUs and Virtual GPUs for AI Workloads

Choosing the right GPU type for your AI project can make a huge difference in cost and business outcomes. The first consideration is often whether you need a bare metal or virtual GPU. With a bare metal GPU, you get a physical server with an entire GPU chip (or chips) installed that is completely dedicated to the workloads you run on the server, whereas a virtual GPU means you share GPU resources with other virtual machines.Read on to discover the key differences between bare metal GPUs and virtual GPUs, including performance and scalability, to help you make an informed decision.The Difference Between Bare Metal and Virtual GPUsThe main difference between bare metal GPUs and virtual GPUs is how they use physical GPU resources. With a bare metal GPU, you get a physical server with an entire GPU chip (or chips) installed that is completely dedicated to the workloads you run on the server. There is no hypervisor layer between the operating system (OS) and the hardware, so applications use the GPU resources directly.With a virtual GPU, you get a virtual machine (VM) and uses one of two types of GPU virtualization, depending on your or a cloud provider’s capabilities:An entire, dedicated GPU used by a VM, also known as a passthrough GPUA shared GPU used by multiple VMs, also known as a vGPUAlthough a passthrough GPU VM gets the entire GPU, applications access it through the layers of a guest OS and hypervisor. 
Also, unlike a bare metal GPU instance, other critical VM resources that applications use, such as RAM, storage, and networking, are also virtualized.The difference between running applications with bare metal and virtual GPUsThese architectural features affect the following key aspects:Performance and latency: Applications running on a VM with a virtual GPU, especially vGPU, will have lower processing power and higher latency for the same GPU characteristics than those running on bare metal with a physical GPU.Cost: As a result of the above, bare metal GPUs are more expensive than virtual GPUs.Scalability: Virtual GPUs are easier to scale than bare metal GPUs because scaling the latter requires a new physical server. In contrast, a new GPU instance can be provisioned in the cloud in minutes or even seconds.Control over GPU hardware: This can be critical for certain configurations and optimizations. For example, when training massive deep learning models with a billion parameters, total control means the ability to optimize performance optimization—and that can have a big impact on training efficiency for massive datasets.Resource utilization: GPU virtualization can lead to underutilization if the tasks being performed don’t need the full power of the GPU, resulting in wasted resources.Below is a table summarizing the benefits and drawbacks of each approach: Bare metal GPUVirtual GPUPassthrough GPUvGPUBenefitsDedicated GPU resourcesHigh performance for demanding AI workloadsLower costSimple scalabilitySuitable for occasional or variable workloadsLowest costSimple scalabilitySuitable for occasional or variable workloadsDrawbacksHigh cost compared to virtual GPUsLess flexible and scalable than virtual GPUsLow performanceNot suitable for demanding AI workloadsLowest performanceNot suitable for demanding AI workloadsShould You Use Bare Metal or Virtual GPUs?Bare metal GPUs and virtual GPUs are typically used for different types of workloads. 
Your choice will depend on which AI tasks you’re looking to perform.

Bare metal GPUs are better suited for compute-intensive AI workloads that require maximum performance and speed, such as training large language models. They are also a good choice for workloads that must run 24/7 without interruption, such as some production AI inference services. Finally, bare metal GPUs are preferred for real-time AI tasks, such as robotic surgery or high-frequency trading analytics.

Virtual GPUs are a more suitable choice for the early stages of AI/ML development and for iterating on AI models, where flexibility and cost-effectiveness matter more than top performance. Workloads with variable or unpredictable resource requirements can also run on this type of GPU, such as training and fine-tuning small models or AI inference tasks that are not sensitive to latency and performance. Virtual GPUs are also great for occasional, short-term, and collaborative AI/ML projects that don’t require dedicated hardware, for example, an academic collaboration that includes multiple institutions.

To choose the right type of GPU, consider these three factors:

  1. Performance requirements. Is raw GPU speed critical for your AI workloads? If so, bare metal GPUs are the better choice.
  2. Scalability and flexibility. Do you need GPUs that can easily scale up and down to handle dynamic workloads? If so, opt for virtual GPUs.
  3. Budget. Depending on the cloud provider, bare metal GPU servers can be more expensive than virtual GPU instances. Virtual GPUs typically offer more flexible pricing, which may be appropriate for occasional or variable workloads.

Your final choice between bare metal GPUs and virtual GPUs depends on the specific requirements of your AI/ML project, including performance needs, scalability requirements, workload types, and budget constraints.
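The three factors can be turned into a simple rule of thumb. The sketch below is one hypothetical way to encode them in Python; the WorkloadProfile fields, the suggest_gpu function, and the order in which the rules are checked are assumptions made for illustration, not a Gcore sizing tool. It also makes one judgment call the text leaves open: when raw performance matters but the workload must also scale elastically, it falls back to a passthrough GPU, which keeps a whole GPU per VM while still being provisioned like any other VM.

```python
from dataclasses import dataclass
from enum import Enum


class GpuOption(Enum):
    BARE_METAL = "bare metal GPU"
    PASSTHROUGH = "passthrough GPU (VM)"
    VGPU = "shared vGPU (VM)"


@dataclass
class WorkloadProfile:
    # Hypothetical fields mirroring the three factors in the text.
    performance_critical: bool   # Is raw GPU speed / low latency critical?
    needs_elastic_scaling: bool  # Must capacity scale up and down quickly?
    budget_constrained: bool     # Is cost the dominant concern?


def suggest_gpu(profile: WorkloadProfile) -> GpuOption:
    """Rule-of-thumb mapping from the three factors to a GPU type."""
    # Performance first: steady, speed-critical work goes to bare metal.
    if profile.performance_critical and not profile.needs_elastic_scaling:
        return GpuOption.BARE_METAL
    # Cost-sensitive work that is not speed-critical can share a GPU.
    if profile.budget_constrained and not profile.performance_critical:
        return GpuOption.VGPU
    # Everything else: a whole GPU per VM, but still cloud-elastic.
    return GpuOption.PASSTHROUGH
```

For example, suggest_gpu(WorkloadProfile(performance_critical=True, needs_elastic_scaling=False, budget_constrained=False)) returns GpuOption.BARE_METAL, which matches the large-model training case described earlier. Real decisions would weigh these factors by degree rather than as simple yes/no flags.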
Evaluating these factors can help you determine the most appropriate GPU option.

Choose Gcore for Best-in-Class AI GPUs

Gcore offers bare metal servers with NVIDIA H100, A100, and L40S GPUs. Using the 3.2 Tbps InfiniBand interface, you can combine H100 or A100 servers into scalable GPU clusters for training and tuning massive ML models or for high-performance computing (HPC).

If you are looking for a scalable, low-latency solution for global AI inference, explore Gcore Inference at the Edge. It especially benefits latency-sensitive, real-time applications, such as generative AI and object recognition.
