
Raspberry Pi Cluster Emulation With Docker Compose

  • By Gcore
  • April 1, 2023
  • 13 min read

TL;DR

This guide discusses everything needed to build a simple, scalable, and fully binary compatible Raspberry Pi cluster using QEMU, Docker, Docker Compose, and Ansible.

Introduction

The Raspberry Pi is no longer just a low-cost platform for students to learn computing; it's now a legitimate research and development platform that's being used for IoT, networking, distributed systems, and software development. It's even being used administratively in production environments.

Not long after the first Raspberry Pi was released in 2012, several groups set out to build them into low-cost clusters, often for their own research and testing purposes. Interns at DataStax built a multi-datacenter, 32-node Cassandra fault-tolerance demo, complete with a big red button to simulate the failure of an entire datacenter. David Guill built a 40-node Raspberry Pi cluster that was intended to be part of his MSCE thesis. Balena built "The Beast," a 120-node Raspberry Pi cluster, for scaled testing of their online platform. And on the extreme end of the spectrum, Oracle built a 1,060-node Raspberry Pi cluster, which they introduced at Oracle OpenWorld 2019.

Innovation with the Raspberry Pi continues as it's turned into everything from Wi-Fi extenders and security cameras to even bigger clusters. While the main value of these clusters is inherent in their size and low cost, their popularity makes them an increasingly common development platform. Since the Raspberry Pi uses an ARM processor, this can make development problematic for those of us who work exclusively in the cloud. While commercial solutions exist, we will be building our own emulated cluster using a fully open source stack hosted on Google Compute Engine.

Use Cases

Other than learning from the experience, Dockerizing an emulated Raspberry Pi enables us to do three things. One, it turns what would otherwise be a hardware-only device into software that nobody has to remember to carry around (I'm always losing the peripheral cables). Two, it enables Docker to do for the Pi what Docker does best for everything else: it makes software portable, easy to manage, and easy to replicate. And three, it takes up no physical space. If we can build one Raspberry Pi with Docker, we can build many. If we can build many, we can network them all together. While we may encounter some limitations, this build will emulate a cluster of Raspberry Pi 1s that's logically equivalent to a simple, multi-node physical cluster.

Emulated Hardware Architecture

While technically not identical, the emulation software we will be using, QEMU, provides an ARM-Versatile architecture that’s roughly compatible with what is found on a Raspberry Pi 1. Some modifications to the kernel are necessary in order for it to work properly with Raspbian, but for our purposes, it’s one of the more stable open source solutions available.

pi@raspberrypi:~$ cat /proc/cpuinfo
processor       : 0
model name      : ARMv6-compatible processor rev 7 (v6l)
BogoMIPS        : 577.53
Features        : half thumb fastmult vfp edsp java tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xb76
CPU revision    : 7
Hardware        : ARM-Versatile (Device Tree Support)
Revision        : 0000
Serial          : 0000000000000000

Compared to a physical Raspberry Pi 1, the output is nearly identical:

pi@raspberrypi:~ $ cat /proc/cpuinfo
processor       : 0
model name      : ARMv6-compatible processor rev 7 (v6l)
BogoMIPS        : 697.95
Features        : half thumb fastmult vfp edsp java tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xb76
CPU revision    : 7
Hardware        : BCM2835
Revision        : 000d
Serial          : 000000003d9a54c5

Background

What is QEMU?

QEMU is a processor emulator. It supports a number of different processors, but the only thing we're interested in is something that can run Raspberry Pi images natively without a lot of difficulty. In this case, we're going to be using QEMU 4.2.0, which supports an ARM11 instruction set that's compatible with the Broadcom BCM2835 (ARM1176JZF-S) chip found on the Raspberry Pi 1 and Zero. We will use the ARM1176 support in QEMU, which will allow us to more or less emulate a Raspberry Pi 1. I say more or less because we will still need to use a customized Raspbian kernel in order to boot on the emulated hardware. QEMU support for the Pi is still in development, so our approach to getting it to work here is just a clever hack that will by no means be optimal or efficient in terms of CPU utilization.

QEMU Features

QEMU supports many of the same features found in Docker; however, it can run full software emulation without a host kernel driver. This means that it can run inside Docker, or any other virtual machine, without host virtualization support. The QEMU feature list is extensive, and the learning curve is steep. However, the primary feature we will focus on for this build is host port forwarding, so that data can be passed through to the host.

Dockerized QEMU

One of Docker's strengths is that it doesn't provide full-fledged virtualization, but instead relies on the architecture of the host system. Since our host system will be running an Intel processor, we can't expect Docker to handle ARM operations on its own. So, we will be placing QEMU inside a Docker container. Since Docker is designed to run software at near-native performance, the operational efficiency challenge will be with QEMU itself. QEMU, on the other hand, emulates a machine's architecture completely in software. The advantage is that it can run inside any virtualized system or container, independent of the system's architecture. If we were patient enough, we could even run a Dockerized Raspberry Pi container inside another Dockerized Raspberry Pi container. The drawback is that QEMU's performance is poor compared to other types of virtualization. But we can get the best of both worlds by leveraging QEMU's ARM emulation while depending on Docker for everything else.

Raspbian

Based on Debian, Raspbian is a popular and well supported operating system for the Raspberry Pi, and is one of the most often recommended for the platform. The community is very active and well managed.

Physical Raspberry Pi Speed Comparison

The following tests are intended as a baseline for comparing our virtualized systems. Since we will be emulating a single core, these tests are single-core and single-thread only, regardless of how many physical cores the hardware actually has.
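For reference, the raw outputs below match the format of sysbench's CPU prime test, which is the single-threaded benchmark command used later in this guide:

sysbench --test=cpu --cpu-max-prime=9999 run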

Raspberry Pi 1 (2011-12)

Test execution summary:
    total time:                          330.5514s
    total number of events:              10000
    total time taken by event execution: 330.5002
    per-request statistics:
         min:                                 32.92ms
         avg:                                 33.05ms
         max:                                 40.94ms
         approx.  95 percentile:              33.24ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   330.5002/0.00

Raspberry Pi 1 A+ V1.1 2014

Test execution summary:
    total time:                          328.7505s
    total number of events:              10000
    total time taken by event execution: 328.6931
    per-request statistics:
         min:                                 32.71ms
         avg:                                 32.87ms
         max:                                 78.93ms
         approx.  95 percentile:              33.03ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   328.6931/0.00

Raspberry Pi Zero W v1.1 2017

Test execution summary:
    total time:                          228.2025s
    total number of events:              10000
    total time taken by event execution: 228.1688
    per-request statistics:
         min:                                 22.76ms
         avg:                                 22.82ms
         max:                                 35.29ms
         approx.  95 percentile:              22.94ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   228.1688/0.00

Raspberry Pi 2 Model B v1.1 2014

Test execution summary:
    total time:                          224.9052s
    total number of events:              10000
    total time taken by event execution: 224.8738
    per-request statistics:
         min:                                 22.20ms
         avg:                                 22.49ms
         max:                                 32.85ms
         approx.  95 percentile:              22.81ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   224.8738/0.00

Raspberry Pi 3 Model B v1.2 2015

Test execution summary:
    total time:                          139.6140s
    total number of events:              10000
    total time taken by event execution: 139.6087
    per-request statistics:
         min:                                 13.94ms
         avg:                                 13.96ms
         max:                                 34.06ms
         approx.  95 percentile:              13.96ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   139.6087/0.00

Raspberry Pi 4 B 2018

Test execution summary:
    total time:                          92.6405s
    total number of events:              10000
    total time taken by event execution: 92.6338
    per-request statistics:
         min:                                  9.22ms
         avg:                                  9.26ms
         max:                                 23.50ms
         approx.  95 percentile:               9.27ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   92.6338/0.00

Project Requirements

Single Host Specifications

Historically, QEMU has been single-threaded, emulating all cores of a system's architecture on a single CPU. While that's no longer the case, we are still going to be emulating a single-core Raspberry Pi. We will do some benchmarks later to compare how different CPU limits on each node impact performance. But for now, we will use one CPU per single-core node. Since QEMU has the potential to use a lot of CPU resources due to its inherent inefficiency, our initial three-node cluster will start with a baseline of at least one CPU per node, leaving one CPU dedicated to the host to avoid performance problems. The VM specs selected for this task are as follows.

Cloud Provider: Google Cloud Platform
Instance Type: n1-standard-4
CPUs: 4
Memory: 15GB
Disk: 100GB
Operating System: Ubuntu 18.04 LTS

Docker

On the host, we're using the default version of Docker available from the default apt repository for Ubuntu 18.04 LTS.

# docker -v
Docker version 18.09.7, build 2d0083d

Docker Compose

# docker-compose -v
docker-compose version 1.25.0, build 0a186604
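If Docker and Docker Compose are not already present on the host, both can be installed from Ubuntu's stock repositories with something along these lines (package names are the standard Ubuntu ones; the exact versions installed may differ slightly from those shown above):

sudo apt-get update
sudo apt-get install -y docker.io docker-compose
sudo systemctl enable --now docker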

Docker Hub Ubuntu Image

18.04, bionic-20200112, bionic, latest

QEMU

Installed inside the Docker container, we will be using the following version of QEMU for ARM:

# qemu-system-arm --version
QEMU emulator version 4.2.0
Copyright (c) 2003-2019 Fabrice Bellard and the QEMU Project developers

QEMU Customized Kernel for Raspbian

Loaded by QEMU inside Docker, we will use Dhruv Vyas's compiled kernel for Raspbian, which has been modified to be usable with QEMU.

Raspbian Lite Image

Also booted from QEMU, we will use an unmodified version of Raspbian Lite from 9/30/2019.

Expect (Tcl/Tk)

Installed on the Docker container is the following version of Expect:

# expect -v
expect version 5.45.4

ssh/sshd

sshd will need to be enabled on each Raspbian node, and ssh should be enabled on the host.
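The nodes themselves are handled automatically later by an Expect script. On the host, it's just worth confirming that an ssh client is available for Ansible to use (it normally is on Ubuntu 18.04); a quick check along these lines is enough:

# Confirm the ssh client exists on the host; install it if missing
ssh -V || sudo apt-get install -y openssh-client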

Ansible

The following version of Ansible is also being used, along with its other dependencies:

# ansible --version
ansible 2.5.1
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/dist-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.17 (default, Nov  7 2019, 10:07:09) [GCC 7.4.0]

Building the Docker Images

QEMU Build Container

We will compile QEMU 4.2.0 from source. It will need all the supporting build tools, so to keep our app container as small as possible, we will create a separate build container for the QEMU build using a minimal version of Ubuntu 18.04 from Docker Hub.

QEMU App Container

Once QEMU is compiled from source, we will transfer the binary to the app container. Also from Docker Hub, we will use the same minimal version of Ubuntu 18.04 to host the QEMU binary.

Docker Configuration

The Dockerfile

We will be using the following Dockerfile, an up-to-date version of which may also be found in this guide's accompanying pidoc repository on GitHub. Each code snippet below makes up a segment of the Dockerfile. Thanks go to Luke Childs for his work on dockerpi.

Build stage for qemu-system-arm:

FROM ubuntu AS qemu-system-arm-builder
ARG QEMU_VERSION=4.2.0
ENV QEMU_TARBALL="qemu-${QEMU_VERSION}.tar.xz"
WORKDIR /qemu
RUN apt-get update && \
    apt-get -y install \
                       wget \
                       gpg \
                       pkg-config \
                       python \
                       build-essential \
                       libglib2.0-dev \
                       libpixman-1-dev \
                       libfdt-dev \
                       zlib1g-dev \
                       flex \
                       bison

Download source.

RUN wget "https://download.qemu.org/${QEMU_TARBALL}"RUN # Verify signatures...RUN wget "https://download.qemu.org/${QEMU_TARBALL}.sig"RUN gpg --keyserver keyserver.ubuntu.com --recv-keys CEACC9E15534EBABB82D3FA03353C9CEF108B584RUN gpg --verify "${QEMU_TARBALL}.sig" "${QEMU_TARBALL}"

Extract tarball.

RUN tar xvf "${QEMU_TARBALL}"

Build source.

RUN "qemu-${QEMU_VERSION}/configure" --static --target-list=arm-softmmuRUN make -j$(nproc)RUN strip "arm-softmmu/qemu-system-arm"

Build the intermediary pidoc VM app image.

FROM ubuntu as pidoc-vm
ARG RPI_KERNEL_URL="https://github.com/dhruvvyas90/qemu-rpi-kernel/archive/afe411f2c9b04730bcc6b2168cdc9adca224227c.zip"
ARG RPI_KERNEL_CHECKSUM="295a22f1cd49ab51b9e7192103ee7c917624b063cc5ca2e11434164638aad5f4"

Transfer binary from build container to app container.

COPY --from=qemu-system-arm-builder /qemu/arm-softmmu/qemu-system-arm /usr/local/bin/qemu-system-arm

Download modified kernel and install.

ADD $RPI_KERNEL_URL /tmp/qemu-rpi-kernel.zip

RUN apt-get update && \
    apt-get -y install \
                        unzip \
                        expect

RUN cd /tmp && \
    echo "$RPI_KERNEL_CHECKSUM  qemu-rpi-kernel.zip" | sha256sum -c && \
    unzip qemu-rpi-kernel.zip && \
    mkdir -p /root/qemu-rpi-kernel && \
    cp qemu-rpi-kernel-*/kernel-qemu-4.19.50-buster /root/qemu-rpi-kernel/ && \
    cp qemu-rpi-kernel-*/versatile-pb.dtb /root/qemu-rpi-kernel/ && \
    rm -rf /tmp/*

VOLUME /sdcard

Then we copy the entry point script from the host’s main directory.

ADD ./entrypoint.sh /entrypoint.sh
ENTRYPOINT ["./entrypoint.sh"]

Build the final app pidoc image with the Raspbian Lite filesystem loaded.

FROM pidoc-vm as pidoc
ARG FILESYSTEM_IMAGE_URL="http://downloads.raspberrypi.org/raspbian_lite/images/raspbian_lite-2019-09-30/2019-09-26-raspbian-buster-lite.zip"
ARG FILESYSTEM_IMAGE_CHECKSUM="a50237c2f718bd8d806b96df5b9d2174ce8b789eda1f03434ed2213bbca6c6ff"
ADD $FILESYSTEM_IMAGE_URL /filesystem.zip
ADD pi_ssh_enable.exp /pi_ssh_enable.exp
RUN echo "$FILESYSTEM_IMAGE_CHECKSUM  /filesystem.zip" | sha256sum -c

The entrypoint.sh File

First, the script checks whether the filesystem image is already in place; if not, it extracts it from the Raspbian zip that was added to the image at build time.

#!/bin/sh
raspi_fs_init() {
  image_path="/sdcard/filesystem.img"
  zip_path="/filesystem.zip"

  if [ ! -e $image_path ]; then
    echo "No filesystem detected at ${image_path}!"
    if [ -e $zip_path ]; then
        echo "Extracting fresh filesystem..."
        unzip $zip_path
        mv *.img $image_path
        rm $zip_path
    else
      exit 1
    fi
  fi
}

The script then checks for an empty raspi-init file, which serves as a marker to determine if Expect has been launched previously to enable ssh on Raspbian.

if [ ! -e /raspi-init ]; then
  touch /raspi-init
  raspi_fs_init
  echo "Initiating Expect..."
  /usr/bin/expect /pi_ssh_enable.exp `hostname -I`
  echo "Expect Ended..."

If ssh has already been enabled by a previous Expect run, then we only need to launch QEMU, without Expect. Note that we are forwarding port 22 on Raspbian to port 2222 inside the Docker container.

else
  /usr/local/bin/qemu-system-arm \
        --machine versatilepb \
        --cpu arm1176 \
        --m 256M \
        --hda /sdcard/filesystem.img \
        --net nic \
        --net user,hostfwd=tcp:`hostname -I`:2222-:22 \
        --dtb /root/qemu-rpi-kernel/versatile-pb.dtb \
        --kernel /root/qemu-rpi-kernel/kernel-qemu-4.19.50-buster \
        --append "root=/dev/sda2 panic=1" \
        --no-reboot \
        --display none \
        --serial mon:stdio
fi

Enable SSHD on Raspbian (Expect Tcl/Tk Method)

QEMU doesn't have a straightforward method for running configuration scripts on boot. And because Raspbian doesn't come with SSH enabled by default, we will have to turn it on ourselves. Our options are to do it manually or to use some sort of scripting tool that can interact with stdio. Another option is to customize the Raspbian image before installation; this would have to be done on the host, however, as Docker restricts the mounting of new filesystems. In any case, to keep this build portable and host-independent, the most straightforward approach for our purposes is to use an Expect script and have it copied into our Docker image at build time.
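For reference, the image-customization route mentioned above would look roughly like the following on the host. Raspbian enables sshd at boot when an empty file named ssh is present in the boot partition; the loop device name and image filename below are illustrative:

# Attach the Raspbian image, mount its boot partition, and drop in an empty "ssh" file
LOOP=$(sudo losetup -Pf --show 2019-09-26-raspbian-buster-lite.img)
sudo mount "${LOOP}p1" /mnt
sudo touch /mnt/ssh
sudo umount /mnt
sudo losetup -d "$LOOP"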

The pi_ssh_enable.exp File

Since an unmodified Raspbian image has no accessible ports by default, we will use Expect to interface with stdio in QEMU, log in with a default username and password, and enable the sshd listener.

#!/usr/bin/expect -f
set ipaddr [lindex $argv 0]
set timeout -1

spawn /usr/local/bin/qemu-system-arm \
  --machine versatilepb \
  --cpu arm1176 \
  --m 256M \
  --hda /sdcard/filesystem.img \
  --net nic \
  --net user,hostfwd=tcp:$ipaddr:2222-:22 \
  --dtb /root/qemu-rpi-kernel/versatile-pb.dtb \
  --kernel /root/qemu-rpi-kernel/kernel-qemu-4.19.50-buster \
  --append "root=/dev/sda2 panic=1" \
  --no-reboot \
  --display none \
  --serial mon:stdio

expect "raspberrypi login:"
send -- "pi\r"
expect "Password:"
send -- "raspberry\r"
expect "pi@raspberrypi:"
send -- "sudo systemctl enable ssh\r"
expect "pi@raspberrypi:"
send -- "sudo systemctl start ssh\r"
expect "pi@raspberrypi:"
expect eof

Build Image

In the folder with the Dockerfile, we will be building our two containers. The first will be our build container that includes all the dependencies for compiling QEMU, and the other will be our app container for running QEMU.

docker build -t pidoc .

Network Forwarding and Troubleshooting

Once the build is complete, bring it up detached and follow the logs.

docker run -itd --name testnode pidoc
docker logs testnode -f

The Raspbian image (downloaded during the Docker build) will be decompressed automatically, and QEMU should begin booting from it.

Once Raspbian is fully booted, Expect should automatically enable sshd. Log into the docker container and test that SSH is reachable from inside the container on port 2222.

# docker exec -it testnode bash
root@d4abc2f655e6:/# hostname -I
172.17.0.3
root@d4abc2f655e6:/# cat < /dev/tcp/172.17.0.3/2222
SSH-2.0-OpenSSH_7.9p1 Raspbian-10

Exit the container shell, then kill and remove the container.

root@d4abc2f655e6:/# exit
exit
# docker kill testnode
testnode
# docker container rm testnode
testnode

Testing the Docker Container

Start/Test Container

We will need to start the container for testing. This is primarily to gain some intuition about the performance of QEMU so that we can better make design decisions regarding our cluster. The system should come up clean, with perhaps a few benign warnings related to differences between the somewhat more generalized emulated hardware and the physical Raspberry Pi hardware it expects. I found it necessary to confirm that port forwarding was working properly between QEMU and the Docker container before verifying that port forwarding between the container and the host was working. Our first goal is to double forward SSH so that QEMU is accessible directly from the host.

docker run -itd -p 127.0.0.1:2222:2222 --name testnode pidoc
docker logs testnode -f

Once the system again comes online, test for sshd on port 2222 of the host by using ssh to log into Raspbian:

# ssh pi@localhost -p 2222
The authenticity of host '[localhost]:2222 ([127.0.0.1]:2222)' can't be established.
ECDSA key fingerprint is SHA256:N0oRF23lpDOFjlgYAbml+4v2xnYdyrTmBgaNUjpxnFM.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '[localhost]:2222' (ECDSA) to the list of known hosts.
pi@localhost's password:
Linux raspberrypi 4.19.50+ #1 Tue Nov 26 01:49:16 CET 2019 armv6l

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Tue Jan 21 12:24:59 2020

SSH is enabled and the default password for the 'pi' user has not been changed.
This is a security risk - please login as the 'pi' user and type 'passwd' to set a new password.

pi@raspberrypi:~ $

Testing Fractional CPU Utilization

To run this cluster, we're using a GCP n1-standard-4 instance (4 vCPUs, 15GB of RAM) running Ubuntu 18.04 LTS. But we now notice how inefficient QEMU is once Raspbian begins doing anything. Multiple Raspberry Pi instances might stack fine while they're idle, but if we want to keep the system viable, we will need to restrict CPU utilization on each instance, or else the system could be rendered unusable once more than a few nodes are put under load. Fortunately, Docker can handle this for us. We have 15GB of RAM on this instance, so let's see what happens if we are slightly more ambitious and squeeze 6 Raspberry Pi containers onto our VM. That leaves a whole core free for the host to manage other tasks without much risk of failure. We can scale this out later with Docker Compose.

We will run two test containers at 50% and 100% for benchmark testing.

docker run -itd --cpus="0.50" -p 127.0.0.1:2250:2222 --name pidoc_50_test pidoc
docker run -itd --cpus="1.00" -p 127.0.0.1:2200:2222 --name pidoc_00_test pidoc

At this point, we technically already have a cluster. We just don’t have a method to manage them, except by hand.
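Managing it by hand would look something like this, assuming the two test containers started above and Raspbian's default pi/raspberry credentials:

# Run a quick command on each emulated node over its forwarded SSH port
for port in 2200 2250; do
  ssh -p "$port" pi@127.0.0.1 'hostname && uptime'
done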

Performance

While a full core allocation performs at near physical Raspberry Pi speeds, an instance running at 50% runs at roughly half that speed. This might be manageable under certain circumstances, but it's not ideal. The overall efficiency of the cluster may still increase, depending on the task at hand. For now, we will continue with our original full-core allocation across 3 nodes, and later test with 6 nodes.

Single Thread Benchmarks

Testing can be done by using the following simple benchmark tests.

CPU Prime Test
sysbench --test=cpu --cpu-max-prime=9999 run
CPU Integer Test
time $(i=0; while ((i<9999999)); do ((i++)); done)
HDD Read Test
dd bs=16K count=102400 iflag=direct if=test_data of=/dev/null
HDD Write Test
dd bs=16k count=102400 oflag=direct if=/dev/zero of=test_data
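For example, the CPU prime test can be run on one of the emulated nodes directly from the host over its forwarded SSH port (here the full-core test container mapped to port 2200 above; sysbench is assumed to have been installed on the node first, e.g. with sudo apt-get install sysbench):

ssh -p 2200 pi@127.0.0.1 'sysbench --test=cpu --cpu-max-prime=9999 run'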

Results (Single Thread)

For this guide, we will focus only on the CPU prime test using sysbench.

Host
General statistics:
    total time:                          10.0009s
    total number of events:              9417

Latency (ms):
         min:                                  1.04
         avg:                                  1.06
         max:                                  1.63
         95th percentile:                      1.10
         sum:                               9992.36

Threads fairness:
    events (avg/stddev):           9417.0000/0.00
    execution time (avg/stddev):   9.9924/0.00
Virtual Raspberry Pi – Limit: 100%
Test execution summary:
    total time:                          397.8781s
    total number of events:              10000
    total time taken by event execution: 397.4056
    per-request statistics:
         min:                                 38.61ms
         avg:                                 39.74ms
         max:                                 57.15ms
         approx.  95 percentile:              40.92ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   397.4056/0.00
Virtual Raspberry Pi – Limit: 50%
Test execution summary:
    total time:                          823.8272s
    total number of events:              10000
    total time taken by event execution: 822.9329
    per-request statistics:
         min:                                 38.68ms
         avg:                                 82.29ms
         max:                                184.02ms
         approx.  95 percentile:              94.65ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   822.9329/0.00

Compose the Cluster

Create docker-compose.yml File

We will use Docker Compose for cluster creation. Initially, we will keep this at three nodes to keep it easy to manage. Once we have a proof of concept cluster, we can then scale it out. The most straightforward way to handle this is to map separate ports to localhost for each container. We can specify a range of ports to be used in the docker-compose.yml file, as noted below.

version: '3'
services:
  node:
    image: pidoc
    ports:
      - "2201-2203:2222"

Bring Up Cluster

To bring up three nodes with docker-compose, use the --scale option.

docker-compose up --scale node=3
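It can take a few minutes for every emulated node to finish booting. A quick way to confirm that each forwarded SSH port is answering (each node should return a banner like SSH-2.0-OpenSSH_7.9p1 Raspbian-10) is something along these lines:

for port in 2201 2202 2203; do
  echo "node on port ${port}:"
  timeout 3 bash -c "head -n 1 < /dev/tcp/127.0.0.1/${port}"
done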

Ansible Configuration

Now that we have all the infrastructure in place for a cluster, we need to manage it. We could use Docker to double attach to the QEMU monitor, but ssh is much more robust. Since we are using ssh, we can use Ansible. A few basic operations are provided here: update, upgrade, reboot, and shutdown. These can be expanded as needed to develop a more robust system.

hosts File

Take note of the ports we specified in the docker-compose.yml file earlier, and edit your hosts inventory accordingly.

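The original hosts file isn't reproduced here, but based on the port range above and the pidoc-cluster group name used by the playbooks below, a minimal inventory would look something like this (the node aliases are illustrative; pi is Raspbian's default user):

[pidoc-cluster]
node1 ansible_host=127.0.0.1 ansible_port=2201 ansible_user=pi
node2 ansible_host=127.0.0.1 ansible_port=2202 ansible_user=pi
node3 ansible_host=127.0.0.1 ansible_port=2203 ansible_user=pi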

For a more comprehensive walkthrough of Ansible, please read How to Install and Configure Ansible on Ubuntu.

update.yml File

---
- name: Apt update Pi...
  hosts: pidoc-cluster
  tasks:
    - name: Update apt cache...
      become: yes
      apt:
        update_cache: yes

Usage: ansible-playbook playbooks/update.yml -i hosts

upgrade.yml File

---
- name: Upgrade Pi...
  hosts: pidoc-cluster
  gather_facts: no
  tasks:
    - name: Update and upgrade apt packages...
      become: true
      apt:
        upgrade: yes
        update_cache: yes
        cache_valid_time: 86400

Usage: ansible-playbook playbooks/upgrade.yml -i hosts

reboot.yml File

---
- name: Reboot Pi...
  hosts: pidoc-cluster
  gather_facts: no
  tasks:
    - name: Reboot Pi...
      shell: shutdown -r now
      async: 0
      poll: 0
      ignore_errors: true
      become: true

    - name: Wait for reboot...
      local_action: wait_for host={{ ansible_host }} state=started delay=10
      become: false

Usage: ansible-playbook playbooks/reboot.yml -i hosts

shutdown.yml File

---
- name: Shutdown Pi...
  hosts: pidoc-cluster
  gather_facts: no
  tasks:
    - name: 'Shutdown Pi'
      shell: shutdown -h now
      async: 0
      poll: 0
      ignore_errors: true
      become: true

    - name: "Wait for shutdown..."
      local_action: wait_for host={{ ansible_host }} state=stopped
      become: false

Usage: ansible-playbook playbooks/shutdown.yml -i hosts

Scaling Up

Docker Compose makes scaling Raspberry Pi containers on the same host nearly trivial. By using Ansible for cluster management, it also becomes easy to scale horizontally to other hosts by changing the port binding from localhost to an IP address that's routable. Here is our example with 6 nodes instead of 3.

docker-compose.yml File

version: '3'
services:
  node:
    image: pidoc
    ports:
      - "2201-2212:2222"
    deploy:
      resources:
        limits:
          cpus: "0.5"
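If the nodes need to be reachable from other machines (for example, so Ansible can manage clusters spread across several hosts), the published range can be bound to a specific routable address instead. A minimal sketch, using a placeholder address:

    ports:
      - "203.0.113.10:2201-2212:2222"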

Bring Up Cluster

We should stop containers from our previous cluster, and prune all volumes before scaling up our revised cluster. To bring up all 6 nodes with docker-compose, use the --scale option again.

docker-compose up --scale node=6

Future Work

Raspberry Pi emulation is still under development for QEMU. While the configuration for this project is relatively stable, there’s a lot of room for improvement. Attempting a migration to Raspberry Pi 3 emulation would be an ambitious next step. Docker Compose, though designed for single-host builds, is already easy enough to replicate to other hosts manually or through Ansible. But it could just as easily be scaled out with Swarm or k8s, enabling us to build an emulated Raspberry Pi cluster of any size. Additionally, with one or more port redirects, other systems of control can be put into place, including various node endpoints, depending on purpose and application.
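As a rough sketch of the Swarm route (not part of the build above), the same compose file could be deployed as a stack once a swarm is initialized; the pidoc image would need to be available to every swarm node, for example via a registry, and the published port range and CPU limits may need adjusting for swarm mode:

docker swarm init
docker stack deploy -c docker-compose.yml pidoc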

Explore Gcore Container as a Service

Related articles

Optimize your workload: a guide to selecting the best virtual machine configuration

Virtual machines (VMs) offer the flexibility, scalability, and cost-efficiency that businesses need to optimize workloads. However, choosing the wrong setup can lead to poor performance, wasted resources, and unnecessary costs.In this guide, we’ll walk you through the essential factors to consider when selecting the best virtual machine configuration for your specific workload needs.﹟1 Understand your workload requirementsThe first step in choosing the right virtual machine configuration is understanding the nature of your workload. Workloads can range from light, everyday tasks to resource-intensive applications. When making your decision, consider the following:Compute-intensive workloads: Applications like video rendering, scientific simulations, and data analysis require a higher number of CPU cores. Opt for VMs with multiple processors or CPUs for smoother performance.Memory-intensive workloads: Databases, big data analytics, and high-performance computing (HPC) jobs often need more RAM. Choose a VM configuration that provides sufficient memory to avoid memory bottlenecks.Storage-intensive workloads: If your workload relies heavily on storage, such as file servers or applications requiring frequent read/write operations, prioritize VM configurations that offer high-speed storage options, such as SSDs or NVMe.I/O-intensive workloads: Applications that require frequent network or disk I/O, such as cloud services and distributed applications, benefit from VMs with high-bandwidth and low-latency network interfaces.﹟2 Consider VM size and scalabilityOnce you understand your workload’s requirements, the next step is to choose the right VM size. VM sizes are typically categorized by the amount of CPU, memory, and storage they offer.Start with a baseline: Select a VM configuration that offers a balanced ratio of CPU, RAM, and storage based on your workload type.Scalability: Choose a VM size that allows you to easily scale up or down as your needs change. Many cloud providers offer auto-scaling capabilities that adjust your VM’s resources based on real-time demand, providing flexibility and cost savings.Overprovisioning vs. underprovisioning: Avoid overprovisioning (allocating excessive resources) unless your workload demands peak capacity at all times, as this can lead to unnecessary costs. Similarly, underprovisioning can affect performance, so finding the right balance is essential.﹟3 Evaluate CPU and memory considerationsThe central processing unit (CPU) and memory (RAM) are the heart of a virtual machine. The configuration of both plays a significant role in performance. Workloads that need high processing power, such as video encoding, machine learning, or simulations, will benefit from VMs with multiple CPU cores. However, be mindful of CPU architecture—look for VMs that offer the latest processors (e.g., Intel Xeon, AMD EPYC) for better performance per core.It’s also important that the VM has enough memory to avoid paging, which occurs when the system uses disk space as virtual memory, significantly slowing down performance. Consider a configuration with more RAM and support for faster memory types like DDR4 for memory-heavy applications.﹟4 Assess storage performance and capacityStorage performance and capacity can significantly impact the performance of your virtual machine, especially for applications requiring large data volumes. Key considerations include:Disk type: For faster read/write operations, opt for solid-state drives (SSDs) over traditional hard disk drives (HDDs). 
Some cloud providers also offer NVMe storage, which can provide even greater speed for highly demanding workloads.Disk size: Choose the right size based on the amount of data you need to store and process. Over-allocating storage space might seem like a safe bet, but it can also increase costs unnecessarily. You can always resize disks later, so avoid over-allocating them upfront.IOPS and throughput: Some workloads require high input/output operations per second (IOPS). If this is a priority for your workload (e.g., databases), make sure that your VM configuration includes high IOPS storage options.﹟5 Weigh up your network requirementsWhen working with cloud-based VMs, network performance is a critical consideration. High-speed and low-latency networking can make a difference for applications such as online gaming, video conferencing, and real-time analytics.Bandwidth: Check whether the VM configuration offers the necessary bandwidth for your workload. For applications that handle large data transfers, such as cloud backup or file servers, make sure that the network interface provides high throughput.Network latency: Low latency is crucial for applications where real-time performance is key (e.g., trading systems, gaming). Choose VMs with low-latency networking options to minimize delays and improve the user experience.Network isolation and security: Check if your VM configuration provides the necessary network isolation and security features, especially when handling sensitive data or operating in multi-tenant environments.﹟6 Factor in cost considerationsWhile it’s essential that your VM has the right configuration, cost is always an important factor to consider. Cloud providers typically charge based on the resources allocated, so optimizing for cost efficiency can significantly impact your budget.Consider whether a pay-as-you-go or reserved model (which offers discounted rates in exchange for a long-term commitment) fits your usage pattern. The reserved option can provide significant savings if your workload runs continuously. You can also use monitoring tools to track your VM’s performance and resource usage over time. This data will help you make informed decisions about scaling up or down so you’re not paying for unused resources.﹟7 Evaluate security featuresSecurity is a primary concern when selecting a VM configuration, especially for workloads handling sensitive data. Consider the following:Built-in security: Look for VMs that offer integrated security features such as DDoS protection, web application firewall (WAF), and encryption.Compliance: Check that the VM configuration meets industry standards and regulations, such as GDPR, ISO 27001, and PCI DSS.Network security: Evaluate the VM's network isolation capabilities and the availability of cloud firewalls to manage incoming and outgoing traffic.﹟8 Consider geographic locationThe geographic location of your VM can impact latency and compliance. Therefore, it’s a good idea to choose VM locations that are geographically close to your end users to minimize latency and improve performance. In addition, it’s essential to select VM locations that comply with local data sovereignty laws and regulations.﹟9 Assess backup and recovery optionsBackup and recovery are critical for maintaining data integrity and availability. Look for VMs that offer automated backup solutions so that data is regularly saved. 
You should also evaluate disaster recovery capabilities, including the ability to quickly restore data and applications in case of failure.﹟10 Test and iterateFinally, once you've chosen a VM configuration, testing its performance under real-world conditions is essential. Most cloud providers offer performance monitoring tools that allow you to assess how well your VM is meeting your workload requirements.If you notice any performance bottlenecks, be prepared to adjust the configuration. This could involve increasing CPU cores, adding more memory, or upgrading storage. Regular testing and fine-tuning means that your VM is always optimized.Choosing a virtual machine that suits your requirementsSelecting the best virtual machine configuration is a key step toward optimizing your workloads efficiently, cost-effectively, and without unnecessary performance bottlenecks. By understanding your workload’s needs, considering factors like CPU, memory, storage, and network performance, and continuously monitoring resource usage, you can make informed decisions that lead to better outcomes and savings.Whether you're running a small application or large-scale enterprise software, the right VM configuration can significantly improve performance and cost. Gcore offers a wide range of virtual machine options that can meet your unique requirements. Our virtual machines are designed to meet diverse workload requirements, providing dedicated vCPUs, high-speed storage, and low-latency networking across 30+ global regions. You can scale compute resources on demand, benefit from free egress traffic, and enjoy flexible pricing models by paying only for the resources in use, maximizing the value of your cloud investments.Contact us to discuss your VM needs

How to get the size of a directory in Linux

Understanding how to check directory size in Linux is critical for managing storage space efficiently. Understanding this process is essential whether you’re assessing specific folder space or preventing storage issues.This comprehensive guide covers commands and tools so you can easily calculate and analyze directory sizes in a Linux environment. We will guide you step-by-step through three methods: du, ncdu, and ls -la. They’re all effective and each offers different benefits.What is a Linux directory?A Linux directory is a special type of file that functions as a container for storing files and subdirectories. It plays a key role in organizing the Linux file system by creating a hierarchical structure. This arrangement simplifies file management, making it easier to locate, access, and organize related files. Directories are fundamental components that help ensure smooth system operations by maintaining order and facilitating seamless file access in Linux environments.#1 Get Linux directory size using the du commandUsing the du command, you can easily determine a directory’s size by displaying the disk space used by files and directories. The output can be customized to be presented in human-readable formats like kilobytes (KB), megabytes (MB), or gigabytes (GB).Check the size of a specific directory in LinuxTo get the size of a specific directory, open your terminal and type the following command:du -sh /path/to/directoryIn this command, replace /path/to/directory with the actual path of the directory you want to assess. The -s flag stands for “summary” and will only display the total size of the specified directory. The -h flag makes the output human-readable, showing sizes in a more understandable format.Example: Here, we used the path /home/ubuntu/, where ubuntu is the name of our username directory. We used the du command to retrieve an output of 32K for this directory, indicating a size of 32 KB.Check the size of all directories in LinuxTo get the size of all files and directories within the current directory, use the following command:sudo du -h /path/to/directoryExample: In this instance, we again used the path /home/ubuntu/, with ubuntu representing our username directory. Using the command du -h, we obtained an output listing all files and directories within that particular path.#2 Get Linux directory size using ncduIf you’re looking for a more interactive and feature-rich approach to exploring directory sizes, consider using the ncdu (NCurses Disk Usage) tool. ncdu provides a visual representation of disk usage and allows you to navigate through directories, view size details, and identify large files with ease.For Debian or Ubuntu, use this command:sudo apt-get install ncduOnce installed, run ncdu followed by the path to the directory you want to analyze:ncdu /path/to/directoryThis will launch the ncdu interface, which shows a breakdown of file and subdirectory sizes. Use the arrow keys to navigate and explore various folders, and press q to exit the tool.Example: Here’s a sample output of using the ncdu command to analyze the home directory. Simply enter the ncdu command and press Enter. The displayed output will look something like this:#3 Get Linux directory size using 1s -1aYou can alternatively opt to use the ls command to list the files and directories within a directory. 
The options -l and -a modify the default behavior of ls as follows:-l (long listing format)Displays the detailed information for each file and directoryShows file permissions, the number of links, owner, group, file size, the timestamp of the last modification, and the file/directory name-a (all files)Instructs ls to include all files, including hidden files and directoriesIncludes hidden files on Linux that typically have names beginning with a . (dot)ls -la lists all files (including hidden ones) in long format, providing detailed information such as permissions, owner, group, size, and last modification time. This command is especially useful when you want to inspect file attributes or see hidden files and directories.Example: When you enter ls -la command and press Enter, you will see an output similar to this:Each line includes:File type and permissions (e.g., drwxr-xr-x):The first character indicates the file type- for a regular filed for a directoryl for a symbolic linkThe next nine characters are permissions in groups of three (rwx):r = readw = writex = executePermissions are shown for three classes of users: owner, group, and others.Number of links (e.g., 2):For regular files, this usually indicates the number of hard linksFor directories, it often reflects subdirectory links (e.g., the . and .. entries)Owner and group (e.g., user group)File size (e.g., 4096 or 1045 bytes)Modification date and time (e.g., Jan 7 09:34)File name (e.g., .bashrc, notes.txt, Documents):Files or directories that begin with a dot (.) are hidden (e.g., .bashrc)ConclusionThat’s it! You can now determine the size of a directory in Linux. Measuring directory sizes is a crucial skill for efficient storage management. Whether you choose the straightforward du command, use the visual advantages of the ncdu tool, or opt for the versatility of ls -la, this expertise enhances your ability to uphold an organized and efficient Linux environment.Looking to deploy Linux in the cloud? With Gcore Edge Cloud, you can choose from a wide range of pre-configured virtual machines suitable for Linux:Affordable shared compute resources starting from €3.2 per monthDeploy across 50+ cloud regions with dedicated servers for low-latency applicationsSecure apps and data with DDoS protection, WAF, and encryption at no additional costGet started today

How to Run Hugging Face Spaces on Gcore Inference at the Edge

Running machine learning models, especially large-scale models like GPT 3 or BERT, requires a lot of computing power and comes with a lot of latency. This makes real-time applications resource-intensive and challenging to deliver. Running ML models at the edge is a lightweight approach offering significant advantages for latency, privacy, and resource optimization.  Gcore Inference at the Edge makes it simple to deploy and manage custom models efficiently, giving you the ability to deploy and scale your favorite Hugging Face models globally in just a few clicks. In this guide, we’ll walk you through how easy it is to harness the power of Gcore’s edge AI infrastructure to deploy a Hugging Face Space model. Whether you’re developing NLP solutions or cutting-edge computer vision applications, deploying at the edge has never been simpler—or more powerful. Step 1: Log In to the Gcore Customer PortalGo to gcore.com and log in to the Gcore Customer Portal. If you don’t yet have an account, go ahead and create one—it’s free. Step 2: Go to Inference at the EdgeIn the Gcore Customer Portal, click Inference at the Edge from the left navigation menu. Then click Deploy custom model. Step 3: Choose a Hugging Face ModelOpen huggingface.com and browse the available models. Select the model you want to deploy. Navigate to the corresponding Hugging Face Space for the model. Click on Files in the Space and locate the Docker option. Copy the Docker image link and startup command from Hugging Face Space. Step 4: Deploy the Model on GcoreReturn to the Gcore Customer Portal deployment page and enter the following details: Model image URL: registry.hf.space/ethux-mistral-pixtral-demo:latest Startup command: python app.py Container port: 7860 Configure the pod as follows: GPU-optimized: 1x L40S vCPUs: 16 RAM: 232GiB For optimal performance, choose any available region for routing placement. Name your deployment and click Deploy.Step 5: Interact with Your ModelOnce the model is up and running, you’ll be provided with an endpoint. You can now interact with the model via this endpoint to test and use your deployed model at the edge.Powerful, Simple AI Deployment with GcoreGcore Inference at the Edge is the future of AI deployment, combining the ease of Hugging Face integration with the robust infrastructure needed for real-time, scalable, and global solutions. By leveraging edge computing, you can optimize model performance and simultaneously futureproof your business in a world that increasingly demands fast, secure, and localized AI applications. Deploying models to the edge allows you to capitalize on real-time insights, improve customer experiences, and outpace your competitors. Whether you’re leading a team of developers or spearheading a new AI initiative, Gcore Inference at the Edge offers the tools you need to innovate at the speed of tomorrow. Explore Gcore Inference at the Edge

10 Common Web Performance Mistakes and How to Overcome Them

Web performance mistakes can carry a high price, resulting in websites that yield low conversion rates, high bounce rates, and poor sales. In this article, we dig into the top 10 mistakes you should avoid to boost your website performance.1. Slow or Unreliable Web HostYour site speed begins with your web host, which provides the server infrastructure and resources for your website. This includes the VMs and other infrastructure where your code and media files reside. Three common host-related problems are as follows:Server location: The further away your server is from your users, the slower the site speed and the poorer the experience for your website visitors. (More on this under point 7.)Shared hosting: Shared hosting solutions share server resources among multiple websites, leading to slow load times and spotty connections during peak times due to heavy usage. Shared VMs can also impact your website’s performance due to increased network traffic and resource contention.VPS hosting: Bandwidth limitations can be a significant issue with VPS hosting. A limited bandwidth package can cause your site speed to decrease during high-traffic periods, resulting in a sluggish user experience.Correct for server and VM hosting issues by choosing a provider with servers located closer to your user base and provisioning sufficient computational resources, like Gcore CDN. Use virtual dedicated servers (VDS/VPS) rather than shared hosting to avoid network traffic from other websites affecting your site’s performance. If you already use a VPS, consider upgrading your hosting plan to increase server resources and improve UX. For enterprises, dedicated servers may be more suitable.2. Inefficient Code, Libraries, and FrameworksPoor-quality code and inefficient frameworks can increase the size of web pages, consume too many resources, and slow down page load times. Code quality is often affected by syntax, semantics, and logic errors. Correct these issues by writing clean and simple code.Errors or inefficiencies introduced by developers can impact site performance, such as excessive API calls or memory overuse. Prevent these issues by using TypeScript, console.log, or built-in browser debuggers during development. For bugs in already shipped code, utilize logging and debugging tools like the GNU debugger or WinDbg to identify and resolve problems.Improving code quality also involves minimizing the use of large libraries and frameworks. While frontend frameworks like React, Vue, and Angular.js are popular for accelerating development, they often include extensive JavaScript and prebuilt components that can bloat your website’s codebase. To optimize for speed, carefully analyze your use case to determine if a framework is necessary. If a static page suffices, avoid using a framework altogether. If a framework is needed, select libraries that allow you to link only the required components.3. Unoptimized Code Files and FontsEven high-quality code needs optimization before shipping. Unoptimized JavaScript, HTML, and CSS files can increase page weight and necessitate multiple HTTP requests, especially if JavaScript files are executed individually.To optimize code, two effective techniques are minification and bundling.Minification removes redundant libraries, code, comments, unnecessary characters (e.g., commas and dots), and formatting to reduce your source code’s size. It also shortens variable and function names, further decreasing file size. 
Tools for minification include UglifyJS for JavaScript, CSSNano for CSS, and HTMLminifier for HTML.Bundling groups multiple files into one, reducing the number of HTTP requests and speeding up site load times. Popular bundling tools include Rollup, Webpack, and Parcel.File compression using GZIP or Brotli can also reduce the weight of HTTP requests and responses before they reach users’ browsers. Enable your chosen compression technique on your server only after checking that your server provider supports it.4. Unoptimized Images and VideosSome websites are slowed down by large media files. Upload only essential media files to your site. For images, compress or resize them using tools like TinyPNG and Compressor.io. Convert images from JPEG, PNG, and GIF to WebP and AVIF formats to maintain quality while reducing file size. This is especially beneficial in industries like e-commerce and travel, where multiple images boost conversion rates. Use dynamic image optimization services like Gcore Image Stack for efficient processing and delivery. For pages with multiple images, use CSS sprites to group them, reducing the number of HTTP requests and speeding up load times.When adding video files, use lite embeds for external links. Standard embed code, like YouTube’s, is heavy and can slow down your pages. Lite embeds load only thumbnail images initially, and the full video loads when users click the thumbnail, improving page speed.5. No Lazy LoadingLazy loading delays the rendering of heavy content like images and JavaScript files until the user needs it, contrasting with “eager” loading, which loads everything at once and slows down site load times. Even with optimized images and code, lazy loading can further enhance site speed through a process called “timing.”Image timing uses the HTML loading attribute in an image tag or frameworks like Angular or React to load images in response to user actions. The browser only requests images when the user interacts with specific features, triggering the download.JavaScript timing controls when certain code loads. If JavaScript doesn’t need to run until the entire page has rendered, use the defer attribute to delay its execution. If JavaScript can load at any time without affecting functionality, load it asynchronously with the async attribute.6. Heavy or Redundant External Widgets and PluginsWidgets and plugins are placed in designated frontend and backend locations to extend website functionality. Examples include Google review widgets that publish product reviews on your website and Facebook plugins that connect your website to your Facebook Page. As your website evolves, more plugins are typically installed, and sometimes website admins forget to remove those that are no longer required.Over time, heavy and unused plugins can consume substantial resources, slowing down your website unnecessarily. Widgets may also contain heavy HTML, CSS, or JavaScript files that hinder web performance.Remove unnecessary plugins and widgets, particularly those that make cURL calls, HTTP requests, or generate excessive database queries. Avoid plugins that load heavy scripts and styles or come from unreliable sources, as they may contain malicious code and degrade website performance.7. Network IssuesYour server’s physical location significantly impacts site speed for end users. For example, if your server is in the UK and your users are in China, they’ll experience high latency due to the distance and DNS resolution time. 
The greater the distance between the server and the user, the more network hops are required, increasing latency and slowing down site load times.DNS resolution plays a crucial role in this process. Your authoritative DNS provider resolves your domain name to your IP address. If the provider’s server is too far from the user, DNS resolution will be slow, giving visitors a poor first impression.To optimize content delivery and reduce latency, consider integrating a content delivery network (CDN) with your server-side code. A CDN stores copies of your static assets (e.g., container images, JavaScript, CSS, and HTML files) on geographically distributed servers. This distribution ensures that users can access your content from a server closer to their location, significantly improving site speed and performance.8. No CachingWithout caching, your website has to fetch data from the origin server every time a user requests. This increases the load time because the origin server is another physical hop that data has to travel.Caching helps solve this problem by serving pre-saved copies of your website. Copies of your web files are stored on distributed CDN servers, meaning they’re available physically closer to website viewers, resulting in quicker load times.An additional type of caching, DNS caching, temporarily stores DNS records in DNS resolvers. This allows for faster domain name resolution and accelerates the initial connection to a website.9. Excessive RedirectsWebsite redirects send users from one URL to another, often resulting in increased HTTP requests to servers. These additional requests can potentially crash servers or cause resource consumption issues. To prevent this, use tools like Screaming Frog to scan your website for redirects and reduce them to only those that are absolutely necessary. Additionally, limit each redirect to making no more than one request for a .css file and one for a .js file.10. Lack of Mobile OptimizationForgetting to optimize for mobile can harm your website’s performance. Mobile-first websites optimize for speed and UX. Better UX leads to happier customers and increased sales.Optimizing for mobile starts with understanding the CPU, bandwidth, and memory limitations of mobile devices compared to desktops. Sites with excessively heavy files will load slowly on mobiles. Writing mobile-first code, using mobile devices or emulators for building and testing, and enhancing UX for various mobile device types—such as those with larger screens or higher capacity—can go a long way to optimizing for mobile.How Can Gcore Help Prevent These Web Performance Mistakes?If you’re unsure where to start in correcting or preventing web performance mistakes, don’t worry—you don’t have to do it alone. Gcore offers a comprehensive suite of solutions designed to enhance your web performance and deliver the best user experience for your visitors:Powerful VMs: Fast web hosting with a wide range of virtual machines.Managed DNS: Hosting your DNS zones and ensuring quick DNS resolution with our fast Managed DNS.CDN: Accelerate both static and dynamic components of your website for global audiences.With robust infrastructure from Gcore, you can ensure optimal performance and a seamless experience for all your web visitors. Keep your website infrastructure in one place for a simplified website management experience.Need help getting started? Contact us for a personalized consultation and discover how Gcore can supercharge your website performance.Get in touch to boost your website

How to Choose Between Bare Metal GPUs and Virtual GPUs for AI Workloads

Choosing the right GPU type for your AI project can make a huge difference in cost and business outcomes. The first consideration is often whether you need a bare metal or virtual GPU. With a bare metal GPU, you get a physical server with an entire GPU chip (or chips) installed that is completely dedicated to the workloads you run on the server, whereas a virtual GPU means you share GPU resources with other virtual machines.

Read on to discover the key differences between bare metal GPUs and virtual GPUs, including performance and scalability, to help you make an informed decision.

The Difference Between Bare Metal and Virtual GPUs

The main difference between bare metal GPUs and virtual GPUs is how they use physical GPU resources. With a bare metal GPU, you get a physical server with an entire GPU chip (or chips) installed that is completely dedicated to the workloads you run on the server. There is no hypervisor layer between the operating system (OS) and the hardware, so applications use the GPU resources directly.

With a virtual GPU, you get a virtual machine (VM) that uses one of two types of GPU virtualization, depending on your or a cloud provider's capabilities:

- An entire, dedicated GPU used by a single VM, also known as a passthrough GPU
- A shared GPU used by multiple VMs, also known as a vGPU

Although a passthrough GPU VM gets the entire GPU, applications access it through the layers of a guest OS and hypervisor. Also, unlike a bare metal GPU instance, other critical VM resources that applications use, such as RAM, storage, and networking, are also virtualized.

The difference between running applications with bare metal and virtual GPUs

These architectural features affect the following key aspects:

- Performance and latency: Applications running on a VM with a virtual GPU, especially a vGPU, will have lower processing power and higher latency for the same GPU characteristics than those running on bare metal with a physical GPU.
- Cost: As a result of the above, bare metal GPUs are more expensive than virtual GPUs.
- Scalability: Virtual GPUs are easier to scale than bare metal GPUs, because scaling the latter requires a new physical server, whereas a new GPU instance can be provisioned in the cloud in minutes or even seconds.
- Control over GPU hardware: This can be critical for certain configurations and optimizations. For example, when training massive deep learning models with a billion parameters, total control means the ability to fine-tune performance, and that can have a big impact on training efficiency for massive datasets.
- Resource utilization: GPU virtualization can lead to underutilization if the tasks being performed don't need the full power of the GPU, resulting in wasted resources.

The benefits and drawbacks of each approach can be summarized as follows:

- Bare metal GPU. Benefits: dedicated GPU resources; high performance for demanding AI workloads. Drawbacks: high cost compared to virtual GPUs; less flexible and scalable than virtual GPUs.
- Passthrough GPU (virtual). Benefits: lower cost; simple scalability; suitable for occasional or variable workloads. Drawbacks: low performance; not suitable for demanding AI workloads.
- vGPU (virtual). Benefits: lowest cost; simple scalability; suitable for occasional or variable workloads. Drawbacks: lowest performance; not suitable for demanding AI workloads.
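Whichever option you provision, it can be useful to confirm what the guest OS actually sees. A minimal sketch, assuming a Linux instance with the NVIDIA driver and nvidia-smi installed (device names and reported values will vary by provider and GPU model):

```bash
# Check whether a GPU is exposed to this instance at the PCI level.
lspci | grep -i nvidia

# Query the GPU(s) the driver can see: model, total memory, and driver version.
nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv
```

On a vGPU instance, the memory reported is typically the slice allocated to your VM rather than the full physical card, which is one quick way to distinguish it from a passthrough or bare metal GPU.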
Should You Use Bare Metal or Virtual GPUs?

Bare metal GPUs and virtual GPUs are typically used for different types of workloads. Your choice will depend on which AI tasks you're looking to perform.

Bare metal GPUs are better suited for compute-intensive AI workloads that require maximum performance and speed, such as training large language models. They are also a good choice for workloads that must run 24/7 without interruption, such as some production AI inference services. Finally, bare metal GPUs are preferred for real-time AI tasks, such as robotic surgery or high-frequency trading analytics.

Virtual GPUs are a more suitable choice for the early stages of AI/ML and iteration on AI models, where flexibility and cost-effectiveness are more important than top performance. Workloads with variable or unpredictable resource requirements can also run on this type of GPU, such as training and fine-tuning small models or AI inference tasks that are not sensitive to latency and performance. Virtual GPUs are also great for occasional, short-term, and collaborative AI/ML projects that don't require dedicated hardware, for example, an academic collaboration that includes multiple institutions.

To choose the right type of GPU, consider these three factors:

- Performance requirements. Is raw GPU speed critical for your AI workloads? If so, bare metal GPUs are the superior choice.
- Scalability and flexibility. Do you need GPUs that can easily scale up and down to handle dynamic workloads? If yes, opt for virtual GPUs.
- Budget. Depending on the cloud provider, bare metal GPU servers can be more expensive than virtual GPU instances. Virtual GPUs typically offer more flexible pricing, which may be appropriate for occasional or variable workloads.

Your final choice between bare metal GPUs and virtual GPUs depends on the specific requirements of the AI/ML project, including performance needs, scalability requirements, workload types, and budget constraints. Evaluating these factors can help determine the most appropriate GPU option.

Choose Gcore for Best-in-Class AI GPUs

Gcore offers bare metal servers with NVIDIA H100, A100, and L40S GPUs. Using the 3.2 Tbps InfiniBand interface, you can combine H100 or A100 servers into scalable GPU clusters for training and tuning massive ML models or for high-performance computing (HPC).

If you are looking for a scalable and low-latency solution for global AI inference, explore Gcore Inference at the Edge. It especially benefits latency-sensitive, real-time applications, such as generative AI and object recognition.

Discover Gcore bare metal GPUs

How to Configure Grafana for Visualizing Kubernetes (K8s) Cluster Monitoring

Kubernetes monitoring allows you to observe your workloads and cluster resources, spot issues and failures, and efficiently manage pods and other resources. Cluster admins should prioritize tracking the performance and stability of their clusters. One popular tool for visualizing Kubernetes monitoring data is Grafana. This monitoring solution lets you display K8s metrics through interactive dashboards and real-time alerts, and it integrates seamlessly with Prometheus and other data sources, providing valuable insights. Gcore Managed Kubernetes simplifies the setup process by providing a managed service that includes tools like Grafana. In this article, we'll explain how to set up and configure Grafana to monitor Kubernetes, which metrics to track, and how to build dashboards.

Setting Up Grafana for Effective Kubernetes Monitoring

To begin monitoring Kubernetes with Grafana, first check that you have all the requirements in place: a functioning Kubernetes cluster, the Helm package manager installed, and kubectl set up to communicate with your cluster.

Install Grafana in the Kubernetes cluster. Start by adding the Grafana Helm repository:

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

Next, install Grafana using Helm. This command deploys Grafana into your Kubernetes cluster:

helm install grafana grafana/grafana

Now it's time to configure Grafana for the Kubernetes environment. After installation, retrieve the admin password with the command below:

kubectl get secret --namespace default grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo

Then access the Grafana UI by port-forwarding:

kubectl port-forward svc/grafana 3000:80

Open your web browser and navigate to http://localhost:3000. Log in using the default username admin and the password you retrieved. Once logged in, you can configure Grafana to monitor your Kubernetes environment by adding data sources such as Prometheus and creating custom dashboards. You've now successfully set up Grafana for Kubernetes monitoring!

Key Metrics for Kubernetes Monitoring

Understanding the key metrics for Kubernetes monitoring allows you to assess your cluster's reliability. The key metrics are the following:

- Node resources. Track CPU and memory usage, disk utilization, and network bandwidth to understand resource consumption and identify bottlenecks.
- Cluster metrics. Monitor the number of nodes to understand resource billing and overall cluster usage, and track running pods to determine node capacity and identify failures.
- Pod metrics. Measure how pods are managed and deployed, including instances and deployment status, and monitor container metrics like CPU, memory, and network usage.
- State metrics. Keep an eye on persistent volumes, disk pressure, crash loops, and job success rates to ensure proper resource management and application stability.
- Container metrics. Track container CPU and memory usage relative to pod limits, and monitor network data to detect bandwidth issues.
- Application metrics. Measure application availability, performance, and business-specific metrics to maintain optimal user experience and operational health.
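Grafana visualizes these metrics but doesn't collect them itself; in most Kubernetes setups they are scraped by Prometheus. If you don't already have Prometheus in the cluster, the sketch below shows one way to install it with Helm so you can add it as a data source in the next section. The chart, release, and namespace names are assumptions; adjust them to your environment:

```bash
# Add the Prometheus community chart repository and install Prometheus into the cluster.
# The release name "prometheus" and the default namespace are illustrative choices.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/prometheus

# With this chart, the Prometheus server is usually reachable from inside the cluster
# at a service such as http://prometheus-server.default.svc.cluster.local.
# Use that URL when adding the Prometheus data source in the Grafana UI.
```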
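Once Prometheus is running, you can sanity-check that it's actually collecting cluster metrics before building dashboards on top of it. A rough example against its HTTP API, assuming the service name created by the chart above; the PromQL expression is only an illustration:

```bash
# Forward the Prometheus server service to localhost (runs in the background).
kubectl port-forward svc/prometheus-server 9090:80 &

# Ask Prometheus for per-namespace container CPU usage over the last 5 minutes.
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum(rate(container_cpu_usage_seconds_total[5m])) by (namespace)'
```

If the query returns data, the same expression can back a Grafana panel in the dashboard steps that follow.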
Setting Up Grafana Dashboards

You can design and tailor Grafana dashboards to monitor your Kubernetes cluster. This will help you better understand your systems' performance and overall well-being at a glance.

- Log into Grafana. Open your web browser, go to http://localhost:3000/, and log in with the username admin and the password you retrieved earlier, then change your password if/when prompted.

Grafana—Log In to Start Monitoring

- Add a data source. Navigate to Configuration and select Data Sources. Click on Add Data Source and choose the appropriate data source, such as Prometheus.
- Create a dashboard. Go to Create > Dashboard, click Add New Panel, choose the panel type (e.g., time series chart, gauge, table), and configure it with a PromQL query and visualization settings.

Adding a New Panel in Grafana Dashboard

- Organize and save the dashboard. Arrange panels by clicking Add Panel > Add Row and dragging panels into the desired rows. To save the dashboard, click the save icon, name it, and confirm the save.

Gcore Managed Kubernetes for Kubernetes Monitoring

Whether you're just getting started with Kubernetes monitoring or you're a seasoned pro, Gcore Managed Kubernetes offers significant advantages for businesses seeking efficient and reliable Kubernetes cluster monitoring and container management:

- Ease of integrating Grafana: The service seamlessly integrates with Grafana, enabling effortless visualization and monitoring of performance metrics via dashboards.
- Automated control: Gcore Managed Kubernetes simplifies the setup and monitoring process through automation. The service conducts health checks on your nodes, automatically updating and restarting them when needed to keep performance at its best.
- Enhanced security and reliability: Gcore Managed Kubernetes keeps nodes reliable by integrating features like automatic scaling and self-repairing systems to maintain optimal performance.

Discover Gcore Managed Kubernetes, including automated scaling, one-click provisioning, and Grafana integration.
