How to run 2500 webservers on a Raspberry Pi

21 October 2015

In case you missed the announcement, I'm part of the winning team of the DockerCon RPi Challenge. This post gives some details on the setup we used to run such a high number of webservers on such a small device.

Some might think you have to make your Docker image as small as possible, but this isn't actually the case. The image size translates into disk space under /var/lib/docker, not into memory consumption. Also, a big binary loaded into memory is only paid for once: the kernel shares code pages between processes running the same executable, so a hundred of them consume that memory only once. My first idea was to build a webserver with the HTML and image content embedded in its source code. But then Yoann explained to me that sendfile can be used to delegate this entirely to the kernel and make the process even simpler. For Java developers, think of sendfile as a kernel-level IOUtils.copy(File, OutputStream).
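To make the sendfile idea concrete, here is a minimal sketch in Go (not the assembly server we actually used). It relies on io.Copy: when the destination is a TCP connection on Linux, the Go standard library typically hands the transfer to sendfile(2), so the file bytes are not copied through user-space buffers. The names copyFileTo and demo are made up for this illustration, and the in-memory buffer only stands in for a client socket so the sketch runs without a network.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

// copyFileTo streams the file at path into w and returns the bytes written.
// With a *net.TCPConn as w on Linux, io.Copy can delegate to sendfile(2).
func copyFileTo(w io.Writer, path string) (int64, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()
	return io.Copy(w, f)
}

// demo writes a small temporary file and copies it into an in-memory buffer.
// The buffer is a stand-in for a client connection (assumption for the demo).
func demo() (string, error) {
	tmp, err := os.CreateTemp("", "index-*.html")
	if err != nil {
		return "", err
	}
	defer os.Remove(tmp.Name())
	if _, err := tmp.WriteString("<h1>hello</h1>\n"); err != nil {
		tmp.Close()
		return "", err
	}
	if err := tmp.Close(); err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if _, err := copyFileTo(&buf, tmp.Name()); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	body, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(body)
}
```

The point of the design is that the server process never holds the content itself: it only tells the kernel which file goes to which socket.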

We used hypriot's nano http image. It is a webserver developed in assembly code that just serves files from disk using the kernel sendfile call. Such a program has a minimal memory footprint and a stack only one level deep. The memory the kernel allocates to handle such a process can then be as compact as possible.

The next step was to run some tests and tweak Docker to run as many webservers as possible. We tried various strategies with no real methodology: we just applied the recipes we had in mind and checked the result (it takes hours to start thousands of servers...).

Free memory

We tweaked the Raspberry Pi and the OS to reduce memory usage. Some low-level tweaks disable unneeded features at boot, and some system-level ones disable Linux features we don't need for this challenge.

Swap!

Yoann tried to explain to me what zRAM is and I probably didn't get it right, but the general idea is that classic swap on disk is incredibly slow and is only a last resort for freeing memory. A better, more modern approach is to compress memory, which the CPU can do very efficiently, and a lot faster than accessing the disk (especially on an RPi, where the disk is an SD card).

So our setup uses 5 zram devices: 4 of them for swap (one per CPU core, to allow concurrent access) and one for the /var/lib/docker filesystem.

What? Yes, we use a RAM disk for /var/lib/docker, even after all those efforts to reduce memory usage... The main issue in this challenge is that running a test and starting thousands of containers takes hours. Having /var/lib/docker on the SD card made it terribly slow. If we had to go further in the challenge, we would have used an external USB SSD.

Tweak docker command

The web servers are started by docker from a script. We selected docker options that reduce the resources consumed by each web server. In particular, running a dedicated IP stack per container uses a huge amount of resources, so a key hack was to run with --net=host. We also disabled the log driver so docker doesn't have to collect logs and uses fewer resources. This doesn't seem to work as expected (see below).
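As a rough illustration of what such a start script does, here is a sketch in Go that builds the docker command line for each server. Only --net=host and --log-driver=none come from our setup; the image name hypriot/rpi-nano-httpd, the -d and --name flags, and the function name dockerRunArgs are assumptions for the example. How each server picks its listening port on the shared host network is not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// image is an assumed name for "hypriot's nano http image".
const image = "hypriot/rpi-nano-httpd"

// dockerRunArgs returns the arguments of one "docker run" invocation.
// --net=host skips the per-container network stack and
// --log-driver=none asks docker not to store the container's logs.
func dockerRunArgs(image string, index int) []string {
	return []string{
		"run", "-d",
		fmt.Sprintf("--name=web-%d", index), // hypothetical naming scheme
		"--net=host",
		"--log-driver=none",
		image,
	}
}

func main() {
	// Print the first few commands instead of executing them.
	for i := 1; i <= 3; i++ {
		fmt.Println("docker " + strings.Join(dockerRunArgs(image, i), " "))
	}
}
```

Printing the commands rather than running them keeps the sketch safe to try on any machine; a real script would loop up to the target count and execute each one.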

Tweak docker process

Linux also lets you tweak the way the kernel manages a process. We used this to ensure docker runs with the minimal required resources and uses swap.

Tweak docker daemon config

Docker is run by systemd on the hypriot OS image, so we had to tweak its configuration a bit to lift some limits. My naive understanding of Linux was that, running as root, the docker daemon could do anything. This isn't the case: with the default configuration it can't actually run more than a few dozen processes.

The Docker daemon has many options, which we used to reduce its memory usage. Generally speaking, we tried to disable everything that is not required to run a webserver with the docker engine: logs, network, proxies. We expected this to prevent the Docker daemon from running threads to collect logs or proxy signals to the contained processes.

2499 Limit

Then we hit the 2499 limit, with this in daemon.log:

docker[307]: runtime: program exceeds 10000-thread limit

The Go runtime introduced a thread limit to prevent misuse of threading; 10000 was considered enough for any reasonable usage. I would indeed not consider running so many threads a sound design, but here we hit the limit because the docker daemon runs 4 threads per container. It's not yet clear to me what those threads are used for.
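The arithmetic behind the 2499 figure can be sketched as follows. The 10000 limit and the 4 threads per container come from what we observed; the baseline of 4 threads for the daemon itself is an assumption picked for illustration (any small non-zero baseline gives the same result), and maxContainers is a made-up helper name.

```go
package main

import "fmt"

// maxContainers returns how many containers fit under a thread limit,
// given the daemon's own baseline threads and the threads per container.
// The Go runtime aborts once the thread count exceeds the limit.
func maxContainers(threadLimit, baseline, perContainer int) int {
	if perContainer <= 0 || threadLimit <= baseline {
		return 0
	}
	return (threadLimit - baseline) / perContainer
}

func main() {
	// baseline = 4 is an assumed value for the daemon's own threads.
	fmt.Println(maxContainers(10000, 4, 4)) // 2499
	fmt.Println(maxContainers(12000, 4, 4)) // 2999
}
```

With 2500 containers the daemon would need 10000 threads for containers alone, plus its own, which crosses the limit.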

Using a Go thread dump (SIGQUIT) I noticed some of them are related to logging, even though we ran with --log-driver=none in an attempt to get further. I guess the docker design here is to always collect logs and then dispatch them to the "none" log driver, which is a no-op, rather than to disable the logging feature entirely.
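For readers who have never seen such a dump: by default a Go program that receives SIGQUIT prints the stacks of its goroutines and exits. A similar dump can be produced from inside a program with runtime.Stack, as in this small sketch. It is a standalone toy, not docker code, and dumpAllStacks is a made-up helper name.

```go
package main

import (
	"fmt"
	"runtime"
)

// dumpAllStacks returns the stack traces of all current goroutines,
// growing the buffer until the whole dump fits.
func dumpAllStacks() string {
	buf := make([]byte, 64<<10)
	for {
		n := runtime.Stack(buf, true) // true: include every goroutine
		if n < len(buf) {
			return string(buf[:n])
		}
		buf = make([]byte, 2*len(buf))
	}
}

func main() {
	done := make(chan struct{})
	// A few goroutines blocked on a channel, so the dump has something to show.
	for i := 0; i < 3; i++ {
		go func() { <-done }()
	}
	fmt.Print(dumpAllStacks())
	close(done)
}
```

Each entry in the dump names the function a goroutine is blocked in, which is how the logging-related stacks showed up in our case.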

So, 2499 is our best official score considering the RpiDocker Challenge rules.

We also wanted to know the upper limit. We experimented with running the plain httpd webserver without docker, and were able to run 27000 of them on the Raspberry Pi. The Docker daemon's memory usage actually grows, and at some point it has a bad enough impact on the system that you can't run more processes. Please note this isn't a relevant argument against docker on production systems, unless your business is to run thousands of containers on an extra-small server.

So we hacked the docker source code to force the MaxThread limit to 12000, built an ARM docker executable and ran the script. We were able to run ~2740 web servers before we reached our first real OOM:

[21112.371259] INFO: rcu_preempt detected stalls on CPUs/tasks:

[21112.377124] Tasks blocked on level-0 rcu_node (CPUs 0-3):
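For reference, the thread limit mentioned above is exposed by the Go standard library as runtime/debug.SetMaxThreads, which sets a new limit and returns the previous one. The sketch below only shows that call in isolation; it is not the actual change made to the docker source, and raiseThreadLimit is a made-up wrapper name.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// raiseThreadLimit sets the maximum number of OS threads the Go runtime
// may use and returns the previous setting.
func raiseThreadLimit(n int) int {
	return debug.SetMaxThreads(n)
}

func main() {
	prev := raiseThreadLimit(12000)
	// The documented initial setting is 10000 threads.
	fmt.Println("previous limit:", prev)
}
```

Raising the limit only moves the wall: as the log above shows, memory becomes the next constraint.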

What's next?

We'd like to better understand Docker's threading model and discuss this issue with the docker core team. Using non-blocking IO might be a way to rely on a minimal set of threads. I have no idea yet how Go handles NIO; I just know it's a pain in Java, so I wouldn't do it unless I had good reasons to...
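As a starting point for that investigation, here is a small sketch showing that goroutines and OS threads are different things in Go: goroutines blocked on a channel are parked by the runtime without each holding a thread, so their number can go past the 10000-thread limit. It says nothing about why docker ends up with 4 threads per container (blocking system calls, for instance, do tie up threads). The helper names parkGoroutines and goroutineCount are made up for the example.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/pprof"
	"sync"
)

// parkGoroutines starts n goroutines that block on a channel and waits
// until all of them are running. The returned function releases them
// and must be called exactly once.
func parkGoroutines(n int) (release func()) {
	var started, finished sync.WaitGroup
	stop := make(chan struct{})
	started.Add(n)
	finished.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer finished.Done()
			started.Done()
			<-stop // parked here without holding an OS thread
		}()
	}
	started.Wait()
	return func() {
		close(stop)
		finished.Wait()
	}
}

// goroutineCount reports the number of goroutines that currently exist.
func goroutineCount() int {
	return runtime.NumGoroutine()
}

func main() {
	release := parkGoroutines(20000)
	fmt.Println("goroutines:", goroutineCount())
	// Number of entries in the thread-creation profile, for comparison.
	fmt.Println("thread creation records:", pprof.Lookup("threadcreate").Count())
	release()
}
```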

Comments:

David Gageot said…: Nice! And Crazy. But nice!; 21 October 2015 at 11:32
Emmanuel Lécharny said…: 10 000 threads limitation is a bit stupid, especially when it's a language limitation. When you are doing blocking IO, with a connected protocol (like LDAP), you might want to be able to deal with more than 10 000 incoming connections, thus you will need as many threads as you have connections.

OTOH, you still can use non-blocking IO, but you will pay a huge performance penalty for doing so (around 30%).; 21 October 2015 at 11:59
Anonymous said…: Emmanuel, The 10,000 thread limit in Go is 10,000 OS-threads, not 10,000 goroutines (of which you can comfortably have hundreds of thousands if you wanted).

In your example (or any blocking IO example really) Go would handle all those incoming connections in a few threads only, and multiplex goroutines onto them as and when connections were unblocked and ready to do something. Goroutines are cheap, while threads are quite expensive, so handling a blocking IO on a per-thread basis would probably be quite slow as you scaled up threads beyond the number of CPUs available on the system.; 21 October 2015 at 16:13
Nicolas De Loof said…: @Edd is there some doc to explain how Goroutines are implemented?; 21 October 2015 at 16:18
Darren Gordon said…: @Nicolas http://dave.cheney.net/2015/08/08/performance-without-the-event-loop; 22 October 2015 at 19:56