Richard Bucker

Garbage Collection

Posted at — Feb 6, 2020

Time to take out the trash…

I responded to a recent “go to rust” article. One of the pluses for Rust was that it does not have a garbage collector. Well, if you have spent any time with C, C++, Go and Java then you know about memory allocation strategies and the like. If you’ve done some deep dives in Java then you might even be familiar with the different GC strategies. And if you’ve spent any time on OS internals then you are in the top 5 or 10 percent of all programmers.

But now there is a new type of garbage collection, and currently there is no strategy for cleanup. The first case is simple containers, i.e. docker or containerd. Unfortunately the vocabulary is tough: in the strict definition “container” means one thing, and in general use it means many things.

Let me try this… docker and containerd are orchestration tools that allow you to create a chroot’d filesystem with other partitioned resources, such that the actual service or task runs as a guest partitioned from the host. It is something similar to VMware. However, there is at least one major difference: the filesystem.

When a container is running, its filesystem is actually a file that is mounted inside the partitioned service. When the service makes changes to the filesystem, layers are created with just the changes (see btrfs or OverlayFS). The longer the process runs and the more changes that are made, the more layers or deltas are created. At some point these layers will slow the overall performance of the container as it tries to manage the deltas. Therefore, at some point the layers need to be flattened (GC #1).
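To make the idea of deltas and flattening concrete, here is a toy sketch in Python. It is not how OverlayFS, btrfs or docker actually store anything; the `Layer` type, the `lookup` and `flatten` helpers and the sample paths are all made up for illustration. Each layer only records what changed, a read has to walk the stack from newest to oldest, and flattening collapses the stack into one layer.

```python
from typing import Dict, List, Optional

# Hypothetical model: a layer maps a path to its contents.
# None marks a deletion (a "whiteout") that hides older copies.
Layer = Dict[str, Optional[str]]


def lookup(layers: List[Layer], path: str) -> Optional[str]:
    """Resolve a path by searching from the newest layer down to the oldest."""
    for layer in reversed(layers):
        if path in layer:
            return layer[path]  # None here means the file was deleted
    return None


def flatten(layers: List[Layer]) -> Layer:
    """Merge all layers, oldest first, into a single layer."""
    merged: Layer = {}
    for layer in layers:
        merged.update(layer)
    # Drop whiteouts: once flattened there is nothing underneath to hide.
    return {path: data for path, data in merged.items() if data is not None}


# Three deltas: the base image, a config change, and a deleted log file.
layers = [
    {"/etc/app.conf": "port=80", "/var/log/app.log": "boot"},
    {"/etc/app.conf": "port=8080"},
    {"/var/log/app.log": None},
]
flat = flatten(layers)
```

After flattening, a lookup touches one layer instead of three, but the earlier states (port=80, the old log) are gone, which is exactly the trade-off behind GC #1.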

In most cases the operations people are deploying some meaningful application with a fire-and-forget attitude. Chances are, however, that the app is a 3rd party service and every so often you need to upgrade. In this case you are probably storing the config info and application data in some sort of partition on the host or maybe a NAS of some kind. So updating the app is as simple as stopping the current container and starting a new one, which leaves the old container and its image behind (GC #2).

Lastly, you might be a programmer or devops professional using containers themselves for the build process in some CI/CD world. And you might be automating the container build and deploy with a registry. Most registries take a delete-nothing approach (GC #3). Having previous revisions makes perfect sense, as it means you can always roll back to a previous version.

GC #1 - Ask yourself: when is a good time to flatten? Having the changes stored as layers means you can roll back to a previous instance. This could be because a configuration change in the container was bad … assuming that the containers are viewed as mutable. This may or may not be a good idea.

GC #2 - When will you roll back? There is a period of time after you roll in a new release after which you are committed to the release. In one crazy universe you might flip flop between versions, thus causing all sorts of bad. But there is so much storage wasted keeping all of history forever.

GC #3 - Along with #2, programmers create so many breadcrumbs… when should we let the birds eat? There are some build systems that insist on building every code change, and that every code change is documented and tested. This type of allocation grows in all directions like spray insulating foam; just as wet and just as sticky. Not mentioned but equally important are all those changes in the version control.
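One way to answer the "when" in #2 and #3 is to write the policy down. The sketch below is only an illustration, assuming a policy of "keep the newest few releases, and keep anything still inside the rollback window"; the `collectable` function, its `keep_last` and `rollback_window` parameters and the tags are all hypothetical and do not correspond to any real registry's API.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical record: (tag, time the release was built or deployed).
Release = Tuple[str, datetime]


def collectable(
    releases: List[Release],
    now: datetime,
    keep_last: int = 3,
    rollback_window: timedelta = timedelta(days=30),
) -> List[str]:
    """Return tags that are safe to delete under the assumed policy."""
    newest_first = sorted(releases, key=lambda r: r[1], reverse=True)
    garbage = []
    for index, (tag, created) in enumerate(newest_first):
        if index < keep_last:
            continue  # always keep the newest few for quick rollback
        if now - created <= rollback_window:
            continue  # still inside the rollback window
        garbage.append(tag)
    return garbage


now = datetime(2020, 2, 6)
releases = [
    ("v1", datetime(2019, 6, 1)),
    ("v2", datetime(2019, 9, 1)),
    ("v3", datetime(2019, 12, 1)),
    ("v4", datetime(2020, 1, 20)),
    ("v5", datetime(2020, 2, 1)),
]
old_tags = collectable(releases, now, keep_last=2)
```

With these sample dates the two newest releases survive because of `keep_last`, and everything older falls outside the 30 day window and becomes garbage. Whether those numbers are right for you is the real question.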

Garbage collection in golang is just another GC in a long line of GC issues.

PS: trouble tickets and feature requests.