There are only 12 binaries in Talos Linux (siderolabs.com)
156 points by JustinGarrison on March 4, 2024 | 70 comments


Super cool. I always enjoy reading about systems that challenge, well, "ossified" assumptions. An OS not providing a shell, for example? Madness! ... or is it genius, if the OS has a specific purpose...? It's thought-provoking, if nothing else.

I'm a bit skeptical of parts. For instance, the "init" binary being less than 400 lines of golang - wow! And sure, main.go [1] is less than 400 lines and very readable. Then you squint at the list of imported packages, or look to the left at the directory list and realize main.go isn't nearly the entire init binary.

That `talosctl list` invocation [2] didn't escape my notice either. Sure, the base OS may have only a handful of binaries - how many of those traditional utilities have been stuffed into the API server? Not that I disagree with the approach! I think every company eventually replaces direct shell access with a daemon like this. It's just that "binary footprint" can get a bit funny if you have a really sophisticated API server sitting somewhere.
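
To be fair, the API side is pretty usable. A rough sketch of the flavor, going from the talosctl reference linked at [2] (node address is made up, and flags may differ between versions):

  # roughly ls, cat and tail -f for a node, but over the management API
  talosctl -n 10.0.0.5 list /etc
  talosctl -n 10.0.0.5 read /etc/resolv.conf
  talosctl -n 10.0.0.5 logs kubelet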

[1]: https://github.com/siderolabs/talos/blob/main/internal/app/m...

[2]: https://www.talos.dev/v1.6/reference/cli/#talosctl-list


Exactly this. I was thinking of making a similar comment but you made it far better than I could.

Number of binaries is kind of a meaningless metric, especially for a system that historically follows the UNIX philosophy of each program doing one thing.

Sure, a shell is complicated and a potential risk, and perhaps it's a good idea to exclude it from the base system in this context.

But I'd rather have ls, tr and wc on my system than some bespoke, all-encompassing API service that has been far less battle tested providing similar functionality.

And like you rightly pointed out, these new binaries all contain their own list of dependencies which are pulled in at build time and need to be taken into scope as well.

That's not to say Talos or its approach doesn't hold merit, but I think it's a little disingenuous to simply point at the number of binaries.


I agree number of binaries is an arbitrary metric but also an indicator that things work differently with Talos. You have to use the declarative API for management which some people could see as a bad thing.

I’d also like to point out that the system API is designed to be extendable and adaptable to different operating systems. We’d love for more vendors to create adapters/shims to get the benefits of API managed Linux

https://github.com/cosi-project/community


SystemD bashing aside :p Talos is pretty awesome for setting up clusters. At home I just run Talos with matchbox for PXE bootstrapping and it works like a charm. It's been really easy to maintain too. I normally just update matchbox and then reset one machine at a time with talosctl for a clean install. There's something very reassuring about being able to completely reset your machines, so you know you could reinstall or replace them easily.

Granted, it's just used in a home setting running smaller workloads for backups, private projects, git, etc.


The /sbin/init binary is hard linked to /sbin/dashboard, /sbin/poweroff, /sbin/shutdown, and /sbin/wrapperd. While this technically is 5 files, it’s a single file hard linked 4 times to provide convenience commands.

Err, that's definitely 1 file with 5 directory entries, not 5 files.
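
Quick illustration of the distinction, with made-up names in a scratch directory:

  $ echo hello > init
  $ ln init dashboard        # a second directory entry pointing at the same inode
  $ ls -li init dashboard    # same inode number, link count is now 2
  $ rm init                  # removes one name; the data is still reachable
  $ cat dashboard
  hello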


Wonder why they would use hard links instead of symlinks.

Edit: interesting, seems like there's a mild performance benefit.

https://unix.stackexchange.com/a/20716


Reverse question: why use symlinks when you can get away with hard links?


Hard links only work in the same filesystem. If your system only has one volume with one filesystem (e.g. a PC) that's fine, but not a very portable option for servers.


Every one of those files is in /sbin though


The main thing I've seen with hard links is that deletions delete the source file, which about 90% of the time isn't what an end user wants


Only if it's the last hard link. If nothing else, it's a wee bit of insurance from deletion, since no single link removal should remove the file.

Anecdote: eons ago, we had a problem where the vendor needed to log in to the machine with the intent that they were going to upload some utilities, fix a problem, and then delete them.

Before I let them in, I set up a script that constantly scanned the directory tree they were in, and hard linked everything so I could look at what they were using later.


Clever solution!


That's a problem with directory hardlinks: a recursive deletion will wipe the directory (in the same way that truncate will destroy a file regardless of how many hardlinks it has). But we don't usually allow directory hardlinks these days.


that isn't generally how hardlinks work: you need to delete all instances of it to delete the file, as opposed to having one 'real' file like with symlinks.


It's more of a problem that changes to one linked file change the only copy. But this depends on your filesystem. Not all work this way.


There are a lot of benefits actually, as symbolic links often need to be handled as a completely different type of file with different semantics, which often leads to bugs. Hard links are always better when you know you are dealing with a single static host & filesystem. Symbolic links in such a case are only better for indicating quickly to the user which files are linked to which others.


Not sure it really matters that much in this case. But why not? Hard links are a quite clean and simple concept. One “object”, multiple names.


Big fan of Talos, have used it in some homelab + cloud clusters over the years, currently powers all my self-hosting. The `talosctl` command is great, and any time you need to do node-level debugging, there's always something like node-shell [1].
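
Usage is about as simple as it gets; roughly this, assuming the plugin is installed (e.g. via krew) and with a made-up node name:

  # spawns a privileged pod on the target node and nsenters into the host namespaces
  kubectl node-shell my-talos-node-1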

[1] https://github.com/kvaps/kubectl-node-shell


I wonder how a really slimmed down distro like Alpine would compare here, particularly in terms of image size.

It offers most of the standard Linux utilities we know and love, but most of them are actually just symlinks to Busybox, which is ~900K on my (ARM64) system. That's less than a hello world in Go, for a program that can replace most common Linux utilities in daily usage.
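
What that looks like on a stock Alpine install, more or less (output trimmed and from memory):

  $ ls -l /bin/ls /bin/tr /bin/wc
  lrwxrwxrwx ... /bin/ls -> /bin/busybox
  lrwxrwxrwx ... /bin/tr -> /bin/busybox
  lrwxrwxrwx ... /bin/wc -> /bin/busybox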


We operate talos and alpine based nodes, many thousands of them. The build chain for alpine is many orders of magnitude more complex than the build pipe for our talos image modifications. Alpine is really not made for doing a lot of "host" tasks and needs much coercion to be capable of running something like k3s, and it's much more complex to get kubeadm clusters running on it. In the end the complexity is required for flexibility: alpine nodes can be modified on a whim, while talos is R/O and ephemeral, but more secure.


Busybox is only 1 binary symlinked hundreds of times


Yes, and?

(They and we all know that. They said as much themselves right in the comment.)


The benefit of Talos isn't low binary count but that _can_ reduce the amount of maintenance required.

The benefit is declarative, API-driven management. You spend less time automating a system toward a desired state, in a similar vein to how Kubernetes provides a declarative API.


What does any of this have to do with my question? You just restated a fact, that busybox is a binary with a lot of symlinks, but why? So what? Yes, also the sky is blue and water is wet. It's like you didn't read the comment you responded to. It wasn't even about the benefit of Talos.


It's disingenuous to say that /sbin/init (machined/main.go) is less than 400 lines of code. Sure, that file is. What about all of the in-tree modules that are being imported? A super lazy summing of Go lines in the master branch of the repo:

$ find . -name '*.go' | xargs wc -l | tail -1

  354085 total
Heck, there are almost 100k lines under internal/app!

$ find internal/app -name '*.go' | xargs wc -l | tail -1

  96885 total
I'm curious what argument you are making here with regards to the number of lines in a single file.


Kind of like TextAdept claiming that the text editor is "just 2K lines of C/C++ and 4K lines of Lua", when it's in fact wrapping a well known third party text editor widget.

At least here it makes more sense, since it's not like "our init system is just 200 lines" when those lines are wrapping a third party init system library.

It's more like "our init system logic is just 200 lines, not including third party dependencies". That's legit, provided that those deps are stuff like parsers, some library for dealing with strings or running processes, and things like that.


I know there are a lot more lines and I didn’t count any of the imports from systemd either. 300 loc (machined) vs 3000 loc (systemd) was the closest comparison I could think of without crawling all imports and deps.

Would be happy to update with a different comparison you think is more fair.


How about we just don't compare lines of code at all, as if it's a useful metric of anything?


"Measuring programming progress by lines of code is like measuring aircraft building progress by weight."

-- attributed to Bill Gates


For measuring bloat, however, they're a good proxy.


Is there a metric that can convey the complexity of a general purpose init system like systemd vs a single purpose init like Talos' machined?

That is what I was trying to convey and couldn't find a reasonable metric.


>As opposed to systemd which is over 3000 lines of C code I’ll never comprehend.

Well, technically true, but systemd is a whole lot more than 3000 lines...

I can see another binary in the demo video called apid, does that one not count?

Any comparison with Bottlerocket OS?


The systemd hate is getting long in the tooth now. It's not like it doesn't do anything with its line count, or that the code is obfuscated.


I’m actually a big fan of systemd. It’s an awesome, general purpose, and flexible init system.

I don’t think the complexity it brings is required for Kubernetes.


There's work on a new kubelet replacement that moves things that would normally go into a DaemonSet into systemd (or something like systemd).

There's also a neat feature of podman that runs pods as systemd units, which is a nice intermediate step between a more traditional pet server and a full kubernetes cluster.



Also this one: https://github.com/virtual-kubelet/systemk

Although what I was thinking of was an article written somewhere and posted here in HN, and more a broad rethink on Kubernetes.


it could have been said without the x lines of code comment. "lines of code" is so often used as a "disparagement" about software rather than a metric for understandability.

Something along the lines of "...not needing a general purpose init system that integrates with logging, network and mounting, when all we are running is Kubernetes."


The second I learned how to write systemd unit files was the second I evicted sysvinit/rc.d scripts from my mind. Well, okay, from my search terms at least ;-)

It can even kind of replace cron with timers, and no more mucking with grub. Also, true parallel init tasks. Love it.
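
For anyone who hasn't tried the timer side yet, it's just a pair of tiny units, roughly like this (names made up, directives from memory, check the systemd.timer man page):

  # /etc/systemd/system/backup.service
  [Service]
  Type=oneshot
  ExecStart=/usr/local/bin/backup.sh

  # /etc/systemd/system/backup.timer
  [Timer]
  OnCalendar=daily
  Persistent=true

  [Install]
  WantedBy=timers.target

Then systemctl enable --now backup.timer, and systemctl list-timers to check the schedule.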


I'm not reading that as hate, I read that as criticising systemd in the context of a stripped down system designed to do one thing.

I have systemd on the laptop I am typing on right now. Do I want it on some tiny embedded Linux device? Probably not.


Thanks for clarifying because that was the vibe I was going for (not hate).

I'm using projectbluefin.io for all my laptops/desktops and love it. Wouldn't want the same on single-purpose, production servers.


So there is a quantum of criticism/observations about systemd that can be made but after that, no more is acceptable?


Many of the binaries you see in the demo video are showing processes running from inside containers. There will be a lot more processes once you start pulling containers and starting containerized services.

Notably, the kubelet is also missing from the list because it's not built into the OS; it's pulled as needed for the requested version of Kubernetes.

Bottlerocket runs systemd and also runs 2 versions of containerd. One for the system and one for workloads. This (in theory) hardens the OS more, but in practice makes things extremely annoying to manage because you have to get a shell on the host to access the API.

Disclaimer: I used to work at AWS on EKS and closely with the Bottlerocket team.


In retrospect, it would've saved a lot of trouble and misunderstandings if systemd had called the init daemon "systemd-init" to make it clear that not literally everything that is under the umbrella is part of the init daemon.


Eh... Most of the other components have a hard dependency on the init part; I'm not convinced that they're all that separate.


It's not the other way around though, which is a very important distinction. You don't need to use systemd-networkd or systemd-resolved or any manner of other things just to use the init daemon. The init daemon itself is extremely useful, and there are many machines that use the init daemon without most of the other services under the umbrella.

It makes sense that a lot of the other services in systemd depend on the init daemon, it provides a lot of baseline services and features that are used for the rest of it. As a matter of fact, I don't even know what other init daemon I would choose if I wanted similar features around system daemon management, as there's a lot in the surface area that is genuinely useful. Honestly, there's a lot of useful stuff for handling secrets, handling UNIX domain sockets, temporary files, sandboxing apps, setting resource limits, managing unit lifecycles, etc. There are a few features I find somewhat more dubious (personally I'm not sold on DynamicUsers) but by and large I actually like a lot of the surface area systemd's init daemon provides and if I were to use something else I'd want something in a similar ballpark.
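
To make that concrete, this is the kind of surface area I mean, all in a handful of unit-file lines (daemon name made up; directives from memory, so check the man pages):

  [Service]
  ExecStart=/usr/local/bin/mydaemon
  DynamicUser=yes
  ProtectSystem=strict
  ProtectHome=yes
  PrivateTmp=yes
  NoNewPrivileges=yes
  MemoryMax=512M
  RuntimeDirectory=mydaemon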


20 years ago, I used to make custom Linux distros for fun. Floppy distros, CDROM distros, RAM-resident distros, network-boot distros. In a few of them, I custom-made my own binary that was both the init system, and a few applications, stripped it down, and shipped just that as the distro (basically just a few files and my static binary).

A lot of people downloaded them, and it was great fun - to start. Problem is when you want to do more things. You have to start finding workarounds to bolt-on additional tools, or maybe you just throw one or two extra tools in there by default. Over time you find more and more missing things or incompatibilities with other systems, which make it harder to cover more use cases. And finally you realize that "the tiniest system" is a lot more effort than it's worth, and what you really want is "a slim yet compatible system". The system you end up with is a lot fatter, but a lot less headache.

(The security benefits of fewer files are overblown, too. If you audit and harden the system, it doesn't matter how many binaries you have, because the attack vectors they use will be mitigated)


We use “system extensions” to give you flexibility while keeping the base small.

Want GPU drivers? Add the extension. Need Tailscale? Extension.

https://www.talos.dev/v1.6/talos-guides/configuration/system...


How is that different than adding binaries, in the context of this comment where the point was "In the end you end up just adding everything back in that you originally took out, because managing all the little weird different subsets is not worth the benefit."

In the context of that assertion, adding or subtracting "extensions" and adding or subtracting binaries are equivalent. Both are adding or subtracting "a piece of code that provides a function".


They are similar but extensions don't have to be binaries. It can also be files you need to be available before Kubernetes starts.

Talos is purpose built to run Kubernetes workloads and not general purpose Linux. Hopefully, you don't have to add _everything_ back to the OS, but we know some things cannot run as a container or Kubernetes workload.

Extensions are required for specialized hardware (e.g. network, GPUs) and are the closest thing to a "package manager" available in Talos. Extensions can be binaries but don't have to be. We have a lot of common extensions provided and maintained by us, but anyone can create extensions as needed.

One nice thing about extensions is that they get layered, and you don't have to pre-build an artifact like you do with other Linux distros and something like packer. factory.talos.dev will let you pick your extensions and get an artifact, no packer/bash/config management required.
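
If it helps, a factory schematic is just a small YAML document, something along these lines (extension names from memory, so check the catalog for exact ones):

  customization:
    systemExtensions:
      officialExtensions:
        - siderolabs/tailscale
        - siderolabs/nvidia-container-toolkit

Upload that (or click through the web UI) and you get back image and installer artifacts with those extensions layered in.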


Like the other commenter said, this is the kind of thing custom distros always have to do, that lead to either reinventing or just going back to a big fat distro. I've gone around this wheel a couple times. It's a nice idea to have something custom-tailored, but a tailored suit only fits one person, and is expensive.

You can still build a system to manage Kubernetes nodes without making your own distro. Even a heavily modified stock distro gains you benefits from basing off someone else's work. You can reuse the solutions they've made, and contribute back your customizations for your specific use cases.

That's how today's distro installers/package managers/etc came to include all the functionality they have. You couldn't eject a CDROM from a Busybox system, until one weird kid in high school decided to try to use Busybox to make a CD-bootable RAM-resident distro, found out it had no 'eject' command, and then sent a patch in to Busybox to add it. Now everyone can use that command, and that functionality is still there 20 years later.

It's also a lot harder for users to use proprietary solutions than ones they're familiar with. Your OS has no shell or console, only an API? So if there's a problem, how do I drop in with gdb, strace, tcpdump, and the entire suite of Linux debugging tools, to try and quickly diagnose and then patch an issue? I'm sure you've created some way to do it, but now I need to go find out how to do it, and probably use whatever stock tools are there, which may have their own quirks or incompatibilities.

But I get that a corporation's interest is mostly in "get something working now" as opposed to "get something working that will be better in the long run". DIY/NIH often becomes the engineering department's watchwords, and a custom distro is one of those eventualities.


We run many talos clusters at Civo, and they are far easier to manage than the other cluster types that use a standard Linux distribution stripped down and stuffed with what we need. The custom image build process is easy to get running in CI. All in all, talos is wonderful both for tenant nodes and our region supercluster nodes, and it's a much simpler process to add/remove nodes from the pool and do a few other k8s-centric tasks like etcd snapshot backups and pre-configuration of our regions before we have kit on the ground.


Executables also don't have to be binaries, and the executable interface is also an API.

A convincing argument might exist, but I haven't heard one so far.


MX Linux would be that "slim-yet-compatible" system. It can load itself completely into RAM, and has many distro-specific components such as Frugal Install and Remastering, yet also lets you install standard Debian packages.


In the case of Talos, Kubernetes can provide the flexibility you want from a more traditional Linux distribution.


That sounds like Kool aid. Can you expound more on this?


With Kubernetes you can schedule workloads in a number of different ways. Let's say you insisted on having a shell and package manager. Run your favorite distro's container as a DaemonSet. With the proper mounts and permissions you can do a lot. In other words, use Kubernetes to do the things you need to do. Then what role does the OS really play? Well, in the case of Talos it's only there to run Kubernetes.
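
A rough sketch of that pattern, if you want maximum access (trim the privileges down to what you actually need):

  apiVersion: apps/v1
  kind: DaemonSet
  metadata:
    name: node-tools
  spec:
    selector:
      matchLabels: {app: node-tools}
    template:
      metadata:
        labels: {app: node-tools}
      spec:
        hostNetwork: true
        hostPID: true
        containers:
        - name: tools
          image: alpine:3.19          # or your favorite distro image
          command: ["sleep", "infinity"]
          securityContext: {privileged: true}
          volumeMounts:
          - {name: host, mountPath: /host}
        volumes:
        - {name: host, hostPath: {path: /}}

Then kubectl exec into the pod on whichever node you care about, and the host filesystem is under /host.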


You literally do not need more than the most basic of host-OS facilities to run any Linux workload you want in a pod; the OS just needs to run containerd and have a kernel. Talos adds the management plane to that mix to make it usable, but the userland provided by the OCI image will not see any difference at all compared to running on Ubuntu; it will see the kernel.


I see Talos only supports XFS. What potential reasons could they have to prefer XFS over competitors?

I've always struggled to compare filesystems fairly. My justification for ext4 is just that everybody else uses it :)


Mongo and Elasticsearch also recommend using XFS. So Talos isn't unique in this. XFS is somewhat more often preferred in data intensive systems.


xfs is technically more robust and performant than ext4; ext4 is just the most widely supported, but most use xfs when available


Big fan of Talos, I use it on Hetzner and it's a joy!


How do you install it?


Where is networking configured? I assume the system has to have an IP address before containerd can fetch images.


Everything is API driven and static networking can be configured via kernel args

https://www.talos.dev/latest/reference/configuration/v1alpha...
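
For example, if I remember right Talos honors the classic ip= kernel argument on the installer/PXE command line (addresses made up; field order is roughly client-ip:server-ip:gateway:netmask:hostname:device:autoconf, but check the linked reference):

  ip=10.0.0.10::10.0.0.1:255.255.255.0:worker-1:eth0:off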


You use machineConfigs to provision the base OS and configure it, and the clusterConfig is used to bootstrap k8s on those machines. You can make subtypes and supertypes, you can have different networking setups, whatever you like; just apply and the OS is driven to state, then k8s is brought up from there. You are presented a kubeconfig after. Changes are done via application of an updated machineConfig. Works great in practice, and if you write an operator you can manage the config generation via k8s manifests and get wild with it.
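
For reference, the whole happy path is short enough to sketch (IPs made up; exact flags may differ between versions):

  talosctl gen config my-cluster https://10.0.0.10:6443
  talosctl apply-config --insecure -n 10.0.0.10 -f controlplane.yaml
  talosctl bootstrap -n 10.0.0.10
  talosctl kubeconfig -n 10.0.0.10

gen config emits controlplane.yaml, worker.yaml and a talosconfig; apply-config pushes the machine config to a node booted from the ISO/PXE image; bootstrap starts etcd on the first control plane node; kubeconfig pulls credentials for kubectl.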


DHCP I would assume.


Yeah but where is the DHCP client? In the kernel?


In machined (PID1 of Talos).


And moving a network protocol implementation into PID1 is good why? So any security vulnerability in the DHCP implementation gives you root.



