About Containers and VMs

(linuxcontainers.org)

112 points | by Bogdanp 166 days ago


  • hliyan 163 days ago

    As I always say: a VM makes an OS believe that it has the machine to itself; a container makes a process believe that it has the OS to itself.

    • fulafel 163 days ago

      I think the linuxcontainers.org people would disagree. As the table tries to communicate, in contrast to e.g. Docker, this is not about application containerization.

      • lotharcable 163 days ago

        I think that using the term 'application containers' to reference docker and 'system containers' to reference LXC is a bit of a meaningless distinction.

        You can 100% host "system containers" on Docker and you can host "applications" on LXC.

        If I want an entire OS with its own init system, users, and so on, I can do it with OCI images.

        In fact I use it every single day with distrobox on top of Podman using OCI container images.

        And it works a hell of a lot better than if I tried to do it on LXC.

        • alexeldeib 162 days ago

          Yeah, the system/application distinction feels somewhat superficial. The “multiple user space” inside a container thing sounds interesting (not sure what that means exactly), but maybe it's more like a Kubernetes pod, except that instead of a separate rootfs there’s another isolation mechanism?

        • hamandcheese 163 days ago

          The table is comparing all three types: VM, system containers, application containers. Incus supports application containers. It's a relatively recent addition.

          I can't find great docs for it, but it's in last year's release notes: https://linuxcontainers.org/incus/news/2024_07_12_05_07.html

      • abhinavk 163 days ago

        And a system container makes an OS (or OS userland) believe that it has the kernel to itself.

        • falcojr 163 days ago

          That's literally the opposite of what this documentation is explaining. System containers exist. You can run the entire userspace of an OS (including systemd) in a container.

          • weikju 163 days ago

            I'll have to remember that one!

          • mappu 163 days ago

              VMs also don't always require hardware virtualization: Alibaba's PVM (https://lkml.org/lkml/2024/2/26/1263) didn't get upstreamed, but theoretically the MMU is all you need for complete isolation. This kind of idea is also how VM software worked before VT-x was introduced. And of course QEMU has TCG, which works with no kernel support at all.

            • SirGiggles 163 days ago

              I think you could also add Xen to that list. IIRC, the old Xen PV mode was purely paravirtualized without using any hardware extensions.

              • eru 163 days ago

                Yes, Xen was big on paravirtualisation but started supporting the other kinds pretty soon, too. (At least they were supported around 2009-2012, when I was working on XenServer.)

                • SirGiggles 163 days ago

                  I think things are swinging back the other way if I have understood the more recent PVHv2 stuff correctly.

              • 01HNNWZ0MV43FF 163 days ago

                  In my experience TCG (or any method that doesn't require root/admin power) is pretty slow. But I'd be happy to be wrong about that, for an odd project I have.

                • eru 163 days ago

                    It depends a bit on your workload. If you have a pure computation workload without much I/O, TCG etc. doesn't need to be slow.

                  • johncolanduoni 163 days ago

                    It also depends on the architectures. x86 on ARM is tough to do efficiently because of the memory model differences. One of the keys to Rosetta 2 being so good was being able to make the underlying ARM processor obey the x86 memory model (even though it was still executing ARM instructions).

              • scottyeager 163 days ago

                Incus is really nice. It manages to provide a rather container-like experience for VMs. Having the ability to grab a shell on or copy files to/from a VM with the ease of using Docker is a great quality of life improvement. This requires an agent running in the VM but it's already included in the images from the project repo.

                • reilly3000 163 days ago

                  Can someone explain how a system container is more secure than an application container, if that is indeed the case?

                  • johncolanduoni 163 days ago

                    It mostly isn't. Almost all Linux container escapes only require the ability to make system calls to the shared kernel from processes inside the container. The system container doesn't really restrict this ability. It also increases surface area to compromise the container before attacking the host system, since there's now a bunch of extra software running inside the container.

                    If privilege isolation is a priority but you want to use containers, gVisor and Firecracker are way ahead of anything else. The Linux kernel API has proved to be very hard to secure, and not for lack of trying.

                    • lotharcable 163 days ago

                        "System containers" almost certainly aren't more secure, since 'root' still means something, even in a container.

                      Containers just leverage existing Linux namespace isolation techniques to isolate applications.

                      A good way to think about it is that they act like blinders on a horse. If applications can't "see stuff" or reference items outside of the container then they don't know it exists and don't know how to interact with it.

                        "Application containers" can take advantage of more than just namespaces to isolate applications: for example, running them as unprivileged users inside the container's context, which limits their access to the kernel features exposed inside the container, or using cgroups to limit resource usage, plus other smaller things like that.

                        Regardless, "security" and "containers" really shouldn't be written about in the same paragraph without a MAC framework like SELinux in place or additional isolation techniques like VMs.

                        Although VMs are a lot more like containers than people realize.
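
To make the "blinders" picture concrete: on Linux, each process belongs to one namespace of every type, and the kernel exposes those memberships as symlinks under /proc/<pid>/ns. The following is a minimal Python sketch, assuming a Linux host with /proc mounted; the helper name list_namespaces is hypothetical and only for illustration. Running it on the host and then inside a container should show different identifiers for whichever namespaces the runtime unshared, while both runs still talk to the same kernel.

```python
import os


def list_namespaces(pid="self"):
    """Map each namespace type to its identifier, e.g. {"net": "net:[4026531992]"}.

    Reads the symlinks under /proc/<pid>/ns (Linux only). Two processes that
    report the same identifier for a type share that namespace.
    """
    ns_dir = f"/proc/{pid}/ns"
    if not os.path.isdir(ns_dir):
        return {}  # not Linux, or /proc is not mounted
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            # Inspecting another process's namespaces can require extra privileges.
            continue
    return result


if __name__ == "__main__":
    for kind, ident in list_namespaces().items():
        print(f"{kind:<18} {ident}")
```

Note that nothing here involves a second kernel: a namespace only changes what a process can see, which is why the security discussion keeps coming back to the shared kernel API.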

                    • SirGiggles 163 days ago

                      In the context of Incus, they are the same.

                      Incus and LXC internally use umoci to manipulate the OCI tarball to conform to how LXC runs containers.

                        See:
                        - https://umo.ci/
                        - https://github.com/lxc/lxc/blob/lxc-4.0.2/templates/lxc-oci....

                      • cakealert 163 days ago

                        It's not really.

                        Any shared resource between containers or the kernel itself is an attack surface.

                        Both options have a very wide attack surface - the kernel api.

                          Nothing really beats virtualization for security: the surface shrinks to pretty much just the virtualization bits in the kernel and some user-space bits in the VMM.

                        • fulafel 163 days ago

                            Complexity is generally the enemy of security, because securing a system requires understanding it. If you can build a more understandable, more observable, more easily manageable system with fewer moving parts using system containers, that's a security argument.

                          • zie 163 days ago

                            It generally is more secure just because the system container virtualization system is "more complete", so it's harder to get out from under it.

                              My understanding is that with Incus (the OP link) it's the same virtualization system, so there is no real security difference between the two.

                            The question then becomes can they get out from under the virtualization and can they get access to other machines, containers, etc.

                              Docker's virtualization system has been very weak security-wise. So a system container would be more secure than Docker's virtualization system.

                            • thundergolfer 163 days ago

                              The article is pretty useless at explaining the difference, I agree. It makes claims about Docker that aren't true (e.g. single container) while making inadequate reference to the OS features likely involved in making "system containers" what they are (SECCOMP, capabilities, network namespaces, nftables).

                              As an engineer this page has a real "trust me bro" feel to it. Maybe fine as a marketing and product positioning thing, but not interesting for HN.

                              • SirGiggles 163 days ago

                                  This has been a personal pet peeve of mine with the documentation surrounding Incus.

                                As a stack, Incus has been exceptional, it has largely replaced Proxmox and Podman Quadlets for me. For context, I homelab so I cannot generalize my claim to SMB or enterprise.

                                  But the documentation has been very end-user oriented. Information about specifics like seccomp, as you mentioned, is only discoverable through the search bar, and that leads to various disparate locations. That also isn't taking into account that some of the more nitty-gritty information isn't on the Incus portion of linuxcontainers.org at all; see the LXC Security page for example: https://linuxcontainers.org/lxc/security/

                            • Ericson2314 163 days ago

                                IMO it's not good that the kernel interfaces keep on spawning endless userland "middleware" projects.

                                I still want Capsicum to give me sane defaults, so the incentive for sandbox security theater goes away.

                              • jeltz 163 days ago

                                Seems mostly off topic to the article. I think system containers should be implemented in user space. They are not about security theatre but about getting a sandboxed environment which feels like a real/virtual machine but is lighter weight. Very useful e.g. when I want to emulate a whole cluster of Linux machines. And for those needs security is nice but not key.

                                It is application containers which maybe should be replaced by better kernel security, not system containers.

                                • Ericson2314 163 days ago

                                    So from the Capsicum perspective, when you spawn a process, it should be maximally isolated by default. Any sharing of resources should be opt-in, not opt-out.

                                  This is not a big change implementation-wise, but it completely changes the programming model. Instead of dreaming up endless new sandboxing strategies, we just give processes exactly what they need, no more, no less.
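
This is not Capsicum (FreeBSD's capability mode), only an analogy at the level of file descriptors, but it shows what "opt-in sharing" looks like in practice. In the minimal Python sketch below (POSIX only; the helper spawn_with_explicit_grant and the child script are hypothetical), the parent creates two pipes and explicitly grants the child just one of them. Since PEP 446 (Python 3.4), new descriptors are non-inheritable by default, and subprocess with close_fds=True closes everything except stdin/stdout/stderr and whatever is listed in pass_fds.

```python
import os
import subprocess
import sys

# Child program: check whether each fd number it was told about is open,
# and report the answer through the one descriptor it was explicitly granted.
CHILD = """
import os, sys

def is_open(fd):
    try:
        os.fstat(fd)
        return True
    except OSError:
        return False

granted, withheld = int(sys.argv[1]), int(sys.argv[2])
os.write(granted, f"{is_open(granted)} {is_open(withheld)}".encode())
"""


def spawn_with_explicit_grant():
    """Spawn a child that may use one pipe but not another (POSIX only)."""
    granted_r, granted_w = os.pipe()
    withheld_r, withheld_w = os.pipe()
    try:
        subprocess.run(
            [sys.executable, "-c", CHILD, str(granted_w), str(withheld_w)],
            close_fds=True,         # close every fd above 2 in the child by default...
            pass_fds=(granted_w,),  # ...except the one we explicitly opt in to sharing
            check=True,
        )
    finally:
        # Close the parent's copies so reading granted_r hits EOF after the child exits.
        for fd in (granted_w, withheld_r, withheld_w):
            os.close(fd)
    with os.fdopen(granted_r, "rb") as f:
        return f.read().decode()


if __name__ == "__main__":
    print(spawn_with_explicit_grant())
```

The child should report that the granted pipe is open and the withheld one is not. Capsicum applies the same default-deny idea much more broadly, restricting what a process can reach at all rather than just which descriptors it inherits.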

                              • skywhopper 163 days ago

                                What is this? Docker containers can host more than one process/service/app. And why is some product called “Incus” using “linuxcontainers.org” as a domain name?

                                • paulhart 163 days ago

                                    According to their GitHub page, they _are_ linuxcontainers (in a way), and Incus is Apache licensed:

                                  Incus, which is named after the Cumulonimbus incus or anvil cloud started as a community fork of Canonical's LXD following Canonical's takeover of the LXD project from the Linux Containers community.

                                  The project was then adopted by the Linux Containers community, taking back the spot left empty by LXD's departure.

                                  Incus is a true open source community project, free of any CLA and remains released under the Apache 2.0 license. It's maintained by the same team of developers that first created LXD.

                                  LXD users wishing to migrate to Incus can easily do so through a migration tool called lxd-to-incus.

                                  https://github.com/lxc/incus

                                • SirGiggles 163 days ago

                                  Linux Containers, or LXC, came before Docker and OCI standardization.

                                  As the others have mentioned, Incus is the community fork led by former members of the LXD team.

                                  • antod 163 days ago

                                    Very early versions of Docker even used LXC before they replaced it with libcontainer.

                                  • xrd 163 days ago

                                        Incus is the truly open source version of LXC/LXD. It is stable and incredible. I manage dozens of machines and want for nothing, and most importantly, pay nothing for that luxury.

                                    • aitchnyu 163 days ago

                                      Are (self hosting) people putting multiple services like Django app, Postgres, Redis etc into a single container/lightweight VM instead of using Docker Compose with single-purpose containers?

                                      • skydhash 163 days ago

                                            You don’t have to, as you can create a single Postgres instance for your services.

                                            I prefer Incus because you can’t do ad hoc patching with Docker. Instead you have to rebuild the images, and that quickly becomes a hassle in a homelab setting. Incus has a VM feel while keeping a Docker-like management UX.

                                    • kottapar 163 days ago

                                          This sounds very similar to BootC, except that BootC is immutable.

                                      • jiggawatts 163 days ago

                                        It's a bad sign that the first table on the page is full of errors.

                                        "Can only host Linux" -- Windows Containers are a thing too: https://learn.microsoft.com/en-us/virtualization/windowscont...

                                        "Can host a single app" -- not true either. It's just bad practice to host multiple apps in a single container, but it's definitely possible.

                                        IMHO it's not very nice to use the generic-sounding "linuxcontainers.org" domain exclusively for LXC-related content there.

                                        • wutwutwat 163 days ago

                                              linux containers, be it an lxd container or a containerd/dockerd one, only run on linux hosts.

                                          windows containers, only run on windows hosts.

                                          when you run a linux container on a windows host, you're actually running a linux container inside of a linux vm on top of a windows host.

                                              containers share the host operating system's kernel. it is impossible for a linux container (which is just a linux process) to execute on and share the windows kernel. the reverse is also true: a windows container (which is just a process) cannot execute on and share the linux kernel.

                                          the article is correct, linux containers can only execute on a linux host

                                          • 1718627440 163 days ago

                                            Except if you have a kernel that has multiple personalities, so it can implement different OS interfaces like the NT kernel implementing both Win32 and Linux.

                                            • wutwutwat 153 days ago

                                              The NT kernel does not operate like that, at least not anymore...

                                              The NT kernel originally had the Microsoft POSIX subsystem[0], which was discontinued and replaced with Windows Services for UNIX[1], which was then replaced with Windows Subsystem for Linux[2]. WSL has had two versions:

                                              WSL 1 implemented a subset of linux syscalls directly in the windows kernel. This was discontinued and replaced with WSL 2.

                                              WSL 2 is running, you guessed it, a linux VM[3]

                                              > The original version, WSL 1, differs significantly from the second major version, WSL 2. WSL 1 (released August 2, 2016), acted as a compatibility layer for running Linux binary executables (in ELF format) by implementing Linux system calls in the Windows kernel. WSL 2 (announced May 2019), introduced a real Linux kernel – a managed virtual machine (via Hyper-V) that implements the full Linux kernel. As a result, WSL 2 is compatible with more Linux binaries as not all system calls were implemented in WSL 1.

                                              > Version 2 introduces changes in the architecture. Microsoft has opted for virtualization through a highly optimized subset of Hyper-V features, in order to run the kernel and distributions

                                              > The distribution installation resides inside an ext4-formatted filesystem inside a virtual disk, and the host file system is transparently accessible through the 9P protocol

                                              When you run linux containers on a windows host, you're running those containers inside of a linux vm.

                                              0: https://en.wikipedia.org/wiki/Microsoft_POSIX_subsystem

                                              1: https://en.wikipedia.org/wiki/Windows_Services_for_UNIX

                                              2: https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux

                                              3: https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux#WS...

                                              • 1718627440 152 days ago

                                                    WSL2 is not intended to supersede WSL1 but to coexist with it. Was your argument that the NT kernel doesn't do it since WSL1, or that it doesn't do it in WSL1? Only the latter would lead to: "The NT kernel does not operate like that, at least not anymore...".

                                          • weikju 163 days ago

                                                On Incus/LXD it's true that containers can only be Linux.

                                                Not sure about the one-app thing, but that’s the general design of those as well, I suppose.

                                            • jiggawatts 163 days ago

                                              Which just validates my point that a generic-sounding domain is the wrong place to host content that even within the Linux ecosystem is a relatively minor player.

                                              • pxc 163 days ago

                                                Not only is this project website older than Docker, early versions of Docker literally used LXC as the backend, which was supported in Docker for the first two years of its life.

                                                The Docker folks could have done their work under this umbrella and (maybe for good reasons) chose not to. For later container runtimes, idk the story.

                                                But this project/community definitely laid the groundwork for all of those later Linux container runtimes.

                                                • chucky_z 163 days ago

                                                      LXC is used really frequently in the home space (Jellyfin/Plex, for instance), and in a lot of Proxmox use cases as well, which is growing in popularity extremely rapidly.

                                                  • esseph 163 days ago

                                                    I really wish I could just run regular docker or oci containers in Proxmox.

                                                    • jiggawatts 163 days ago

                                                      Which is small in the scope of things when Docker Desktop and containerd are both used at far larger scales.

                                                    • cyberge99 163 days ago

                                                      I’m not sure I follow. Are you suggesting OP has an incorrect apex domain name?

                                                      • 9dev 163 days ago

                                                        It’s like selling Pepsi exclusively on soda.org.

                                                        • Kudos 163 days ago

                                                          For that analogy to hold, Pepsi would have also invented sodas.

                                                          • 9dev 163 days ago

                                                            Like that matters to consumers? Regardless of who invented sodas, the market has changed and people connect more brands to the kind of drink now, so equating Pepsi to Soda is factually incorrect.

                                                            • Kudos 161 days ago

                                                              Oh God, you're really torturing the analogy now.

                                                              • 9dev 161 days ago

                                                                That’s my secret super power.

                                                          • weikju 163 days ago

                                                            Don’t give them any ideas!!!

                                                            • jeltz 163 days ago

                                                              Only if Pepsi had always been called Soda Co and was older than Coca Cola.

                                                          • TrueDuality 163 days ago

                                                            LXC far predates docker regardless of size or impact. It's not disingenuous if you were literally the foundation docker was able to package into a shiny accessible tool.

                                                        • pjmlp 163 days ago

                                                          Not only that, containers predate Linux implementations, I was using HP-UX Vaults in 1999.

                                                        • worik 163 days ago

                                                          Very cool...

                                                          In my experience it has gotta be Docker. For these reasons:

                                                          1. I said so

                                                          2. I'm the boss

                                                          3. Goto 1.