About Containers and VMs

(linuxcontainers.org)

105 points | by Bogdanp 4 days ago

9 comments

  • hliyan 1 day ago

    As I always say: a VM makes an OS believe that it has the machine to itself; a container makes a process believe that it has the OS to itself.

    • fulafel 1 day ago

      I think the linuxcontainers.org people would disagree. As the table is trying to communicate, in contrast to e.g. Docker, this is not about application containerization.

      • hamandcheese 1 day ago

        The table is comparing all three types: VM, system containers, application containers. Incus supports application containers. It's a relatively recent addition.

        I can't find great docs for it, but it's in the release notes last year: https://linuxcontainers.org/incus/news/2024_07_12_05_07.html

      • lotharcable 17 hours ago

        I think that using the term 'application containers' to reference docker and 'system containers' to reference LXC is a bit of a meaningless distinction.

        You can 100% host "systems containers" on Docker and you can host "applications" on LXC.

        Like if I want an entire OS with its own init system and users and so on and so forth, I can do it with OCI images.

        In fact I use it every single day with distrobox on top of Podman using OCI container images.

        And it works a hell of a lot better than if I tried to do it on LXC.

        • alexeldeib 4 hours ago

          yeah, the system/application distinction feels somewhat superficial. The “multiple user space” inside a container thing sounds interesting (not sure what that means exactly), but maybe more similar to a Kubernetes pod, except maybe instead of different rootfs there’s another isolation mechanism?

      • abhinavk 1 day ago

        And a system container makes an OS (or OS userland) believe that it has the kernel to itself.

        • falcojr 1 day ago

          That's literally the opposite of what this documentation is explaining. System containers exist. You can run the entire userspace of an OS (including systemd) in a container.

          • weikju 1 day ago

            I'll have to remember that one!

          • mappu 1 day ago

            VMs also don't always require hardware virtualization - Alibaba's PVM https://lkml.org/lkml/2024/2/26/1263 didn't get upstreamed, but, theoretically the MMU is all you need for complete isolation. This kind of idea is also how VM software worked before VT-x was introduced. And of course QEMU has the TCG which works with no kernel support at all.

            • SirGiggles 1 day ago

              I think you could also add Xen to that list. IIRC, the old Xen PV mode was purely paravirtualized without using any hardware extensions.

              • eru 1 day ago

                Yes, Xen was big on paravirtualisation but started supporting the other kinds pretty soon, too. (At least they were supported around 2009-2012, when I was working on XenServer.)

                • SirGiggles 1 day ago

                  I think things are swinging back the other way if I have understood the more recent PVHv2 stuff correctly.

              • In my experience TCG (or any method that doesn't require root / admin power) is pretty slow. But I'd be happy to be wrong about that, for an odd project I have

                • eru 1 day ago

                  It depends a bit on your workload. If you have a pure computation workload, without much IO, TCG etc doesn't need to be slow.

                  • It also depends on the architectures. x86 on ARM is tough to do efficiently because of the memory model differences. One of the keys to Rosetta 2 being so good was being able to make the underlying ARM processor obey the x86 memory model (even though it was still executing ARM instructions).

              • scottyeager 1 day ago

                Incus is really nice. It manages to provide a rather container-like experience for VMs. Having the ability to grab a shell on or copy files to/from a VM with the ease of using Docker is a great quality of life improvement. This requires an agent running in the VM but it's already included in the images from the project repo.

                • reilly3000 1 day ago

                  Can someone explain how a system container is more secure than an application container, if that is indeed the case?

                  • It mostly isn't. Almost all Linux container escapes only require the ability to make system calls to the shared kernel from processes inside the container. The system container doesn't really restrict this ability. It also increases surface area to compromise the container before attacking the host system, since there's now a bunch of extra software running inside the container.

                    If privilege isolation is a priority but you want to use containers, gVisor and Firecracker are way ahead of anything else. The Linux kernel API has proved to be very hard to secure, and not for lack of trying.

                    • lotharcable 17 hours ago

                      "System containers" almost certainly aren't more secure, since 'root' means things, even in a container.

                      Containers just leverage existing Linux namespace isolation techniques to isolate applications.

                      A good way to think about it is that they act like blinders on a horse. If applications can't "see stuff" or reference items outside of the container then they don't know it exists and don't know how to interact with it.

                      "Application containers" can take advantage of more than just namespaces to isolate applications, such as running them as unprivileged users inside the container's context and thus limiting them from the sort of kernel features that get exposed inside the containers. Or cgroups to limit resource usage and other smaller things like that.

                      Regardless, "Security" and "Containers" really shouldn't be written about in the same paragraph without a MAC framework like SELinux in place or additional isolation techniques like VMs.

                      Although VMs are a lot more like containers than people realize.
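
                      To make the "blinders" point concrete, here is a minimal Python sketch (Linux only) that reads the namespace links under /proc/<pid>/ns and reports which namespaces two processes share. The helper names (parse_ns_link, read_namespaces, shared_namespaces) are made up for illustration and are not part of any container runtime; only the /proc/<pid>/ns layout and the "kind:[inode]" link format come from the kernel.

```python
import os
import re

# Namespace links under /proc/<pid>/ns look like "pid:[4026531836]".
NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(link):
    """Split a namespace link target into (kind, inode)."""
    match = NS_LINK.match(link)
    if match is None:
        raise ValueError(f"not a namespace link: {link!r}")
    return match.group("kind"), int(match.group("inode"))


def read_namespaces(pid="self"):
    """Map each entry in /proc/<pid>/ns to its namespace inode (Linux only)."""
    ns_dir = f"/proc/{pid}/ns"
    namespaces = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        except (OSError, ValueError):
            continue  # entry vanished, unreadable, or in an unexpected format
        namespaces[name] = inode
    return namespaces


def shared_namespaces(a, b):
    """Names of the namespaces that two processes have in common."""
    return sorted(name for name in a.keys() & b.keys() if a[name] == b[name])


if __name__ == "__main__" and os.path.isdir("/proc/self/ns"):
    # Compare this process with PID 1; reading another process's
    # namespace links may require root.
    mine = read_namespaces()
    try:
        init = read_namespaces(1)
    except OSError:
        init = {}
    print("namespaces:", mine)
    print("shared with PID 1:", shared_namespaces(mine, init))
```

                      Run on the host and again inside a container, the inode numbers for entries such as pid, mnt and net would typically differ, which is all the "isolation" amounts to at this layer: the process is pointed at a different view, while the kernel underneath is the same.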

                    • SirGiggles 1 day ago

                      In the context of Incus, they are the same.

                      Incus and LXC internally use umoci to manipulate the OCI tarball to conform to how LXC runs containers.

                      See: - https://umo.ci/ - https://github.com/lxc/lxc/blob/lxc-4.0.2/templates/lxc-oci....

                      • cakealert 1 day ago

                        It's not really.

                        Any shared resource between containers or the kernel itself is an attack surface.

                        Both options have a very wide attack surface - the kernel api.

                        Nothing really beats virtualization in security, the surface shrinks to pretty much just the virtualization bits in the kernel and some user space bits in the VMM.

                        • fulafel 1 day ago

                          Complexity is generally the enemy of security, because securing a system requires understanding it. If you can build a system that is more understandable, has fewer moving parts, and is more observable, more easily manageable, etc. with system containers, that's a security argument.

                          • zie 1 day ago

                            It generally is more secure just because the system container virtualization system is "more complete", so it's harder to get out from under it.

                            My understanding is that with Incus (the OP link) it's the same virtualization system, so there is no real difference, security-wise, between the two.

                            The question then becomes can they get out from under the virtualization and can they get access to other machines, containers, etc.

                            Docker's virtualization system has been very weak security-wise. So a system container would be more secure than Docker's virtualization system.

                            • thundergolfer 1 day ago

                              The article is pretty useless at explaining the difference, I agree. It makes claims about Docker that aren't true (e.g. single container) while making inadequate reference to the OS features likely involved in making "system containers" what they are (SECCOMP, capabilities, network namespaces, nftables).

                              As an engineer this page has a real "trust me bro" feel to it. Maybe fine as a marketing and product positioning thing, but not interesting for HN.

                              • SirGiggles 1 day ago

                                This has been one personal pet peeve with the documentation surrounding Incus.

                                As a stack, Incus has been exceptional, it has largely replaced Proxmox and Podman Quadlets for me. For context, I homelab so I cannot generalize my claim to SMB or enterprise.

                                But the documentation has been very end user oriented, information regarding specifics like seccomp as you mentioned are only discoverable with the search bar and that leads to various disparate locations; and that also isn't taking into account that some of the more nitty gritty information isn't on the Incus portion of linuxcontainers.org, see the LXC Security page for example: https://linuxcontainers.org/lxc/security/

                            • Ericson2314 1 day ago

                              IMO it's not good that the kernel interfaces keep on spawning endless userland "middleware" projects.

                              I still want capsicum to give me sane defaults, so the incentive for sandbox security theater goes away.

                              • jeltz 1 day ago

                                Seems mostly off topic to the article. I think system containers should be implemented in user space. They are not about security theatre but about getting a sandboxed environment which feels like a real/virtual machine but is lighter weight. Very useful e.g. when I want to emulate a whole cluster of Linux machines. And for those needs security is nice but not key.

                                It is application containers which maybe should be replaced by better kernel security, not system containers.

                                • Ericson2314 17 hours ago

                                  So from the capsicum perspective, when you spawn a process, it should be maximally isolated by default. Any sharing of resources should be opt-in, not opt-out.

                                  This is not a big change implementation-wise, but it completely changes the programming model. Instead of dreaming up endless new sandboxing strategies, we just give processes exactly what they need, no more, no less.
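
                                  A rough way to picture that programming model, as a Python sketch rather than anything Capsicum-specific: the first function relies on ambient authority (it can open any path the process can reach), while the second can only touch the handle it was given. Both function names are hypothetical, and Python enforces none of this; under Capsicum it is the kernel that refuses new path lookups once the process is in capability mode.

```python
import io
from typing import TextIO


def count_lines_ambient(path: str) -> int:
    """Ambient-authority style: the callee opens whatever path it is told to,
    so it implicitly holds access to the whole filesystem namespace."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def count_lines_capability(source: TextIO) -> int:
    """Capability style: the caller opens the resource and passes the handle.
    The callee needs no filesystem access of its own to do its job."""
    return sum(1 for _ in source)


if __name__ == "__main__":
    # The caller decides exactly which resource is shared (opt-in).
    granted = io.StringIO("first\nsecond\nthird\n")
    print(count_lines_capability(granted))
```

                                  The design choice is that authority travels with the handle: what a component can reach is visible in its arguments instead of being implied by whatever the process happens to be able to open.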

                              • kottapar 1 day ago

                                This sounds very similar to BootC except that BootC is immutable

                                • skywhopper 1 day ago

                                  What is this? Docker containers can host more than one process/service/app. And why is some product called “Incus” using “linuxcontainers.org” as a domain name?

                                  • paulhart 1 day ago

                                    According to their Github page, they _are_ linuxcontainers (in a way), and Incus is Apache licensed:

                                    Incus, which is named after the Cumulonimbus incus or anvil cloud started as a community fork of Canonical's LXD following Canonical's takeover of the LXD project from the Linux Containers community.

                                    The project was then adopted by the Linux Containers community, taking back the spot left empty by LXD's departure.

                                    Incus is a true open source community project, free of any CLA and remains released under the Apache 2.0 license. It's maintained by the same team of developers that first created LXD.

                                    LXD users wishing to migrate to Incus can easily do so through a migration tool called lxd-to-incus.

                                    https://github.com/lxc/incus

                                  • SirGiggles 1 day ago

                                    Linux Containers, or LXC, came before Docker and OCI standardization.

                                    As the others have mentioned, Incus is the community fork led by former members of the LXD team.

                                    • antod 22 hours ago

                                      Very early versions of Docker even used LXC before they replaced it with libcontainer.

                                    • xrd 1 day ago

                                      incus is the truly open source version of lxc/lxd. It is stable and incredible. I manage dozens of machines and want for nothing, and most importantly, pay nothing for that luxury.

                                      • aitchnyu 1 day ago

                                        Are (self hosting) people putting multiple services like Django app, Postgres, Redis etc into a single container/lightweight VM instead of using Docker Compose with single-purpose containers?

                                        • skydhash 22 hours ago

                                          You don’t have to, as you can create a single Postgres instance for your services.

                                          I prefer Incus, because you can’t do ad hoc patching with Docker. Instead you have to rebuild the images, and that becomes a hassle quickly in a homelab setting. Incus has a VM feel while having the Docker management UX.

                                      • jiggawatts 1 day ago

                                        It's a bad sign that the first table on the page is full of errors.

                                        "Can only host Linux" -- Windows Containers are a thing too: https://learn.microsoft.com/en-us/virtualization/windowscont...

                                        "Can host a single app" -- not true either. It's just bad practice to host multiple apps in a single container, but it's definitely possible.

                                        IMHO it's not very nice to use the generic-sounding "linuxcontainers.org" domain exclusively for LXC-related content there.

                                        • weikju 1 day ago

                                          On Incus/LXD it is true that containers can only be Linux.

                                          Not sure about the one-app thing, but that’s the general design of those as well, I suppose.

                                          • jiggawatts 1 day ago

                                            Which just validates my point that a generic-sounding domain is the wrong place to host content that even within the Linux ecosystem is a relatively minor player.

                                            • pxc 1 day ago

                                              Not only is this project website older than Docker, early versions of Docker literally used LXC as the backend, which was supported in Docker for the first two years of its life.

                                              The Docker folks could have done their work under this umbrella and (maybe for good reasons) chose not to. For later container runtimes, idk the story.

                                              But this project/community definitely laid the groundwork for all of those later Linux container runtimes.

                                              • chucky_z 1 day ago

                                                lxc is used really frequently in the home space (jellyfin/plex for instance). A lot of Proxmox use cases as well which is growing in popularity extremely rapidly.

                                                • esseph 1 day ago

                                                  I really wish I could just run regular docker or oci containers in Proxmox.

                                                  • jiggawatts 1 day ago

                                                    Which is small in the scope of things when Docker Desktop and containerd are both used at far larger scales.

                                                  • cyberge99 1 day ago

                                                    I’m not sure I follow. Are you suggesting OP has an incorrect apex domain name?

                                                    • 9dev 1 day ago

                                                      It’s like selling Pepsi exclusively on soda.org.

                                                      • Kudos 1 day ago

                                                        For that analogy to hold, Pepsi would have also invented sodas.

                                                        • 9dev 1 day ago

                                                          Like that matters to consumers? Regardless of who invented sodas, the market has changed and people connect more brands to the kind of drink now, so equating Pepsi to Soda is factually incorrect.

                                                        • jeltz 1 day ago

                                                          Only if Pepsi had always been called Soda Co and was older than Coca Cola.

                                                          • weikju 1 day ago

                                                            Don’t give them any ideas!!!

                                                        • TrueDuality 1 day ago

                                                          LXC far predates docker regardless of size or impact. It's not disingenuous if you were literally the foundation docker was able to package into a shiny accessible tool.

                                                      • wutwutwat 1 day ago

                                                        linux containers, be it a lxd container, or a containerd/dockerd one, only run on linux hosts.

                                                        windows containers, only run on windows hosts.

                                                        when you run a linux container on a windows host, you're actually running a linux container inside of a linux vm on top of a windows host.

                                                        containers share the host operating system's kernel. it is impossible for a linux container (which is just a linux process) to execute and share the windows kernel. the reverse is true, a windows container (which is just a process) cannot execute and share the linux kernel

                                                        the article is correct, linux containers can only execute on a linux host
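
                                                        One way to check the shared-kernel claim for yourself is the small Python sketch below, which reports the kernel the current process is talking to. The function name kernel_identity is made up for illustration. Run it on a Linux host and again inside an ordinary container on that host, and the release string would be expected to match, because uname describes the kernel and not the distribution's userland. Sandboxed or VM-backed runtimes (gVisor, Firecracker-style setups) may report something different.

```python
import platform


def kernel_identity():
    """Return the kernel-level identity as seen by the current process."""
    info = platform.uname()
    return {
        "system": info.system,    # e.g. "Linux" or "Windows"
        "release": info.release,  # kernel release, not the distro version
        "machine": info.machine,  # CPU architecture
    }


if __name__ == "__main__":
    for key, value in kernel_identity().items():
        print(f"{key}: {value}")
```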

                                                        • 1718627440 16 hours ago

                                                          Except if you have a kernel that has multiple personalities, so it can implement different OS interfaces like the NT kernel implementing both Win32 and Linux.

                                                        • pjmlp 1 day ago

                                                          Not only that, containers predate Linux implementations, I was using HP-UX Vaults in 1999.

                                                        • worik 1 day ago

                                                          Very cool...

                                                          In my experience it has gotta be Docker. For these reasons:

                                                          1. I said so

                                                          2. I'm the boss

                                                          3. Goto 1.