"Serge E. Hallyn" <serge@xxxxxxxxxx> writes:

> Quoting Glauber Costa (glommer@xxxxxxxxxxxxx):
>> On 09/04/2012 07:25 PM, Serge Hallyn wrote:
>> > Quoting Glauber Costa (glommer@xxxxxxxxxxxxx):
>> >> On 09/04/2012 06:44 PM, Serge Hallyn wrote:
>> >>> Quoting Eric W. Biederman (ebiederm@xxxxxxxxxxxx):
>> >>>> Glauber Costa <glommer@xxxxxxxxxxxxx> writes:
>> >>>>
>> >>>>> On 08/31/2012 04:13 AM, Eric W. Biederman wrote:
>> >>>>>> "Daniel P. Berrange" <berrange@xxxxxxxxxx> writes:
>> >>>>>>
>> >>>>>>> On Thu, Aug 30, 2012 at 03:15:17PM -0700, Eric W. Biederman wrote:
>> >>>>>>>> "Daniel P. Berrange" <berrange@xxxxxxxxxx> writes:
>> >>>>>>>>
>> >>>>>>>>> One of the features that SystemD folks have asked us to fix in LXC, is
>> >>>>>>>>> to make sure that /proc/sys/kernel/random/boot_id changes each time a
>> >>>>>>>>> container is started.
>> >>>>>>>>
>> >>>>>>>> There may be a good reason for this. Most of the time what I have seen
>> >>>>>>>> of kernel requests from the direction of SystemD is that while there may
>> >>>>>>>> be a real problem, their imagined solution is usually not a
>> >>>>>>>> particularly good solution. So a description of the problem is needed.
>> >>>>>>>>
>> >>>>>>>> Justifying something with just "SystemD wants this" is a good way to get
>> >>>>>>>> a nack.
>> >>>>>>>
>> >>>>>>> SystemD records log messages for all system services in their journal.
>> >>>>>>> They can show you all log messages for the current service execution,
>> >>>>>>> all log messages for a service since system boot, or all log messages
>> >>>>>>> ever. The boot_id value is used as a unique tag to allow grouping of
>> >>>>>>> the log messages per system boot. When we run systemd inside a container
>> >>>>>>> we want to get that grouping of log messages generated by services inside
>> >>>>>>> the container, to take account of the container boot, not the host boot.
>> >>>>>>> Hence the desire to have the boot_id value reflect when a container is
>> >>>>>>> booted.
>> >>>>>>
>> >>>>>> Since SystemD post-dates containers and since the logging feature is not
>> >>>>>> currently in wide use that use case is completely non-persuasive.
>> >>>>>>
>> >>>>>> So far this just sounds like a plain SystemD bug and something that can
>> >>>>>> be easily changed at this point in time.
>> >>>>>>
>> >>>>>> It has been a long time but my fuzzy memory says that the original
>> >>>>>> boot_id justification was based on use cases that could not be solved
>> >>>>>> any other way.
>> >>>>>>
>> >>>>>> My memory says it was this thread https://lkml.org/lkml/1999/5/31/233
>> >>>>>> that inspired the implementation of boot_id. However reading the
>> >>>>>> current emacs source code it appears emacs gave up before boot_id
>> >>>>>> was implemented and stats /var/run/random-seed (which we seem to
>> >>>>>> have removed) or looks in wtmp or utmp for the latest boot record.
>> >>>>>>
>> >>>>>> I did a quick grep through the binaries on my system and I could not
>> >>>>>> find anything using /proc/sys/random/boot_id.
>> >>>>>>
>> >>>>>> That suggests to me that the proper solution is to actually just remove
>> >>>>>> boot_id.
>> >>>>>>
>> >>>>>> Hmm. And then there is another interesting detail. What should boot_id
>> >>>>>> return after the processes have migrated from one system to another?
>> >>>>>>
>> >>>>>
>> >>>>> Since this would be a per-boot id, this clearly has to be carried over
>> >>>>> with migration, along with all the tons of data we already carry.
>> >>>>
>> >>>> The twist of course is what does a boot mean. If we are really after
>> >>>> machine boots then the current behavior is correct.
>> >>>>
>> >>>> Looking back in the archives the desired behavior appears to be a value
>> >>>> that can be used to see if a pid value must be stale.
>> >>>>
>> >>>> As a stale pid detector boot_id is pretty lousy. Pids can still be
>> >>>> reused.
>> >>>>
>> >>>> Still a role as a stale pid detector makes it clear which namespace
>> >>>> boot_id should be in and how we should treat boot_id upon migration.
>> >>>>
>> >>>> You can only serve as a stale pid detector if you are in the pid
>> >>>> namespace.
>> >>>>
>> >>>> So at this point patches are welcome. Hopefully with a summary
>> >>>> of the discussion.
>> >>>
>> >>> I don't understand why this should be provided by the kernel. Especially
>> >>> given that we've proven that everyone really wants this to be per-container
>> >>> as well.
>> >>>
>> >>> So why not just have init, on startup, create a /run/boot_id file, perhaps
>> >>> by sha1summing the time at which it started perhaps plus some nonce?
>> >>>
>> >> Why shouldn't it be provided by the kernel?, is the real question
>> >
>> > Because it's not the right place. The origin of this thread proves that
>> > people want a per-init, not per-kernel, value.
>> >
>>
>> Not all files provided by the kernel are "per-kernel". /proc/self is
>> full of per-namespace stuff.
>>
>> >> The way I see it, every file we need to set up from the outside is a
>> >> hassle. Among many other things, it is just asking for duplication of
>> >> efforts among multiple userspaces.
>> >>
>> >> netns does this for its proc files. The only reason we don't do it for
>> >> cgroups-driven files, is that the semantics is very ill-defined. For this
>> >> file, it doesn't seem to be the case.
>> >
>> > But it is the case. How do you intend to have the kernel decide what
>> > value to put in there for a process in a container, or in a chroot?
>> >
>>
>> one value per pidns.
>
> ok. (So should it be called /proc/pidns_uuid? Well, whatever. No
> objection from me - thanks.)

/proc/sys/kernel/boot_id.
Someday we will get the plumbing right in the kernel so that can be
/proc/sys -> /proc/self/sys and /proc/self/sys/kernel/boot_id

The origin of boot_id was so that emacs could implement distributed
locking in userspace by creating a symlink from .#filename to
user@xxxxxxxxxxxx:boot_id. Ultimately emacs opted to just stat
/var/run/random-seed or to grovel through utmp or wtmp to find the last
boot record. Of course /var/run/random-seed is now named something like
/var/lib/urandom/random-seed as distributions continue their relentless
pursuit to break userspace.

But ultimately boot_id was defined as something you can use to detect
stale pids and stale lockfiles.

Since the original definition was a uuid to detect stale pids, that
seems a reasonable justification for keeping it in the pid_namespace.

Boot_id isn't the best name in that case but shrug.

Eric
_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/containers
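
For readers following the thread, the stale-lockfile use of boot_id
described above could be sketched roughly as follows. This is only an
illustration under stated assumptions, not what emacs actually does: the
helper names (read_boot_id, make_lock_target, lock_is_stale) are
hypothetical, the "user@host:boot_id" symlink target follows the
description in this message, and the path is the one given at the start
of the thread.

```python
import os

# Path as given at the start of the thread; whether the value is
# per-host or per-pid-namespace is exactly what is being discussed.
BOOT_ID_PATH = "/proc/sys/kernel/random/boot_id"


def read_boot_id(path=BOOT_ID_PATH):
    """Return the boot id stored at `path`, without the trailing newline."""
    with open(path, "r", encoding="ascii") as f:
        return f.read().strip()


def make_lock_target(user, host, boot_id):
    """Build the symlink target "user@host:boot_id" (hypothetical format)."""
    return f"{user}@{host}:{boot_id}"


def lock_is_stale(lock_path, current_boot_id):
    """Treat a lock as stale if it records a different boot id.

    The lock is a symlink (for example ".#filename") whose target is
    never opened; only the text of the target is inspected.
    """
    target = os.readlink(lock_path)
    # Everything after the last ':' is taken to be the recorded boot id.
    _, _, recorded = target.rpartition(":")
    return recorded != current_boot_id
```

A lock would be taken with something like
os.symlink(make_lock_target(user, host, read_boot_id()), ".#filename")
and later checked with lock_is_stale(".#filename", read_boot_id()). A
differing boot id only shows that the lock holder cannot still be
running; a matching one proves nothing, since pids can still be reused.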