On Tue, 25.05.10 23:02, Casey Dahlin (cdahlin@xxxxxxxxxx) wrote:

> > Why do you say "cgroups are a dead end"? Sure, Scott claims that, but
> > uh, it's not the only place where he is simply wrong and his claims
> > baseless. In fact it works really well, and is one of the strong points
> > in systemd. I simply see no alternative for it. The points Scott raised
> > kinda showed that he never really played around with them.
> >
> > Please read up on cgroups, before you just dismiss technology like that
> > as "dead end".
>
> I did. When upstart was about to use them. 2 years ago. We chucked them by the
> following LPC.

Who is "we"?

> The problem we've found is that cgroups are too aggressive. They don't have a
> notion of sessions and count too much as being part of your service, so you end
> up with your screen session being counted as part of gdm.

Well, how exactly you set up the groups is up to you, but the way we do it in systemd is to stick every service in a separate cgroup, plus every user in a separate one, too. Some examples:

    /systemd-1/avahi.service
    /systemd-1/getty@xxxxxxxxxxxx
    /systemd-1/gdm.service
    /systemd-1/apache.service
    /user/lennart
    /user/cjd

The per-user cgroups are controlled via a PAM module. That way there's finally a nice way to reliably clean up after a user when he logs out: we just kill his complete cgroup and he's gone.

In addition we can easily apply all kinds of cgroup-based resource enforcement to these groups, e.g. force user "lennart" onto CPU 1, and say that apache and all the CGI scripts it creates can only get up to 20% CPU. And avahi-daemon could be limited to a quarter of the available RAM at most, with all its processes summed up, regardless of how often it might fork.

And the whole thing is even recursive. If you run a per-user systemd as user lennart, you will end up with a sub-hierarchy like this:

    /user/lennart/systemd-4711/dbus-daemon.service
    /user/lennart/systemd-4711/dconfd.service

And so on.
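To make the hierarchy above a bit more concrete, here is a minimal Python sketch (not systemd code) of two things described here: finding the cgroup a process lives in, which is the information "ps -eo cgroup" shows and which the kernel exposes in /proc/<pid>/cgroup, and computing limit values for the "20% CPU" and "a quarter of the RAM" examples. It assumes today's cgroup v2 interface files cpu.max, memory.max and cgroup.kill (the last one needs Linux 5.14 or newer); the 2010-era systemd used cgroup v1 with the /systemd-1/... layout shown above, and the helper names are hypothetical.

```python
# Sketch only: the cgroup v2 file names (cpu.max, memory.max, cgroup.kill) are
# an assumption about modern kernels, not what systemd used in 2010.
import os
from typing import Optional


def cgroup_path(proc_cgroup_text: str, controllers: str = "name=systemd") -> Optional[str]:
    """Return the cgroup path for one hierarchy from /proc/<pid>/cgroup text.

    Each line reads "hierarchy-ID:controller-list:path". The path may itself
    contain colons, so split at most twice. On a pure cgroup v2 system the
    single line is "0::/path", i.e. pass controllers="".
    """
    for line in proc_cgroup_text.splitlines():
        if not line.strip():
            continue
        _hierarchy_id, ctrls, path = line.split(":", 2)
        if ctrls == controllers:
            return path
    return None


def owner_of(path: str) -> str:
    """Hypothetical helper: the last path component names the service or user."""
    return path.rstrip("/").rsplit("/", 1)[-1]


def cpu_max(fraction_of_one_cpu: float, period_us: int = 100_000) -> str:
    """Value for cgroup v2 cpu.max: "<quota> <period>" in microseconds."""
    quota_us = int(round(fraction_of_one_cpu * period_us))
    return f"{quota_us} {period_us}"


def memory_max(fraction: float, total_ram_bytes: int) -> str:
    """Value for cgroup v2 memory.max, in bytes."""
    return str(int(total_ram_bytes * fraction))


def kill_cgroup(cgroup_dir: str) -> None:
    """Kill every process in a cgroup v2 group, however often it forked.

    Writing "1" to cgroup.kill needs Linux 5.14+ and sufficient privileges.
    """
    with open(os.path.join(cgroup_dir, "cgroup.kill"), "w") as f:
        f.write("1")


if __name__ == "__main__":
    sample = "1:name=systemd:/systemd-1/avahi.service\n"
    path = cgroup_path(sample)
    assert path is not None
    print(path, owner_of(path))            # /systemd-1/avahi.service avahi.service
    print(cpu_max(0.20))                   # 20000 100000
    print(memory_max(0.25, 8 * 1024**3))   # 2147483648
```

Note that cpu.max expresses time on one CPU per period, so "20000 100000" is 20% of a single CPU; for 20% of a four-CPU machine the quota would have to be four times larger.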
And the nice thing is that these cgroups are shown when you do "ps -eo cgroup,...". You can always figure out from "ps" which service a process belongs to, even if it fork()ed a gazillion times. And all the bookkeeping is done by the kernel, basically for free. No involvement from userspace.

> This is why setsid was added to the netlink connector.

Well, this is just flawed on so many levels that it hurts. Asynchronously trying to follow how daemons fork/exit is inherently broken, because the number of processes can grow exponentially through forking while you can (realistically) only collect the events linearly. Also, if a daemon forks too often, netlink will drop your messages, which makes for an easy-peasy way for processes to escape your supervision, using only unprivileged operations. You have constant userspace wakeups. Everything you apply to the processes is done asynchronously and hence is racy (killing, renicing, yadda yadda). The problem is simply that your grip on the processes can never be firm, because you are scheduled at the same priority as the daemons you supervise and you get all notifications asynchronously. You *really* want to leave process tracking to the kernel, and not try to emulate it in userspace. Everything else is just unsafe and hence a joke in the context of process babysitting. It's like hiring a babysitter and giving the kid a bike it can escape on, while chaining the babysitter to the couch.

So, it's just not safe: processes can *easily* escape your supervision. On top of that it is just ugly. And finally, you cannot nicely show the service something belongs to in "ps" the way cgroups give it to you for free. Then, you cannot set cgroup resource enforcement for your services, since you simply have no cgroups. And no nice interfacing with the other libcgroup tools either. Meh, and the list goes on like this.

People should not use cn_proc. It's evil. And if you are using it for anything except logging you are doomed.
And even for logging it isn't really useful.

> 1) Socket activation. Part of Upstart's roadmap. Would happen sooner if you
> cared to submit the patch. We don't think it's good enough by itself, hence the
> rest of Upstart, but a socket activation subsystem that could reach as far as
> systemd and even work standalone in settings where systemd can work is
> perfectly within Upstart's scope. I'd be happy to firm up the design details
> with you if you wanted to contribute patches.

Well, for one, it would be nice to judge things by actually existing features, not by big plans nobody is working on, as you apparently admit outright.

And then, socket activation is nice for various reasons, and lazy-loading is just one of them. The bigger advantage is that it does automatic dependency handling, which of course is nothing that really fits into Upstart's design, since that is based on "events", not dependencies. Events are just broken, as I might note. And adding dependencies would turn Upstart's design around, making it a completely different beast.

I mean, you called socket activation "xinetd-style activation" in an earlier mail of yours. That is just completely beside the point, because this is not so much about on-demand starting of internet services. systemd-style activation is about parallelizing the startup of (mostly) local services and making explicitly written dependencies obsolete. And that's what is so magic about it. And an init system should be designed around this at its core! And systemd is.

> 2) Bus activation. Missed opportunity here to actually become the launchpoint
> for activated services. I won't criticize that too much though, as its
> usefulness is largely dependent on kernelspace DBus, which I've been trying to
> bludgeon Marcel Holtmann into turning over to the public for a year
> now.

Not sure what this has to do with kernel space D-Bus, except that that is practically dead.
If people want to reinvestigate the kernel/D-Bus issue they should not focus on an AF_DBUS, but instead just use netlink and use BSD socket filters to minimize wakeups, plus come up with something inspired by iptables/ebtables to filter netlink traffic, to solve the permission problems. But that's a completely different story.

> 3) Cutting down on the forking by replacing some of the shell scripts... cool
> 3a) With C code... really?

Yes, really. MacOS could do it, and so can we. It's not that hard. And as I may add here, I already hacked up a big part of it for the services we start by default.

> 4) Process environment control. No complaints, and also nothing Upstart doesn't
> want to do.

Well, have you seen the functionality we provide there, compared to the limited stuff Upstart has?

And what about the other features? The automounter, a full-blown dependency system, dependencies between mounts, automounts, devices, sockets, services, timers, paths, swaps? The snapshot system? The fact that we take advantage of the LSB stuff right now, to parallelize the boot, without changing a single file? All the process control stuff, i.e. the syslog/kmsg hookup of stdout/stderr of every process we spawn? The FS namespace stuff we do? The TTY stuff we do? The IO/CPU scheduler tricks we do? The capability stuff we do? The CPU affinity stuff we do? The full integration of /etc/fstab and friends? The transaction logic? The crosslinking between systemd and syslog/abrt? The almost complete D-Bus coverage? All the small bits we already took out of the shell boot process and moved into proper C code in systemd (and some of it into udevd)? The fact that systemd works as a session supervisor for normal users too? The fact that we support different boot targets? The fact that we have a UI? The sane copyright? The fact that we don't use bloody bzr?

And so on and so forth. I could repeat the whole blog story here. Please just read that instead.
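The parallelization argument behind socket activation (point 1 above) can be shown with a small Python sketch. The "init" part creates and binds a listening socket before any daemon runs; a client then connects and sends its message while the server has not called accept() yet, and the kernel simply queues both the connection and the data. Threads stand in for separate daemons here and the socket name is hypothetical; in real systemd the listening descriptors are instead handed to the service at exec time (the sd_listen_fds() protocol, starting at file descriptor 3).

```python
# Sketch: sockets are created up front, so a client does not have to wait for
# the server to be running. Threads stand in for separate daemons.
import os
import socket
import tempfile
import threading
from typing import List, Tuple


def run_demo() -> Tuple[str, List[str]]:
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "logger.sock")  # hypothetical socket name

    # "init": create, bind and listen before either daemon is started.
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(path)
    listener.listen(16)
    listener.settimeout(5.0)  # avoid hanging forever if something goes wrong

    events: List[str] = []
    received: List[bytes] = []
    client_sent = threading.Event()

    def client() -> None:
        # No "start after the server" dependency: connect() completes because
        # the kernel queues the connection in the listen backlog.
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as c:
            c.connect(path)
            c.sendall(b"hello from client")
        events.append("client-sent")
        client_sent.set()

    def server() -> None:
        # Deliberately "slow" daemon: it only starts accepting once the
        # client is already done, to show that nothing was lost meanwhile.
        client_sent.wait(timeout=5.0)
        conn, _ = listener.accept()
        events.append("server-accepted")
        chunks: List[bytes] = []
        with conn:
            while True:
                data = conn.recv(1024)
                if not data:  # client has closed its end
                    break
                chunks.append(data)
        received.append(b"".join(chunks))

    threads = [threading.Thread(target=server), threading.Thread(target=client)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    listener.close()
    os.unlink(path)
    os.rmdir(tmpdir)
    return received[0].decode(), events


if __name__ == "__main__":
    message, order = run_demo()
    print(message, order)
```

The ordering between the two "daemons" is enforced by the kernel's socket buffers rather than by an explicitly declared dependency, which is the point being made about socket activation; a real implementation also has to decide what happens when the backlog fills up or the service fails to start.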
> > We did systemd because we thought that technically Upstart is
> > fundamentally flawed and misses out on so many opportunities.
>
> And we think the same of systemd.

We? OK, I am listening. What's flawed about systemd? I wrote a very detailed explanation of why Upstart is just wrong in its core design. Please be so kind as to be more specific if you claim the same about systemd, because otherwise all you are doing is spreading FUD.

> And mail it to upstart-devel-list. I'm being a bit hyperbolic, but actually not
> much. That would have been a valid move.

Please read my blog story. Thank you. It actually answers that very question:

"Why didn't you just add this to Upstart, why did you invent something new?

Well, the point of the part about Upstart above was to show that the core design of Upstart is flawed, in our opinion. Starting completely from scratch suggests itself if the existing solution appears flawed in its core. However, note that we took a lot of inspiration from Upstart's code-base otherwise."

If there is a valid reason for starting anew when something already exists, then it is that the basic design of the existing system is just flawed. And hence we started anew.

> You should really come work with us. We're fun guys.

Man, you are suggesting I didn't talk to the Upstart folks. That's a ridiculous claim, and it's quite insulting that you imply it. Kay and I have had long discussions with Scott during the last three years or so, at various conferences. And as far as I can see, Upstart is Scott and Scott is Upstart, and the bzr repo commits underline that. And I am sorry if you weren't part of those discussions.

You should really come work with us. We're even funnier guys.

Lennart

-- 
Lennart Poettering                        Red Hat, Inc.
lennart [at] poettering [dot] net
http://0pointer.net/lennart/           GnuPG 0x1A015CC4
-- 
devel mailing list
devel@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/devel