Thanks Michal, your answer was really positive and encouraged me to proceed further.

So I now have an FC18 running within a container under an EL6.4 HOST with kernel 3.9.4 (big smile).

Problems started to unlock themselves once I decided to bypass network.service altogether, starting network and sshd manually (ifup lo; ifup eth0; /usr/sbin/sshd). I am now able to work in a quiet room with multiple screens available to poke around and catch fast-scrolling log messages. (You should never forget about the poor sysadmin freezing in front of the server room console when your software is reporting a problem and is not able to run :-}).

As expected, the problem came down to a very small detail (within /etc/fstab).

Not working:

    /vzgot   /         ext4    defaults  0 0
    proc     /proc     proc    defaults  0 0
    sysfs    /sys      sysfs   defaults  0 0
    devpts   /dev/pts  devpts  defaults  0 0
    tmpfs    /dev/shm  tmpfs   defaults  0 0

Working:

    #/vzgot  /         ext4    defaults  0 0
    proc     /proc     proc    defaults  0 0
    sysfs    /sys      sysfs   defaults  0 0
    devpts   /dev/pts  devpts  defaults  0 0
    tmpfs    /dev/shm  tmpfs   defaults  0 0

The fact that systemd was not able to cope with this /etc/fstab is quite acceptable (even if upstart and init have no problem with it). The fact that such a small problem drives systemd into an emergency state without reporting it clearly is another question. When the last prominent line before the maintenance password prompt is "Not able to exec /bin/plymouth, <no such file>", you ask yourself what mess you are in. The fact that the line just below says "Please see journal" while the journal is not available (empty) just compounds the effect.

Once I was able to log in via remote SSH in emergency.service mode, I played with different services, trying "ignore-dependencies", but never got a clear message about what was missing. Success was more a lucky guess than the result of a structured approach.

So, no, sorry, systemd doesn't rate as "production level" (not yet? or never?).

May I propose some ways to improve it:
- The journal should be accessible regardless of systemd status or trouble.
- When list-dependencies is displayed for a service, you should mark the dependencies already running (or not successfully started?), think about the poor sysadmin!
- You should have a way to proceed in a 'step by step' boot mode (avoiding fast-scrolling parallel reports).
- On a more philosophical side:
  * Linking PID1 and systemd seems to me a problem (why it is mandatory still escapes me); you are limiting your troubleshooting context (double check your design).
  * The fact that systemd is absorbing more and more functionality in order to work should trigger a loud alarm signal about your design (did I understand today's mail correctly? you can't use logrotate to expire/archive the journal... :/ )

Bug:
- After a very quick check, there may be a bug in the way systemd handles 'int reboot(int cmd);'. I have the strong feeling systemd is not feeding WTERMSIG(status), but this is very preliminary, I could be wrong...

As you requested, I can provide you with "vzgot", my container application (which flavor/distribution RPM do you want? src.rpm is available too). While not a fork of LXC, I think vzgot is very close to LXC in the way the container is started. The difference is more about container definition: with vzgot, you just need a DNS resolution (for the container's IPs) and a config_list linking the container name to a distribution name, a template name and an architecture. With that data, vzgot is able to create a running container by itself. I tried to keep the container setup as lean, simple and flexible as possible.

I put that project in sleep mode because a problem I reported 3 years ago (a syslog+printk cross leakage between HOST and containers) seemed very difficult to address within the kernel. But!... very good news yesterday! The problem is fixed in kernel 3.10.0, so maybe it is time to work on vzgot again?

Quoting Michal Schmidt <mschmidt@xxxxxxxxxx>:

> On 07/02/2013 04:08 PM, Jean-Marc Pigeon wrote:
>> I was not expecting to have it fully working at the first attempt in my own container design,
>
> Would you be willing to provide some details about your container design? Ideally including the code to allow others to reproduce the problems you saw. Have you seen these recommendations?:
> http://www.freedesktop.org/wiki/Software/systemd/ContainerInterface/
>
>> but I was expecting systemd (using systemctl very detailed status) to give me a very good insight about issues which could occur. The real goal was to learn how to use systemd components to diagnose an "in trouble" real system, a kind of flight simulator exercise, so that we would be ready in the future to do quick diagnosis if one of our servers in a rack had trouble booting or rebooting with EL7.
>
> Interesting exercise, but I am afraid by running it in a custom container design and running under a host that itself is not using systemd you uncovered an entirely different class of problems than what can happen when running it on the host.
>
>> This small exercise turned out very ugly very quickly, I worked very hard trying all the tricks and bypasses I could think of to collect data. To my dismay I was unable to get a predictable behaviour, nor reliable data from systemd, even in the emergency.service mode. After a while, I was forced to face it, systemd won't help me, not even start the system in a minimal mode, I was not able to go beyond kernel level with systemd in control, services started were a total mess and the container was totally locked up, with no exploitable data provided.
>
> Not sure how much of it relates to container environments, but have you seen this?:
> http://freedesktop.org/wiki/Software/systemd/Debugging/
> My first goal when debugging issues like this would be to make sure I can see the debugging output of systemd itself (i.e. with log_level set to debug and log_target to something I can read - probably "console" in the case of a container).
>
>> (Quickly: we had an interesting situation within the noisy and cold server room using the emergency.service console, such as:
>> $ systemctl start systemd-journald.service
>> --> "unable to comply!" a dependency job for systemd-journald.service failed, see journalctl -xn.)
>
> This is when logging to "kmsg" (the dmesg buffer) or "console" can really help find out the problem.
>
>> I ended up asking myself 'what part of this puzzle am I missing?', I dug around in Google about systemd and I was stunned by the results, I found my concerns were already expressed multiple times with more talented words than mine, and this as early as 2010. Since that time it is my understanding systemd continuously tries to resolve problems by increasing its complexity and extending its dependencies and its centrality. This is wrong, this is very very wrong. A program as complex as systemd can't be a mandatory PID1 in an open environment such as UNIX.
>
> From the above paragraphs I get the feeling you may be missing the fact that not all of "systemd" runs in PID1. There are more components in the "systemd" project, such as journald, logind, ... - they run as separate processes. There is some ambiguity when talking about "systemd". Sometimes it refers only to the service manager (PID1), and sometimes to the whole suite.
>
>> BTW and to go a little bit beyond the systemd case, since 1991, FC18 is the very first distribution I was NOT successful in installing on plain hardware
>
> I heard F19 was released today with an improved Anaconda :-)
>
> Michal
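
As an aside on the /etc/fstab detail described at the top of this message: a minimal sketch, in Python, of a check that flags an entry like the "/vzgot / ext4" one before booting a container. This is only a heuristic and not systemd's own logic; the message does not establish why systemd rejected the entry. The names suspicious_entries, DISK_FS_TYPES and DEVICE_PREFIXES are hypothetical, chosen for illustration, and the list of filesystem types is an assumption.

```python
# Heuristic sketch (an assumption, not systemd's actual behaviour): flag
# fstab entries whose filesystem type normally lives on a block device
# but whose source field does not look like a device specification.

DISK_FS_TYPES = {"ext2", "ext3", "ext4", "xfs", "btrfs"}  # assumed list
DEVICE_PREFIXES = ("/dev/", "UUID=", "LABEL=")


def suspicious_entries(fstab_text):
    """Return (source, mountpoint, fstype) tuples that look wrong."""
    flagged = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or commented-out entry
        fields = line.split()
        if len(fields) < 3:
            continue  # malformed line, ignored in this sketch
        source, mountpoint, fstype = fields[:3]
        if fstype in DISK_FS_TYPES and not source.startswith(DEVICE_PREFIXES):
            flagged.append((source, mountpoint, fstype))
    return flagged


# The two fstab variants quoted in this message.
NOT_WORKING = """\
/vzgot / ext4 defaults 0 0
proc /proc proc defaults 0 0
sysfs /sys sysfs defaults 0 0
devpts /dev/pts devpts defaults 0 0
tmpfs /dev/shm tmpfs defaults 0 0
"""

WORKING = NOT_WORKING.replace("/vzgot / ext4", "#/vzgot / ext4", 1)

if __name__ == "__main__":
    print(suspicious_entries(NOT_WORKING))  # [('/vzgot', '/', 'ext4')]
    print(suspicious_entries(WORKING))      # []
```

Run against a container's real /etc/fstab (for example by passing the file's contents to suspicious_entries), an empty result only means this heuristic found nothing; it does not guarantee that systemd will accept the file.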

--
See you soon
===========================================================
Jean-Marc Pigeon                        E-Mail: jmp@xxxxxxx
SAFE Inc.                               Phone: (514) 493-4280
  Clement, 'a kiss solution' to get rid of SPAM (at last)
     Clement' Home base <"http://www.clement.safe.ca">
===========================================================
--
devel mailing list
devel@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/devel