On Mon, 2004-11-15 at 23:24 +0100, Ziga Mahkovec wrote: > On Sat, 2004-11-13 at 12:18 -0500, Owen Taylor wrote: > > It should be possible to start with a limited set of easily collected > > data and already get a useful picture. Useful data collection could be > > as simple as taking a snapshot of the data that the "top" program > > displays a few times a second during boot. That already gives you a > > list of the running processes, their states, and some statistics about > > global system load. > > So I gave this a try: > > 1. I modified the boot procedure so that early in rc.sysinit, a tmpfs is > mounted and top is run in batch mode (to output every 0.2 seconds). The > logged output is later parsed only up to the point where gdmgreeter is > running and the system is relatively idle (i.e. boot complete and ready > for login). > > 2. A Java program parses the log file, builds the process tree and > finally renders a PNG chart. Processes are sorted by PID and traversed > depth first. > > This still needs more work but here's a sneak preview: > http://www.klika.si/ziga/bootchart/bootchart.png Wow, this is fabulous work, and fast too! What sort of libraries are you using in the Java program? Do you have any idea whether getting it to run on top of open source Java would be feasible? How are you computing the different shades of yellow and gray? Are you looking at differences in the TIME column? > (as a result of http://www.klika.si/ziga/bootchart/bootop.log.gz ) > > Some processes were filtered out for clarity -- mostly sleepy kernel > processes and the ones that only live for the duration of a single top > sample. This skews the chart a bit but is definitely more > comprehensible (compare with > http://www.klika.si/ziga/bootchart/bootchart-complete.png ). > > Some things I plan on adding: > - start logging earlier in the boot process (possibly in initrd), > - add additional layers (e.g. make use of the kernel patch Arjan > suggested for showing the number of open files), > - improve process tree representation and add dependency lines, > - render SVG instead, for scalability and interactivity. All sound good. Grouping processes by the process tree would clearly make things a little clearer, though if you know what's going on, it's not hard to figure out that xkbcomp is being run by X, not by wait_for_sysfs. Disk throughput would probably be the extra layer I'd be most interested in. We seem to be IO bound during almost the entire process, but how are we doing for efficiency? Are we doing significantly better during readahead than during start-random-services? > This definitely helped me with my boot times -- the 4-second load gap at > the start I found to be "modprobe floppy", apparently timing out on my > floppyless laptop :) Just glancing at the initial image certainly brings all sorts of questions to mind: - Why is rhgb eating so much CPU? if you run 'rhgb -i' it displays basically 0 CPU to display the animation. That looks like a pretty obvious bug we completely missed. - Is it just a coincidence that dhclient gets the lease almost exactly simultaneously with readahead finishing? Is readahead blocking the rest of the system? - Is readahead doing any good at all? Would it still be doing good if we fixed blocking boot for 20 seconds on dhclient? - What does GNOME login look like? You can also see that most of the time is eaten in just a few things ... initial module probing, starting X twice, dhclient. There are only 7 seconds from when dhclient finishes until the point prefdm starts, and that's the portion people have mostly worked on parallelizing. Anyways, I'm very impressed, looks like I'll have to start figuring out shipping to Slovenia :-) (*) Owen (*) Let me know when you think you you are at a point where you think you have something you'd like to have as a poster, and we can work out how to best implement the details of my offer.
Attachment:
signature.asc
Description: This is a digitally signed message part