Sat, 13.11.2004 at 18:18, Owen Taylor wrote:
> Problem description
> ===================
>
> Currently, the time to boot the Linux desktop from the point where the
> power switch is turned on to the point where the user can start doing
> work is roughly two minutes.
>
> During that time, there are basically three resources being used: the
> hard disk, the CPU, and the natural latency of external systems - the
> time it takes a monitor to respond to a DDC probe, the time it takes
> for the system to get an IP address via DHCP, and so forth.
>
> Ideally, system boot would involve a 3-4 second sequential read of
> around 100 megabytes of data from the hard disk, CPU utilization would
> be parallelized with that, and all queries on external systems would
> be asynchronous ... startup continues, and once the external system
> responds, the system state is updated. Plausibly the user could start
> work in under 10 seconds on such an ideal system.
>
> The challenge is to create a single poster showing graphically what is
> going on during boot, what the utilization of resources is, how the
> current boot differs from the ideal world of 100% disk and CPU
> utilization, and thus where the opportunities for optimization lie.
>
> Graphical Ideas
> ===============
>
> Presumably, the main display would be a timeline with wall-clock time
> on the horizontal (or vertical) axis. Then you'd have a tree with
> lines representing the processes running at a particular time.
>
> The process lines would have attributes indicating state - perhaps
> red when waiting for disk, green when running, dotted when sleeping or
> blocking on I/O. Extra lines might be added to the graph to indicate
> dependencies between processes. If a process calls waitpid() on
> another process, a dotted line could be added connecting the end of
> the other process back to the first process.
> Similar lines could be added when a write from one process causes
> another process that was waiting in a read() or select() to wake up.
>
> While many thousands of processes are run during system boot, this
> doesn't mean the graph has to have vertical space for all of them
> ... vertical space is basically determined by the number of processes
> that are running at once.
>
> Parallel to the display of processes would be a display of overall CPU
> and disk utilization. CPU utilization on a single-processor system is
> pretty straightforward ... either the CPU is running at a point in
> time or it isn't. Considerations like memory bandwidth, processor
> stalls, and so forth matter when optimizing particular algorithms, but
> an initial guess (that the poster would confirm or deny) is that the
> CPU is not a significant bottleneck for system start.
>
> Disk utilization is more complex because of the huge cost of seeks:
> while modern drives can easily read 30-40 megabytes/second, a seek
> still takes 5-10 ms. Whether or not the drive is active tells little
> about how well we are using it. In addition, there is a significantly
> long pipeline of requests to the disk, and seeks aren't even
> completely predictable, because the drive may reorder read requests.
>
> But a simple display that might be sufficient is a graph of the
> instantaneous bandwidth (averaged over a small period of time) being
> achieved from the disk drive. If processes are red (waiting on the
> drive) and the bandwidth is low, then there is a problem with too much
> seeking that needs to be addressed.
>
> You'd also want text on the poster; process names are one obvious
> textual annotation that should be easy to obtain. It might also be
> interesting for processes to be able to provide extra annotations: for
> the X server to advertise that it is waiting for a DDC probe, and so
> forth.
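The instantaneous-bandwidth display described above could be sampled from
/proc/diskstats. A minimal Python sketch (the field layout follows the
kernel's iostats documentation; the device name and counter values below
are made up for illustration):

```python
# Hypothetical sketch: derive average read bandwidth from two samples of a
# /proc/diskstats line. Fields after major/minor/name are: reads completed,
# reads merged, sectors read, ms reading, ... so fields[5] is sectors read.
SECTOR_SIZE = 512  # /proc/diskstats counts 512-byte sectors

def sectors_read(diskstats_line):
    """Extract the cumulative 'sectors read' counter from one device line."""
    fields = diskstats_line.split()
    return int(fields[5])

def read_bandwidth(sample_a, sample_b, interval_s):
    """Average read bandwidth in MB/s between two samples interval_s apart."""
    delta = sectors_read(sample_b) - sectors_read(sample_a)
    return delta * SECTOR_SIZE / interval_s / 1e6

# Two made-up samples taken 0.25 s apart:
a = "   8       0 sda 12000 300 960000 4500 100 20 8000 90 0 5000 4590"
b = "   8       0 sda 12600 310 979200 4600 100 20 8000 90 0 5100 4690"
print(read_bandwidth(a, b, 0.25))
```

A profiling daemon would re-read the file on a timer and log one bandwidth
figure per interval; plotting those against the process timeline gives the
red-but-low-bandwidth seek diagnosis directly.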
>
> Implementation thoughts
> =======================
>
> It should be possible to start with a limited set of easily collected
> data and already get a useful picture. Useful data collection could be
> as simple as taking a snapshot of the data that the "top" program
> displays a few times a second during boot. That already gives you a
> list of the running processes, their states, and some statistics about
> global system load.
>
> Moving beyond that would probably involve instrumenting the kernel to
> give notification of process start and termination (possibly providing
> times(2)-style information on termination) to provide visibility into
> processes that run for too short a time to be picked up by polling.
> Better kernel reporting of disk utilization might also be needed.
>
> It might be possible to employ existing tools like oprofile; however,
> the level of detail oprofile provides is really overkill ...
> compressing 2 minutes of runtime involving 1000 processes onto a
> single poster doesn't really allow worrying about what code a process
> is running at a particular point.
>
> Obviously, one challenge for any profiling tool is to avoid affecting
> the collected data. Since CPU and memory don't seem to be bottlenecks,
> while disk definitely is, a low-impact implementation might be a
> profiling daemon that starts early in the boot process and accumulates
> information to be queried and analyzed after the boot finishes.
>
> While producing a single poster would already be enormously useful,
> the ability to recreate the poster on any system at any point would be
> many times more so. So changes to system components that can be gotten
> into the upstream projects, and that can be activated at runtime
> rather than needing to be conditionally compiled in, are best.
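The top-style polling described above boils down to reading every
/proc/&lt;pid&gt;/stat a few times a second. A minimal Python sketch of the
parsing step (the state letters map onto the poster's color scheme; the
sample lines are hypothetical):

```python
# Sketch: parse (pid, comm, state) from one /proc/<pid>/stat line.
# In the poster scheme: 'R' = running (green), 'D' = uninterruptible sleep,
# i.e. usually waiting on disk (red), 'S' = sleeping/blocked (dotted).
STATE_NAMES = {'R': 'running', 'D': 'waiting-on-disk', 'S': 'sleeping',
               'T': 'stopped', 'Z': 'zombie'}

def parse_stat(stat_line):
    """Return (pid, comm, state). comm is parenthesised and may itself
    contain spaces or ')', so scan for the *last* closing paren."""
    pid, rest = stat_line.split(' ', 1)
    close = rest.rindex(')')
    comm = rest[1:close]
    state = rest[close + 2]  # single state letter follows "') '"
    return int(pid), comm, state

# A sampler daemon would loop over the stat files a few times a second, e.g.:
#   for path in glob.glob('/proc/[0-9]*/stat'):
#       record(time.time(), parse_stat(open(path).read()))
print(parse_stat("1 (init) S 0 1 1 0 -1 4194560"))  # → (1, 'init', 'S')
```

Appending each snapshot with a timestamp to an in-memory log, and writing it
out only after boot finishes, keeps the sampler's own disk traffic from
polluting the measurement.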
>
> Motivation
> ==========
>
> I think this project would be a lot of fun to work on; you'd learn a
> lot about how system boot-up works and about performance measurement.
> And beyond that, there is a significant design and visualization
> element in figuring out how to display the collected data. It would
> also make a good small-scale academic project.
>
> But to provide a little extra motivation beyond that, if people pick
> this up and come up with interesting results, I'll (personally) pay
> for up to 3 posters of up to 4' x 6' to be professionally printed and
> laminated. I'll be flexible about how that works ... if multiple
> people collaborate on one design, they can each get a copy of that
> single design.
>
> - Owen Taylor
>
> --
> fedora-devel-list mailing list
> fedora-devel-list@xxxxxxxxxx
> http://www.redhat.com/mailman/listinfo/fedora-devel-list

Great to see that things are finally moving on reducing boot time! Good
initiative :)

Kyrre