Re: A fuse based initfs

Douglas McClendon <dmc.fedora@xxxxxxxxxxxxxxxxxxxxxx> · Wed, 22 Aug 2007 21:20:25 -0500

Jon Nettleton wrote:
This all started with a somewhat simple task.  I wanted to start rhgb/gdm
as early in the boot process as possible.  Basically kernel->disk->gui.  How
hard could it be?  Well, not fun really.  My final solution, which is
unacceptable for Fedora right now, is patching the kernel with unionfs and
using that as an overlay for /var and /tmp.  That gave me the transparent
filesystem overlay I needed to be able to start up a nice gui and allow
things like fsck to happen underneath without disturbing things.  Even with
this solution I still don't have init restarting gdm if it dies.

So I thought, and discarded and thought some more.  Now I want anyone
willing to comment on my thoughts.

Using unionfs for your rootfs sounds like a bad idea today, but
ultimately a very desirable feature.  I would love to have my rootfs have
a dozen different layers, some of them coming from peer2peer distributed
filesystems.

But at the moment, using the current unionfs and/or fuse for the rootfs 
seems like a bad idea.  Admittedly, all I can do is wave my hands and 
point to vague memories of lots of issues, but I suspect the issues are 
real.

In general, it sounds like you are outlining several different 
problem/solutions, but I don't think they all need to be as tightly 
integrated as you suggest.

For instance, take gdm in the initramfs (or very, very early).  Why do you
need this copy-on-write rootfs stuff?  Why not just have a tmpfs, and a
gdm configuration that looks there?  Likewise for the early logging
stuff.  Later during boot, the early-boot logfiles in tmpfs can be
copied to /var/log.  This isn't as nice as the automagic
unionfs/dm-snapshot-merge merging, but I don't think that is
necessary, or worth the steps you are taking to get it.
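
A rough sketch of that copy step, in Python purely for illustration.  The
function name and both directory arguments are made up for this example;
the only idea taken from the text above is "write early logs to tmpfs, then
move them into /var/log once it is writable".

```python
import shutil
from pathlib import Path


def merge_early_logs(early_dir, log_dir):
    """Append each early-boot log in early_dir to the same-named file in log_dir.

    early_dir is assumed to live on tmpfs; log_dir stands in for /var/log.
    Returns the names of the files that were merged.
    """
    early, target = Path(early_dir), Path(log_dir)
    target.mkdir(parents=True, exist_ok=True)
    merged = []
    for src in sorted(early.iterdir()):
        if not src.is_file():
            continue  # skip directories, sockets, pipes
        # Append rather than overwrite, so logs from earlier boots survive.
        with src.open("rb") as fin, (target / src.name).open("ab") as fout:
            shutil.copyfileobj(fin, fout)
        src.unlink()  # free the tmpfs space once the data is on disk
        merged.append(src.name)
    return merged
```

It would be called once, after the real /var is mounted read-write and
before syslog is pointed at its final location.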

On the issue of not starting services that might not be needed
(bluetooth, networking, smartcards, ...), I think in various Fedora
wikis there is talk of D-Bus as a solution to that problem.

I think there are some flaws with your strategy (specifically wrt
unionfs).  For example, in (4) you mention flushing the fs down to the
lower layers and disappearing.  Can you actually make the unionfs
disappear?  Aren't there some obscure limitations of unionfs (even when
only containing a single layer) that will make it unpalatable for the
general case?  (I have vague memories of something involving sendfile and
apache, and some types of symlinks.)

Of course, if you are talking about devicemapper snapshot merging à la
markmc's patchset, then I am all for it (just because I have other plans
for that functionality).  But then I am also confused about using both
that and unionfs, and what exactly you are using for what.

But I definitely get the feeling that you are moving in the right 
general direction towards things that I agree should be improved.

-dmc

My proposal is a user-land based filesystem that is specifically built to
work with sysvinit to give it more functionality without changing it.  If
you want a standard sysvinit Unix boot, just don't pass the parameter on the
kernel command line; no problem.  However, with it enabled you would
"theoretically" get the following.

1)  Basic cached ram overlay.  This could possibly be used to replace our
readahead scripts for disk caching.  The more immediate need is a temporary
ram file-system to allow system processes to write logs, status, pipes to
before we have had a chance to verify disk integrity.  This should get us
the ability to provide nice X based gui tools for first boot, system
recovery, and possibly encryption unlocking.

2)  Better init logging.  With /var writable (at least in RAM) we can
start syslog nice and early.

Just those two things give us a nicer gui boot screen and possibly save the
time we spend launching X twice during our boot sequence.  Now we go one
step further.

3)  We use the abstraction layer to manipulate the startup scripts that init
sees in /etc/rcX.d.  This would require:

     A)  Netlink support.  Do we or don't we have a network interface?  If
         we don't, then automatically remove all network-dependent services
         from init.  If the network comes up later in the process and init
         is still running (we know that because we can keep track of
         /var/lock/subsys), the filesystem re-adds them later in the
         process.
     B)  General dependencies.  As I mentioned, we can keep track of what
         has started using /var/lock/subsys or /var/run.  If something
         fails, move the dependent scripts out of the way so init doesn't
         try to start them.
     C)  Ability to maximize IO throughput.  Well, this is just a thought.
         Right now we see one of the major bottlenecks in our init process
         as overloading the IO subsystem.  With an intelligent read-only
         overlay we could do basic metrics and possibly wait a second
         longer to start the next process, knowing it will shorten the time
         to launch the next service by 2 seconds.  I have no proof this
         will work, but after looking at those bootchart graphs enough,
         some crazy ideas cross your mind.
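
To make (A) and (B) a bit more concrete, here is a minimal Python sketch of
the decision the filesystem would make each time init lists /etc/rcX.d.
Everything named here is an assumption for illustration: the `deps` mapping
is hypothetical (sysvinit scripts do not hand us one), the function names
are invented, and polling `operstate` under /sys/class/net is a simple
stand-in for the Netlink notifications described above.

```python
from pathlib import Path


def have_network(sys_net="/sys/class/net"):
    """Return True if any non-loopback interface reports operstate 'up'."""
    for iface in Path(sys_net).iterdir():
        if iface.name == "lo":
            continue
        state = iface / "operstate"
        if state.is_file() and state.read_text().strip() == "up":
            return True
    return False


def visible_scripts(rc_scripts, deps, subsys_dir, network_up):
    """Return the scripts init should currently see, in their original order.

    rc_scripts : service names as they would appear in /etc/rcX.d
    deps       : hypothetical mapping, service name -> set of required names
    subsys_dir : stands in for /var/lock/subsys (one file per started service)
    network_up : result of have_network(), treated as the pseudo-service
                 "network"
    """
    started = {p.name for p in Path(subsys_dir).iterdir()}
    if network_up:
        started.add("network")
    # A script is shown only when everything it depends on is already there.
    return [s for s in rc_scripts if deps.get(s, set()) <= started]
```

Re-running `visible_scripts` whenever /var/lock/subsys or the link state
changes is what would "re-add" a hidden script later in the boot.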

4)  After the init process is done, the filesystem flushes itself to the
lower-layer writable disks and disappears.
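
The flush in (4) could look roughly like the Python sketch below.  It is
only an illustration under simple assumptions: `upper` is the RAM overlay
directory, `lower` is the now-writable disk directory, both names are
hypothetical, and it handles regular files only (no pipes, no deleted-file
markers, no open file handles).

```python
import os
import shutil


def flush_overlay(upper, lower):
    """Copy every file under upper onto the same relative path under lower,
    then remove upper.  Returns the sorted relative paths that were flushed."""
    flushed = []
    for root, _dirs, files in os.walk(upper):
        rel = os.path.relpath(root, upper)
        dest_dir = os.path.normpath(os.path.join(lower, rel))
        os.makedirs(dest_dir, exist_ok=True)
        for name in files:
            # copy2 keeps timestamps and mode bits; the overlay copy wins.
            shutil.copy2(os.path.join(root, name), os.path.join(dest_dir, name))
            flushed.append(os.path.normpath(os.path.join(rel, name)))
    shutil.rmtree(upper)  # the overlay "disappears" once its data is on disk
    return sorted(flushed)
```

The hard part this skips is exactly the "disappears" step for a mounted
filesystem with processes still holding files open on it.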

First, sorry if this is wrapped horribly.  I am using gmail and it doesn't
lend itself to formatting long mails like this.
Second, let's talk about it.  Like I said, this just came to my mind as
something that doesn't exist, and might possibly help us build a better
system around what we already have.

Jon

--
fedora-devel-list mailing list
fedora-devel-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/fedora-devel-list