Re: requirements for blkid for possible replacement of volume_id in udev

Kay Sievers <kay.sievers@xxxxxxxx> · Fri, 21 Nov 2008 14:10:15 +0100

On Fri, 2008-11-21 at 12:04 +0100, Karel Zak wrote:
> On Fri, Nov 21, 2008 at 10:47:48AM +0100, Kay Sievers wrote:
> > udev:
> > =====
> > We would need something like this, where the output of some characters
> > is hex-encoded if needed. The mount integration of libblkid will need
> > the encoding code anyway to lookup the udev created label/uuid links, so
> > it might be good to just add it to the current binary:
> >   $ blkid --export /dev/sda5
> >   BLKID_USAGE=filesystem
> >   BLKID_TYPE=ext3
> >   BLKID_VERSION=1.0
> >   BLKID_UUID_ENC=ea758d5a-d93d-4899-bec7-abf553a6f16c
> >   BLKID_LABEL_ENC=\x2f
> > 
> > If that's not feasible, we can just keep the vol_id binary in udev, and
> > link it against util-linux-ng's libblkid.
> 
>  I think sometimes people need to convert  the (hex-encoded) names to
>  the real UUIDs/LABELs. It makes sense to support this feature in the
>  library to avoid multiple implementations.
> 
>  My wish is to add "--lowprobe" and "--format={export,...}" to the blkid
>  binary.

Something like that sounds fine.
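
To make the encoding discussed above concrete, here is a minimal sketch of the two directions (label to link name, and back). It is not the udev or libblkid code. The function names and the whitelist (ASCII alphanumerics plus "#+-.:=@_") are assumptions for illustration, and this sketch escapes every non-ASCII byte, which may be stricter than what udev actually does.

```python
import re

# Assumed whitelist for illustration: characters emitted unchanged.
SAFE = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789#+-.:=@_"
)


def encode_name(value: str) -> str:
    """Hex-encode every byte outside the whitelist as \\xNN (hypothetical helper)."""
    out = []
    for byte in value.encode("utf-8"):
        ch = chr(byte)
        if ch in SAFE:
            out.append(ch)
        else:
            # The backslash is not whitelisted, so it is escaped too;
            # that keeps decoding unambiguous.
            out.append("\\x%02x" % byte)
    return "".join(out)


def decode_name(encoded: str) -> str:
    """Reverse encode_name(): turn each \\xNN escape back into its byte."""
    raw = re.sub(
        rb"\\x([0-9a-fA-F]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        encoded.encode("ascii"),
    )
    return raw.decode("utf-8")
```

With this, a label of "/" encodes to \x2f as in the example output above, and a plain UUID passes through unchanged.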

>  (Maybe we can also check argv[0] and enable these options automatically
>  when the name of the binary is "vol_id". Then, for backward
>  compatibility, you can use a symlink /lib/udev/vol_id --> /bin/blkid. Depends
>  on your POV.)

That's why I moved it from /sbin/ to /lib/udev/ long ago. Maybe we
should just log an error when it's called, and then get rid of it after a
while.

> > Based on past experience with blkid, I require a formal statement,
> > mentioned in the util-linux-ng blkid documentation, or just in the
> > source code, that it will always allow raw byte-stream probing if asked
> > for. :)
> 
>  Ah man :-) It's completely separated API, so ...

I just need to be sure that the "smartness" does not come through the
backdoor after a while. I have my reasons to be very careful here. :)

> > HAL:
> > ====
> > 
> > After udev and HAL are converted, which would happen at the same time, I
> > will remove libvolume_id from the udev tree, and the fsprobe configure
> > option in util-linux-ng can be removed. We will only have a single
> > library.
> 
>  I'd like to provide at least one "overlapping release", meaning a
>  release that can be compiled against both the old and new libraries.
>  Some people are very very conservative :-)
> 
>  Ted's plan is to keep the old frozen libblkid in e2fsprogs.

Sounds fine. As long as we remove the useless options some day.

I will definitely delete the volume_id library immediately after we
have successfully switched over. :)

> > mount:
> > ======
> > If the current mount is compiled to use libblkid instead of
> > libvolume_id, it does not use udev supplied information. To find a label
> > or uuid of a volume, it opens in-process every device sequentially,
> 
>  This is not true. The library has a cache, so if the device is
>  already cached it reads and verifies that single device only. This
>  happens almost all the time.

The cache cannot, by definition, contain results for unknown devices. And
you may have unknown devices all the time. Otherwise we would not need
all the fancy probing in the first place. :)

The cache also uses kernel device names/numbers to look up stuff, which
just cannot work on current systems. Also, extended minors for sd*
devices just hit the kernel, and the assigned extended number is
*random*. So much for this kind of cache. :)

I'm not talking about lookup efficiency, I'm talking about the behavior that
the whole system may stop working if some disks, which you may not even
need, don't behave as expected. It's a serious bug.

Just try the mentioned command with mount and any non-cached device ID
on any Fedora system, and you will wait for 2 hours for mount to return.
The cache does not help at all here, unless you are purely lucky and it
already matches.

> > which may take a long time, if some of the available block devices do
> > not behave normally. Udev does this information gathering at device
> > discovery, asynchronously with a thread per device, which does not have
> > this problem.
> 
>  ... udev is gathering when the information is available, not all
>  mkfs/tunefs programs inform udev about UUID/LABEL changes. You
>  know...
> 
>  IMHO the proper udev based mount(8) implementation should be:
> 
>     - read disk-by symlink
>     - *verify* that the link matches the LABEL/UUID on the device
>     - if the link *does not match*, call "the smart" libblkid API ;-)

And that is what we need to have configurable. Most udev setups don't
need it, and some just break with the "smart" behavior. There is
absolutely no way to enable this unconditionally.
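
As a rough illustration of the three steps quoted above with the fallback made opt-in, here is a sketch. It is not mount's or libblkid's code. The names resolve_label, probe_label and fallback_scan are hypothetical, the probing itself is passed in as a callable, and the label is assumed to need no hex-encoding for the link name.

```python
import os
from typing import Callable, Optional


def resolve_label(
    label: str,
    probe_label: Callable[[str], Optional[str]],
    by_label_dir: str = "/dev/disk/by-label",
    fallback_scan: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[str]:
    """Resolve a label to a device node, opening only the linked device.

    probe_label(device) is a hypothetical single-device prober that
    returns the label found on that device, or None.
    fallback_scan(label) is the optional "smart" search over all
    devices; it runs only if the caller passes it in explicitly.
    """
    link = os.path.join(by_label_dir, label)

    # Step 1: read the udev-maintained symlink (dangling links count as absent).
    if os.path.exists(link):
        device = os.path.realpath(link)
        # Step 2: verify the link against what is on that one device.
        if probe_label(device) == label:
            return device

    # Step 3: scan other devices only when explicitly configured.
    if fallback_scan is not None:
        return fallback_scan(label)
    return None
```

With fallback_scan left at None, a stale or missing link simply yields no result, and no other device is ever opened.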

>  I'd like to try to implement an inotify-based blkidd; we will see how
>  this thing helps us.

Sounds great, if that can work as expected.

> > If we replace libvolume_id, we (as the current users of libvolume_id in
> > mount) need to be able to specify a different default behavior for
> > mount, which will by default never try to open any other device. We just
> > want to depend on udev maintained data. It can be a config file option
> > somewhere, or a compile-time option, both should work fine.
> 
>  It seems we need a config file for more things.

Yeah.

>  From a long-term point of view we need to find *one* reliable way to
>  work with UUIDs/LABELs and use it in all distros. The scenario also
>  needs a fallback for non-udev based systems.

Sure, it makes perfect sense to keep non-udev setups working. Besides
custom setups, it's also always useful for rescue systems and similar.

As said, that fallback must not happen unless explicitly asked for.
Some systems can not afford to open _any_ other device and possibly hang
until it reaches a timeout.

>  BTW, how do you resolve UUID/LABEL when you mount a root filesystem in
>  initrd?

Udev runs in initramfs, and creates the usual symlinks there just like
in the real rootfs. You can just use all the /dev/disk/by-* links, and
mount by label, uuid, physical path, hardware id.

For special root= parameters, initramfs creates a temporary udev rule,
which can match on anything, e.g. ENV{ID_FS_LABEL}, ENV{MAJOR}, ENV{MINOR}.
Then you load the modules, the device matches the rule and creates
a /dev/root symlink.

After loading the modules, initramfs just waits for the specified device
node/link or /dev/root to be asynchronously created, and mounts the
rootfs when it appears.
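
The waiting step can be sketched as a simple poll with a timeout. This is only an illustration, not what any particular initramfs ships. The function name wait_for_device and its timeout and interval parameters are made up for the example.

```python
import os
import time
from typing import Iterable, Optional


def wait_for_device(
    paths: Iterable[str],
    timeout: float = 30.0,
    interval: float = 0.1,
) -> Optional[str]:
    """Wait until one of the given nodes/links exists.

    Returns the first path that appears, or None on timeout.
    Nothing is opened or probed here; the links are expected to be
    created asynchronously (by udev in the scenario described above).
    """
    candidates = list(paths)
    deadline = time.monotonic() + timeout
    while True:
        for path in candidates:
            # exists() follows symlinks, so a dangling link does not count.
            if os.path.exists(path):
                return path
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

A caller would pass something like the requested /dev/disk/by-* link together with /dev/root, and mount whichever path comes back.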

It's all the same logic as in the real rootfs, nothing really special
here, no special setup, or any special binary.

Thanks,
Kay
