Re: [PATCH -V3] Generic name to handle and open by handle syscalls

Neil Brown <neilb@xxxxxxx> · Sat, 24 Apr 2010 11:08:12 +1000

On Fri, 23 Apr 2010 18:19:59 -0600
Andreas Dilger <adilger@xxxxxxx> wrote:

> On 2010-04-23, at 07:23, Theodore Tso wrote:
> > 
> > Something to consider is whether anything bad happens if there are multiple filesystems mounted with the same UUID. I can think of two ways this could happen. One is when we make a read-only LVM snapshot of a filesystem and then mount it to back up a stable snapshot. This might happen if the sysadmin is trying to back up a SQL database, for example: the database gets frozen, we take a snapshot, and then we unfreeze the database and mount the snapshot. Now suppose we try to open-by-handle the MySQL database: should the system return a file from the r/o frozen snapshot, or from the r/w file system?
> 
> I'd say from the r/w LV in virtually all cases. We are safe from egregious errors, because the inode+generation check prevents a completely wrong file from being returned, but a newer or older version of the same file/directory may be found.
> 
> > Something we might do is to add a check that refuses to mount file systems with duplicate UUIDs, and change the LVM snapshot code to run a hook after each snapshot that runs "tune2fs -U random" on the snapshot. For r/o LVM snapshots, we could also put in a hack so that if two file systems with the same UUID are mounted, one r/o and one r/w, we return the r/w file system.
> 
> I think this may break things if we change the UUID when a snapshot is created, because we don't know what userspace might be using the UUID for. That said, I totally agree that returning the r/w LV makes sense. The LVM code itself understands which LV is the primary and which is the snapshot, so the "lookup the UUID" code will likely need to be smarter.
> 
> Probably the simplest approach is that if a new filesystem is mounted while a filesystem with the same UUID is already mounted, the new one is skipped. If we keep the UUID list in FIFO order, that should be sufficient to ensure that the "primary" version is returned first.
> 
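A minimal C sketch of the FIFO scheme proposed above, assuming a simple in-memory table. The names `struct fs_entry`, `register_fs()` and `lookup_uuid()` are hypothetical and are not existing kernel interfaces. Entries are appended in mount order, and a mount whose UUID is already present is not added, so a later-mounted LVM snapshot never shadows the origin filesystem.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical registry of mounted filesystems, keyed by UUID. */
#define MAX_MOUNTS 16

struct fs_entry {
    unsigned char uuid[16];
    unsigned long dev;      /* device number of the mounted filesystem */
};

static struct fs_entry mounts[MAX_MOUNTS];
static size_t nmounts;

/*
 * Hypothetical mount-time hook. Entries are kept in mount order (FIFO);
 * a filesystem whose UUID is already registered is skipped, so the
 * first-mounted ("primary") filesystem keeps the UUID.
 * Returns true if the entry was added.
 */
static bool register_fs(const unsigned char uuid[16], unsigned long dev)
{
    for (size_t i = 0; i < nmounts; i++)
        if (memcmp(mounts[i].uuid, uuid, 16) == 0)
            return false;   /* duplicate UUID: skip the newer mount */
    if (nmounts == MAX_MOUNTS)
        return false;       /* table full */
    memcpy(mounts[nmounts].uuid, uuid, 16);
    mounts[nmounts].dev = dev;
    nmounts++;
    return true;
}

/* Return the device of the first filesystem registered with this UUID, or 0. */
static unsigned long lookup_uuid(const unsigned char uuid[16])
{
    for (size_t i = 0; i < nmounts; i++)
        if (memcmp(mounts[i].uuid, uuid, 16) == 0)
            return mounts[i].dev;
    return 0;
}
```

Note that the result depends entirely on mount order: if the snapshot happens to be mounted before the origin, the snapshot is what gets returned, which is exactly the kind of policy the reply that follows objects to.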

I really think this sounds too much like 'policy'.  It is not a trivially
obvious algorithm for selecting the 'right' filesystem.  It depends on the
order things have happened, which might be right for the case that you are
thinking of, but might be wrong for some other case.

I haven't been following the conversation closely so I might have missed
something, but why don't we leave the mapping from handle->filesystem up to
userspace and just do the "filesystem+handle -> file" part in the kernel?
(i.e. just what nfsd does).

From the kernel's perspective, the only unique identifier for a file system
is a (sometimes fictitious or arbitrary) device number.  Using anything else
(except maybe a mount point) in a kernel interface just seems wrong.

Maybe map the filesystem part of the handle from UUID (or whatever) to devno
in userspace, then pass the devno+file-part-of-handle to the kernel to
perform the final mapping.
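A rough C sketch of that split, assuming userspace has already resolved the UUID to a mount point by whatever policy it prefers (for example, the r/w origin over a snapshot). `stat()` on the mount point yields `st_dev`, which is packaged together with the file part of the handle. The structures and `build_request()` are hypothetical, and no particular kernel entry point that consumes them is assumed to exist.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical file part of a handle: what an nfsd-style lookup needs. */
struct file_handle_part {
    uint64_t ino;
    uint32_t generation;
};

/* Hypothetical request handed to the kernel: device number + file part. */
struct dev_handle_request {
    dev_t dev;
    struct file_handle_part fh;
};

/*
 * Userspace step: the UUID -> filesystem decision (the "policy") has
 * already produced a mount point. stat() on it gives the device number.
 * Returns 0 on success, -1 on failure (errno set by stat()).
 */
static int build_request(const char *mount_point, struct file_handle_part fh,
                         struct dev_handle_request *req)
{
    struct stat st;

    if (stat(mount_point, &st) != 0)
        return -1;
    req->dev = st.st_dev;
    req->fh = fh;
    return 0;
}
```

The kernel side would then only need nfsd-style "devno + inode + generation -> file" resolution, where the generation check rejects a handle whose inode number has since been reused.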

NeilBrown
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html