On Mon, Jan 07, 2008 at 03:01:47AM -0700, Eric W. Biederman wrote:
> Al Viro <viro@xxxxxxxxxxxxxxxxxx> writes:

> What appears to be a clean solution is to have multiple sysfs superblocks
> and to capture the namespace at mount time.

It is not a clean solution at all.  In particular, it leaves you with a
hell of a lot of coherency issues between these trees.

> For planning purposes there
> is a device namespace on the drawing board as well, so you can keep
> your same major minor numbers for devices (tty names, network attached
> disk) in a migration event.

Yes, I'm quite sure there's more coming.  Which is why I'm asking now,
before we are even deeper into that... area.

> This means netns isn't the only
> namespace we will have to worry about with sysfs before it is all
> done.

Exciting.

> > a) what happens if I do chdir("/sys/class/net/eth42/") and then
> > migrate?
>
> It shouldn't be any better or worse than any other filesystem.  The
> prerequisite for an OS level migration is that the set of all
> namespaces and all of the processes that use them all go together.
> As we recreate the virtual filesystem and virtual devices we should
> recreate a sysfs that is essentially the same.  I doubt we will go
> to the trouble of keeping the unnamed device number we are mounted on
> and the inode numbers the same, but otherwise we should be able to
> recreate an identical looking sysfs (barring real hardware changes).

Have you even bothered to read the pathname in question?  Please, do so.

> > c) what happens to open files?  E.g. to /sys/class/net - say,
> > if migration happens between two getdents(2).
>
> How do we restore the internal state?  Hmm.  The rule is that you
> are only guaranteed to see directory entries that existed
> both before you started to read the directory and after you finished.
>
> The cheap solution is just to declare everything hotplugged and
> deleted and recreated.  Removing any meaningful guarantee of seeing
> anything.
>
> Since we only depend upon the value of f_pos that should largely work.
>
> If we ever figure out how to preserve inode numbers over a migration
> event the current scheme will work unmodified but that sounds like
> more pain than it is worth.
>

Inode numbers?  Are you suggesting a wholesale replacement of all struct
file referenced by descriptor tables, all the way down to inodes?  May I
see the patches for that, please?

> Third, when the goal is isolation and not migration (a better chroot)
> then our hardware never changes.

... and you have quite a bit of system state (starting with those net:eth0
symlinks, etc.) visible in there, not just the hardware.

> The idea is supporting multiple superblocks for sysfs:
>
> Ultimately capturing the relevant namespace at mount time
> and if we don't have a superblock for that namespace creating
> a new one.
>
> So we have one sysfs dirent tree and multiple dentry trees.
>
> The tricky parts are rename/move and blocking mount/unmount requests
> for sysfs until we complete the rename operation calling d_move
> everywhere.

Excuse me, _what_?  Are you seriously suggesting going through all dentry
trees, doing d_move() in each?  I want to see your locking.  It's promising
to be worse than devfs had ever been.  Much worse.
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
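
For readers following question (a), here is a minimal userspace sketch of
why a chdir() matters at all.  It is not sysfs and not anything from the
patches under discussion: it uses an ordinary temporary directory, and the
names eth42/eth43 and the helper cwd_survives_rename() are hypothetical.
What it shows is only the general VFS behaviour that a working directory is
a reference to a directory object rather than to a pathname string, so it
stays attached to that object when the name changes underneath it.

```python
import os
import tempfile


def cwd_survives_rename():
    """Return (inode_before, inode_after, basename_after) for a renamed cwd.

    Hypothetical illustration on a plain temporary directory, not on sysfs.
    """
    saved = os.getcwd()
    with tempfile.TemporaryDirectory() as root:
        root = os.path.realpath(root)
        old = os.path.join(root, "eth42")  # stand-in name, assumption only
        new = os.path.join(root, "eth43")
        os.mkdir(old)
        try:
            os.chdir(old)                  # the process now holds this directory
            ino_before = os.stat(".").st_ino
            os.rename(old, new)            # the name changes behind its back
            ino_after = os.stat(".").st_ino
            name_after = os.path.basename(os.getcwd())
        finally:
            os.chdir(saved)                # step out before cleanup removes it
    return ino_before, ino_after, name_after


if __name__ == "__main__":
    before, after, name = cwd_survives_rename()
    print(before == after, name)
```

On Linux the process is still sitting in the same directory object after the
rename, and the kernel reports its path under the new name.  Any scheme that
throws the tree away and rebuilds it has to say what happens to references
like this one.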