Al Viro <viro@xxxxxxxxxxxxxxxxxx> writes:

> As much as I hate to touch either subject, let alone both at
> once... Eric, would you mind explaining what exactly do you want
> sysfs to do in presense of your "namespaces"? On the "what does user
> see if we do <...>" level.

Right. I need to repost the patches since Greg didn't get them applied
last time.

What appears to be a clean solution is to have multiple sysfs
superblocks and to capture the namespace at mount time.

For planning purposes, there is a device namespace on the drawing board
as well, so that you can keep the same major/minor numbers for devices
(tty names, network-attached disks) across a migration event. This means
netns isn't the only namespace we will have to worry about with sysfs
before it is all done.

> a) what happens if I do chdir("/sys/class/net/eth42/") and then
> migrate?

It shouldn't be any better or worse than with any other filesystem. The
prerequisite for an OS-level migration is that the set of all namespaces
and all of the processes that use them go together. As we recreate the
virtual filesystem and virtual devices we should recreate a sysfs that
is essentially the same. I doubt we will go to the trouble of keeping
the unnamed device number we are mounted on and the inode numbers the
same, but otherwise we should be able to recreate an identical-looking
sysfs (barring real hardware changes).

> b) what happens to /sys/class/net/eth0/device visibility/things
> it points to/etc.?

That should continue to work without any changes at all. We only play
with /sys/class/net (and its cousin directories that only exist when we
don't enable sysfs backwards compatibility). The symlink might change
but that is about it.

> c) what happens to open files? E.g. to /sys/class/net - say it,
> if migration happens between two getdents(2).

How do we restore the internal state? Hmm.
The rule is that you are only guaranteed to see directory entries that
existed both before you started to read the directory and after you
finished. The cheap solution is just to declare everything hotplugged,
deleted, and recreated, which removes any meaningful guarantee of seeing
anything. Since we only depend upon the value of f_pos, that should
largely work. If we ever figure out how to preserve inode numbers over a
migration event the current scheme will work unmodified, but that sounds
like more pain than it is worth.

> d) what happens to visibility in other parts of sysfs? E.g. to
> things like
> $ ls /sys/devices/pci0000\:00/0000\:00\:0a.0/
> bus     device  local_cpus  power      resource1         uevent
> class   driver  modalias    resource   subsystem_device  vendor
> config  irq     net:eth0    resource0  subsystem_vendor

It all shows up. Nothing is hidden except for the directories, and
possibly the symlinks to the directories, for network devices. We aren't
trying to virtualize the hardware.

> $
> See that net:eth0 in there? Are all such suckers seen?

Yep. Grr. net:eth0 from another namespace should either be a broken
symlink or disappear completely. It has been ages since I looked at what
my patches do in that case; it should be just a broken symlink.

This is a bit of a challenge to explain because the relevant directory
structure changes in sysfs when CONFIG_SYSFS_DEPRECATED=n. Then instead
of net:eth0 we have net/eth0 and all of the device-specific files there.

> e) while we are at it, wouldn't seeing the information in
> /sys/devices/pci in general defeat whatever purpose you have in mind
> for your stuff?

No.

First, when you migrate or whatever, you can report that all of the
hardware in the machine was hot-unplugged and a new set of essentially
identical hardware was hotplugged. Stuff that goes through an OS
abstraction like a fs doesn't care. For stuff that talks to the hardware
directly you don't have a choice: you have to make user space deal with
it.
However, the set of applications that care is actually quite small.

Secondly, the goal is not to hide the fact that you are running in a set
of namespaces that don't cover the entire machine, but to make it so
that you don't care. Which is close but not quite the same thing.

Third, when the goal is isolation and not migration (a better chroot),
our hardware never changes.

> Context: we need sane locking for sysfs. I think I have a more or less
> workable scheme, but its feasibility depends big way on what netns needs
> to have.

I think on the netns side Tejun and I have hashed it over enough that
the semantics, if not the implementation, come out cleanly.

The idea is to support multiple superblocks for sysfs: ultimately
capturing the relevant namespace at mount time and, if we don't have a
superblock for that namespace, creating a new one. So we have one sysfs
dirent tree and multiple dentry trees.

The tricky parts are rename/move, and blocking mount/unmount requests
for sysfs until we complete the rename operation by calling d_move
everywhere.

Essentially the dentry and sysfs dirent separation was the big part I
needed.

If all I had to deal with was /sys/class/net I think I would have split
that off into its own filesystem. However, with the latest sysfs layout
we are far beyond that, and there are symlinks going all over tying all
of the pieces together.

Eric
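
For illustration only, the "one dirent tree, multiple superblocks" idea
can be modelled in user space. This is a rough sketch in Python rather
than kernel C, and it is not what the patches do: the names Dirent,
SuperBlock, Sysfs, mount and listdir, and the string namespace tags, are
all hypothetical. It only shows a namespace being captured at mount
time, a superblock being created when none exists for that namespace,
and network entries from other namespaces being filtered out of a
listing.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Dirent:
    """Hypothetical stand-in for a sysfs dirent; ns is None when untagged."""
    name: str
    ns: Optional[str] = None
    children: List["Dirent"] = field(default_factory=list)


@dataclass
class SuperBlock:
    """Hypothetical per-namespace view over the single shared dirent tree."""
    ns: str
    root: Dirent

    def listdir(self, dirent: Dirent) -> List[str]:
        # Untagged entries are visible everywhere; tagged entries only in
        # the namespace captured when this superblock was created.
        return [c.name for c in dirent.children if c.ns in (None, self.ns)]


class Sysfs:
    def __init__(self, root: Dirent) -> None:
        self.root = root
        self.superblocks: Dict[str, SuperBlock] = {}

    def mount(self, current_ns: str) -> SuperBlock:
        # Capture the mounter's namespace; reuse its superblock if one
        # already exists, otherwise create a new one over the same tree.
        sb = self.superblocks.get(current_ns)
        if sb is None:
            sb = SuperBlock(ns=current_ns, root=self.root)
            self.superblocks[current_ns] = sb
        return sb


# One shared tree; only the entries under class/net carry a namespace tag.
net = Dirent("net", children=[
    Dirent("lo", ns="ns_a"), Dirent("eth0", ns="ns_a"),
    Dirent("lo", ns="ns_b"), Dirent("eth42", ns="ns_b"),
])
root = Dirent("/", children=[Dirent("class", children=[net]), Dirent("devices")])
sysfs = Sysfs(root)

sb_a = sysfs.mount("ns_a")
sb_b = sysfs.mount("ns_b")
print(sb_a.listdir(net))            # ['lo', 'eth0']
print(sb_b.listdir(net))            # ['lo', 'eth42']
print(sysfs.mount("ns_a") is sb_a)  # True
```

Both superblocks share the same root object, which is the point of
keeping a single dirent tree. The sketch leaves out the hard parts named
above, rename/move and holding off mount/unmount while a rename is in
flight.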