"Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes: > Hi Eric, > > On 07/28/2016 02:56 PM, Eric W. Biederman wrote: >> "Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes: >> >>> On 07/26/2016 10:39 PM, Andrew Vagin wrote: >>>> On Tue, Jul 26, 2016 at 09:17:31PM +0200, Michael Kerrisk (man-pages) wrote: >> >>>> If we want to compare two file descriptors of the current process, >>>> it is one of cases for which kcmp can be used. We can call kcmp to >>>> compare two namespaces which are opened in other processes. >>> >>> Is there really a use case there? I assume we're talking about the >>> scenario where a process in one namespace opens a /proc/PID/ns/* >>> file descriptor and passes that FD to another process via a UNIX >>> domain socket. Is that correct? >>> >>> So, supposing that we want to build a map of the relationships >>> between namespaces using the proposed kcmp() API, and there are >>> say N namespaces? Does this mena we make (N * (N-1) / 2) calls >>> to kcmp()? >> >> Potentially. The numbers are small enough O(N^2) isn't fatal. > > Define "small", please. > > O(N^2) makes me nervous about what other use cases lurk out > there that may get bitten by this. Worst case for N (One namespace per thread) is about 60k. A typical heavy use case may be 1000 namespaces of any type. So we are talking about O(N^2) that rarely happens and should be done in a couple of seconds. >> Where kcmp shines is that it allows migration to happen. Inode numbers >> to change (which they very much will today), and still have things work. > > >> We can keep it O(Nlog(N)) by taking advantage of not just the equality >> but the ordering relationship. Although Ugh. > > Yes, that sounds pretty ugly... Actually having thought about this a little more if kcmp returns an ordering by inode and migration preserves the relative order of the inodes (which should just be a creation order) it should be quite solvable. 
Switch from an order by inode number to an order by object creation
time, and guarantee that all creations have an order (which with
tasklist_lock we practically already have), and it should be even
easier to create.  (A 64bit nanosecond resolution timestamp is good for
roughly 584 years of uptime.)  A 64bit number that increments each time
an object is created should have an even better lifespan.

I don't know if we can find a way to give that guarantee for other kcmp
comparisons, but it is worth a thought.

>> One disadvantage of
>> kcmp currently is that the way the ordering relationship is defined
>> the order is not preserved over migration :(
>
> So, does kcmp() fully solve the problem(s) at hand? It sounds like
> not, if I understand your last point correctly.

There are 3 possibilities I see for migration in migration, listed in
order of implementation difficulty.

1) Have a clear signal that migration happened and a nested migration
   needs to restart.
2) Use kcmp so that only the relative order needs to be preserved.
3) Preserve the device number and inode numbers.

At a practical level I think (2) may, on net, actually be the simplest.
It requires a little more care to implement and you have to opt in, but
it should not require any rolling back of activity (merely careful
ordering of object creation).

I definitely like kcmp knowing how to compare things by inode (aka
st_dev, st_ino), because then even if you have to restart the
comparisons after a migration, the exact details you are comparing are
hidden, and so it is easier to support and harder to get wrong.

I can imagine how to preserve inode numbers by creating a new instance
of nsfs and using the old inode numbers upon restore.

I don't currently see how we could possibly preserve st_dev over
migration short of a device number namespace.  So if we are going to
continue with making device numbers a legacy attribute that
applications should not care about, we need a way to compare things
without looking at st_dev.
Which brings us back to kcmp.

Hmm.  Hotplugging a disk and plugging it back in will likely change the
device number and give the same kind of challenge with st_dev (although
you can't keep a file descriptor open across that kind of event).  So
certainly a hotplug event on a device should be enough to say
applications should not care about the device number.

Eric