On Sun 06-04-14 11:00:29, Michael Kerrisk (man-pages) wrote:
> On 04/04/2014 02:43 PM, Jan Kara wrote:
> > On Fri 04-04-14 09:35:50, Michael Kerrisk (man-pages) wrote:
> >> On 04/03/2014 10:52 PM, Jan Kara wrote:
> >>> On Thu 03-04-14 08:34:44, Michael Kerrisk (man-pages) wrote:
> >
> > [...]
> >
> >>>>    Dealing with rename() events
> >>>>        The IN_MOVED_FROM and IN_MOVED_TO events that are generated by
> >>>>        rename(2) are usually available as consecutive events when
> >>>>        reading from the inotify file descriptor. However, this is not
> >>>>        guaranteed. If multiple processes are triggering events for
> >>>>        monitored objects, then (on rare occasions) an arbitrary number
> >>>>        of other events may appear between the IN_MOVED_FROM and
> >>>>        IN_MOVED_TO events.
> >>>>
> >>>>        Matching up the IN_MOVED_FROM and IN_MOVED_TO event pair
> >>>>        generated by rename(2) is thus inherently racy. (Don't forget
> >>>>        that if an object is renamed outside of a monitored directory,
> >>>>        there may not even be an IN_MOVED_TO event.) Heuristic
> >>>>        approaches (e.g., assume the events are always consecutive) can
> >>>>        be used to ensure a match in most cases, but will inevitably
> >>>>        miss some cases, causing the application to perceive the
> >>>>        IN_MOVED_FROM and IN_MOVED_TO events as being unrelated. If
> >>>>        watch descriptors are destroyed and re-created as a result,
> >>>>        then those watch descriptors will be inconsistent with the
> >>>>        watch descriptors in any pending events. (Re-creating the
> >>>>        inotify file descriptor and rebuilding the cache may be useful
> >>>>        to deal with this scenario.)
> >>>   Well, but there's 'cookie' value meant exactly for matching up
> >>> IN_MOVED_FROM and IN_MOVED_TO events. And 'cookie' is guaranteed to be
> >>> unique at least within the inotify instance (in fact currently it is unique
> >>> within the whole system but I don't think we want to give that promise).
> >>
> >> Yes, that's already assumed by my discussion above (it's described elsewhere
> >> in the page). But your comment makes me think I should add a few words to
> >> remind the reader of that fact. I'll do that.
> >   Yes, that would be good.
> >
> >> But, the point is that even with the cookie, matching the events is
> >> nontrivial, since:
> >>
> >> * There may not even be an IN_MOVED_FROM event
> >> * There may be an arbitrary number of other events in between the
> >>   IN_MOVED_FROM and the IN_MOVED_TO.
> >>
> >> Therefore, one has to use heuristic approaches such as "allow at least
> >> N milliseconds" or "check the next N events" to see if there is an
> >> IN_MOVED_FROM that matches the IN_MOVED_TO. I can't see any way around
> >> that being inherently racy. (It's unfortunate that the kernel can't
> >> provide a guarantee that the two events are always consecutive, since
> >> that would simplify user space's life considerably.)
> >
> >   Yeah, it's unpleasant but doing that would be quite costly/complex at the
> > kernel side.
>
> Yep, I imagined that was probably the reason.

  I had a look into that code again and it's all designed around the fact
that there's a single inode to notify. If you wanted to have atomic rename
notifications, you'd have to rewrite that to work with two inodes, finding
out whether these two inodes are actually watched by the same group or
not... Doable but complex. Alternatively you could just lock down the whole
notification subsystem while generating rename events. But that's rather
costly. Just so that we have the complications written down somewhere in
case someone wants to look into this in future.

> > And the race would in the worst case lead to application
> > thinking there's been file moved outside of watched area & a file moved
> > somewhere else inside the watched area. So the application will have to
> > possibly inspect that file. That doesn't seem too bad.
>
> It's actually very bad. See the text above.
> The point is that one likely
> treatment on an IN_MOVED_FROM event that has no IN_MOVED_TO is to remove
> the watches for the moved out subtree. If it turns out that this really
> was just a rename(), then on the IN_MOVED_TO, the watches will be recreated
> *with different watch descriptors*, thus invalidating the watch descriptors
> in any queued but as yet unprocessed inotify events. See what I mean?
> That's quite painful for user space.

  But if I understand it right, you lose only the information for recreated
watches. So you effectively lose all the information about what has
happened inside the subtree of the moved directory (or what has happened
with the moved file). But since you think it's a file / dir moved from
outside of the watched area, you have to fully rescan that file / dir
anyway. Sure, that's costly, but if your heuristic for detecting renames
works 99.9% of the time it should be OK, shouldn't it? And you have to have
that code handling caching file / dir written anyway for handling real moves
from outside of the watched hierarchy.

Don't get me wrong, I understand it would be easier for userspace to get
atomic rename notifications, I'm just trying to understand what exactly is
painful so that I can compare the cost at the kernel side with the cost at
the userspace side...

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
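
For reference, the "check the next N events" heuristic discussed in this thread could be sketched roughly as below. This is only a pure-Python simulation over an already-decoded event list, not real inotify code: the function name pair_renames, the window parameter, and the (mask, cookie, name) tuple layout are hypothetical names made up for illustration. Only the two mask values are taken from <sys/inotify.h>.

```python
from collections import OrderedDict

# Mask bits as defined in <sys/inotify.h>.
IN_MOVED_FROM = 0x00000040
IN_MOVED_TO = 0x00000080


def pair_renames(events, window=3):
    """Pair IN_MOVED_FROM/IN_MOVED_TO events by cookie (illustrative only).

    events: iterable of (mask, cookie, name) tuples (hypothetical layout).
    window: how many later events a pending IN_MOVED_FROM may wait for
            its IN_MOVED_TO before we give up on it (the heuristic).
    Returns a list of ("rename", old, new), ("moved_out", name) and
    ("moved_in", name) tuples.
    """
    pending = OrderedDict()  # cookie -> (name, index of the MOVED_FROM)
    out = []
    for i, (mask, cookie, name) in enumerate(events):
        # Give up on MOVED_FROM events that have waited longer than window.
        expired = [c for c, (_, idx) in pending.items() if i - idx > window]
        for c in expired:
            old, _ = pending.pop(c)
            out.append(("moved_out", old))
        if mask & IN_MOVED_FROM:
            pending[cookie] = (name, i)
        elif mask & IN_MOVED_TO:
            if cookie in pending:
                old, _ = pending.pop(cookie)
                out.append(("rename", old, name))
            else:
                # No matching MOVED_FROM seen (or it already expired).
                out.append(("moved_in", name))
    # Anything still pending at the end never got its MOVED_TO.
    for old, _ in pending.values():
        out.append(("moved_out", old))
    return out
```

With an unrelated event queued between the two halves of a rename, a large enough window still pairs them by cookie, while a window that is too small reports an unrelated "moved out" followed by a "moved in". That misclassification is the case argued about above: the application would drop and later re-create its watches for the moved subtree.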