Jan Kara <jack@xxxxxxx> writes: > On Tue 01-11-22 13:48:54, Stephen Brennan wrote: >> Jan Kara <jack@xxxxxxx> writes: >> > Hi Stephen! >> > >> > On Thu 27-10-22 17:10:13, Stephen Brennan wrote: >> >> Here is v3 of the patch series. I've taken all of the feedback, >> >> thanks Amir, Christian, Hilf, et al. Differences are noted in each >> >> patch. >> >> >> >> I caught an obvious and silly dentry reference leak: d_find_any_alias() >> >> returns a reference, which I never called dput() on. With that change, I >> >> no longer see the rpc_pipefs issue, but I do think I need more testing >> >> and thinking through the third patch. Al, I'd love your feedback on that >> >> one especially. >> >> >> >> Thanks, >> >> Stephen >> >> >> >> Stephen Brennan (3): >> >> fsnotify: Use d_find_any_alias to get dentry associated with inode >> >> fsnotify: Protect i_fsnotify_mask and child flags with inode rwsem >> >> fsnotify: allow sleepable child flag update >> > >> > Thanks for the patches Stephen and I'm sorry for replying somewhat late. >> >> Absolutely no worries, these things take time. Thanks for taking a look! >> >> > The first patch is a nobrainer. The other two patches ... complicate things >> > somewhat more complicated than I'd like. I guess I can live with them if we >> > don't find a better solution but I'd like to discuss a bit more about >> > alternatives. >> >> Understood! >> >> > So what would happen if we just clear DCACHE_FSNOTIFY_PARENT_WATCHED in >> > __fsnotify_parent() for the dentry which triggered the event and does not >> > have watched parent anymore and never bother with full children walk? I >> > suppose your contention problems will be gone, we'll just pay the price of >> > dget_parent() + fsnotify_inode_watches_children() for each child that >> > falsely triggers instead of for only one. Maybe that's not too bad? After >> > all any event upto this moment triggered this overhead as well... >> >> This is an interesting idea. It came across my mind but I don't think I >> considered it seriously because I assumed that it was too big a change. >> But I suppose in the process I created an even bigger change :P >> >> The false positive dget_parent() + fsnotify_inode_watches_children() >> shouldn't be too bad. I could see a situation where there's a lot of >> random accesses within a directory, where the dget_parent() could cause >> some contention over the parent dentry. But to be fair, the performance >> would have been the same or worse while fsnotify was active in that >> case, and the contention would go away as most of the dentries get their >> flags cleared. So I don't think this is a problem. >> >> > Am I missing something? >> >> I think there's one thing missed here. I understand you'd like to get >> rid of the extra flag in the connector. But the advantage of the flag is >> avoiding duplicate work by saving a bit of state. Suppose that a mark is >> added to a connector, which causes fsnotify_inode_watches_children() to >> become true. Then, any subsequent call to fsnotify_recalc_mask() must >> call __fsnotify_update_child_dentry_flags(), even though the child >> dentry flags don't need to be updated: they're already set. For (very) >> large directories, this can take a few seconds, which means that we're >> doing a few extra seconds of work each time a new mark is added to or >> removed from a connector in that case. I can't imagine that's a super >> common workload though, and I don't know if my customers do that (my >> guess would be no). > > I understand. This basically matters for fsnotify_recalc_mask(). As a side > note I've realized that your changes to fsnotify_recalc_mask() acquiring > inode->i_rwsem for updating dentry flags in patch 2/3 are problematic for > dnotify because that calls fsnotify_recalc_mask() under a spinlock. > Furthermore it is somewhat worrying also for inotify & fanotify because it > nests inode->i_rwsem inside fsnotify_group->lock however I'm not 100% sure > something doesn't force the ordering the other way around (e.g. the removal > of oneshot mark during modify event generation). Did you run tests with > lockdep enabled? No I didn't. I'll be sure to get the LTP tests running with lockdep early next week and try this series out, we'll probably get an error like you say. I'll also take a look at the dnotify use case and see if there's anything to do there. Hopefully there's something we can do to salvage it :D Thanks, Stephen > Anyway, if the lock ordering issues can be solved, I suppose we can > optimize fsnotify_recalc_mask() like: > > inode_lock(inode); > spin_lock(&conn->lock); > oldmask = inode->i_fsnotify_mask; > __fsnotify_recalc_mask(conn); > newmask = inode->i_fsnotify_mask; > spin_unlock(&conn->lock); > if (watching children changed(oldmask, newmask)) > __fsnotify_update_child_dentry_flags(...) > inode_unlock(inode); > > And because everything is serialized by inode_lock, we don't have to worry > about inode->i_fsnotify_mask and dentry flags getting out of sync or some > mark addition returning before all children are marked for reporting > events. No need for the connector flag AFAICT. > > But the locking issue needs to be resolved first in any case. I need to > think some more... > > Honza > -- > Jan Kara <jack@xxxxxxxx> > SUSE Labs, CR