Re: [PATCH v3 0/3] fsnotify: fix softlockups iterating over d_subdirs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Jan,

Jan Kara <jack@xxxxxxx> writes:
> Hi Stephen!
>
> On Thu 27-10-22 17:10:13, Stephen Brennan wrote:
>> Here is v3 of the patch series. I've taken all of the feedback,
>> thanks Amir, Christian, Hilf, et al. Differences are noted in each
>> patch.
>> 
>> I caught an obvious and silly dentry reference leak: d_find_any_alias()
>> returns a reference, which I never called dput() on. With that change, I
>> no longer see the rpc_pipefs issue, but I do think I need more testing
>> and thinking through the third patch. Al, I'd love your feedback on that
>> one especially.
>> 
>> Thanks,
>> Stephen
>> 
>> Stephen Brennan (3):
>>   fsnotify: Use d_find_any_alias to get dentry associated with inode
>>   fsnotify: Protect i_fsnotify_mask and child flags with inode rwsem
>>   fsnotify: allow sleepable child flag update
>
> Thanks for the patches Stephen and I'm sorry for replying somewhat late.

Absolutely no worries, these things take time. Thanks for taking a look!

> The first patch is a nobrainer. The other two patches ... complicate things
> somewhat more complicated than I'd like. I guess I can live with them if we
> don't find a better solution but I'd like to discuss a bit more about
> alternatives.

Understood!

> So what would happen if we just clear DCACHE_FSNOTIFY_PARENT_WATCHED in
> __fsnotify_parent() for the dentry which triggered the event and does not
> have watched parent anymore and never bother with full children walk? I
> suppose your contention problems will be gone, we'll just pay the price of
> dget_parent() + fsnotify_inode_watches_children() for each child that
> falsely triggers instead of for only one. Maybe that's not too bad? After
> all any event upto this moment triggered this overhead as well...

This is an interesting idea. It came across my mind but I don't think I
considered it seriously because I assumed that it was too big a change.
But I suppose in the process I created an even bigger change :P

The false positive dget_parent() + fsnotify_inode_watches_children()
shouldn't be too bad. I could see a situation where there's a lot of
random accesses within a directory, where the dget_parent() could cause
some contention over the parent dentry. But to be fair, the performance
would have been the same or worse while fsnotify was active in that
case, and the contention would go away as most of the dentries get their
flags cleared. So I don't think this is a problem.

> Am I missing something?

I think there's one thing missed here. I understand you'd like to get
rid of the extra flag in the connector. But the advantage of the flag is
avoiding duplicate work by saving a bit of state. Suppose that a mark is
added to a connector, which causes fsnotify_inode_watches_children() to
become true. Then, any subsequent call to fsnotify_recalc_mask() must
call __fsnotify_update_child_dentry_flags(), even though the child
dentry flags don't need to be updated: they're already set. For (very)
large directories, this can take a few seconds, which means that we're
doing a few extra seconds of work each time a new mark is added to or
removed from a connector in that case. I can't imagine that's a super
common workload though, and I don't know if my customers do that (my
guess would be no).

For an example of a test workload where this would make a difference:
one of my test cases is to create a directory with millions of negative
dentries, and then run "for i in {1..20}; do inotifywait $DIR & done".
With the series as-is, only the first task needs to do the child flag
update. With your proposed alternative, each task would re-do the
update.

If we were to manage to get __fsnotify_update_child_dentry_flags() to
work correctly, and safely, with the ability to drop the d_lock and
sleep, then I think that wouldn't be too much of a problem, because then
other spinlock users of the directory will get a chance to grab it, and
so we don't risk softlockups. Without the sleepable iterations, it would
be marginally worse than the current solution, but I can't really
comment _how_ much worse because like I said, it doesn't sound like a
frequent usage pattern.

I think I have a _slight_ preference for the current design, but I see
the appeal of simpler code, and your design would still improve things a
lot! Maybe Amir has an opinion too, but of course I'll defer to what you
want.

Thanks,
Stephen



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux