Hello Ian, On 28/7/2023 10:16 am, Ian Kent wrote: > On 28/7/23 08:00, Ian Kent wrote: >> On 27/7/23 12:30, Imran Khan wrote: >>> Hello Ian, >>> Sorry for late reply. I was about to reply this week. >>> >>> On 27/7/2023 10:38 am, Ian Kent wrote: >>>> On 20/7/23 10:03, Ian Kent wrote: >>>>> On Wed, 2023-07-19 at 12:23 +0800, Ian Kent wrote: >>> [...] >>>>> I do see a problem with recent changes. >>>>> >>>>> I'll send this off to Greg after I've done some testing (primarily just >>>>> compile and function). >>>>> >>>>> Here's a patch which describes what I found. >>>>> >>>>> Comments are, of course, welcome, ;) >>>> Anders I was hoping you would check if/what lockdep trace >>>> >>>> you get with this patch. >>>> >>>> >>>> Imran, I was hoping you would comment on my change as it >>>> >>>> relates to the kernfs_iattr_rwsem changes. >>>> >>>> >>>> Ian >>>> >>>>> kernfs: fix missing kernfs_iattr_rwsem locking >>>>> >>>>> From: Ian Kent <raven@xxxxxxxxxx> >>>>> >>>>> When the kernfs_iattr_rwsem was introduced a case was missed. >>>>> >>>>> The update of the kernfs directory node child count was also protected >>>>> by the kernfs_rwsem and needs to be included in the change so that the >>>>> child count (and so the inode n_link attribute) does not change while >>>>> holding the rwsem for read. >>>>> >>> kernfs direcytory node's child count changes in kernfs_(un)link_sibling and >>> these are getting invoked while adding (kernfs_add_one), >>> removing(__kernfs_remove) or moving (kernfs_rename_ns)a node. Each of these >>> operations proceed under kernfs_rwsem and I see each invocation of >>> kernfs_link/unlink_sibling during the above mentioned operations is happening >>> under kernfs_rwsem. >>> So the child count should still be protected by kernfs_rwsem and we should not >>> need to acquire kernfs_iattr_rwsem in kernfs_link/unlink_sibling. >> >> Yes, that's exactly what I intended (assuming you mean write lock in those cases) >> >> when I did it so now I wonder what I saw that lead to my patch, I'll need to look >> >> again ... > > Ahh, I see why I thought this ... > > It's the hunk: > > @@ -285,10 +285,10 @@ int kernfs_iop_permission(struct mnt_idmap *idmap, > kn = inode->i_private; > root = kernfs_root(kn); > > - down_read(&root->kernfs_rwsem); > + down_read(&root->kernfs_iattr_rwsem); > kernfs_refresh_inode(kn, inode); > ret = generic_permission(&nop_mnt_idmap, inode, mask); > - up_read(&root->kernfs_rwsem); > + up_read(&root->kernfs_iattr_rwsem); > > return ret; > } > > which takes away the kernfs_rwsem and introduces the possibility of > > the change. It may be more instructive to add back taking the read > > lock of kernfs_rwsem in .permission() than altering the sibling link > > and unlink functions, I mean I even caught myself on it. > Yes this was the block I referred to in my second comment [1]. However adding back read lock of kernfs_rwsem in .permission() will again introduce the bottleneck that I mentioned at [2]. So I think taking taking the locks in link/unlink is the better option. I understand having one lock to synchronize everything makes it easier debug/development wise but sometimes (such as the case mentioned in [2]), it is not optimum performance wise. Thoughts ? Thanks, Imran [1]: https://lore.kernel.org/all/8b0a1619-1e39-fc3a-1226-f3b167e64646@xxxxxxxxxx/ [2]: https://lore.kernel.org/all/20230302043203.1695051-2-imran.f.khan@xxxxxxxxxx/ > > Ian > >> >> >>> >>> Kindly let me know your thoughts. I would still like to see new lockdep traces >>> with this change. >> >> Indeed, I hope Anders can find time to get the trace. >> >> >> Ian >> >>> >>> Thanks, >>> Imran >>> >>>>> Fixes: 9caf696142 (kernfs: Introduce separate rwsem to protect inode >>>>> attributes) >>>>> >>>>> Signed-off-by: Ian Kent <raven@xxxxxxxxxx> >>>>> Cc: Anders Roxell <anders.roxell@xxxxxxxxxx> >>>>> Cc: Imran Khan <imran.f.khan@xxxxxxxxxx> >>>>> Cc: Arnd Bergmann <arnd@xxxxxxxx> >>>>> Cc: Minchan Kim <minchan@xxxxxxxxxx> >>>>> Cc: Eric Sandeen <sandeen@xxxxxxxxxxx> >>>>> --- >>>>> fs/kernfs/dir.c | 4 ++++ >>>>> 1 file changed, 4 insertions(+) >>>>> >>>>> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c >>>>> index 45b6919903e6..6e84bb69602e 100644 >>>>> --- a/fs/kernfs/dir.c >>>>> +++ b/fs/kernfs/dir.c >>>>> @@ -383,9 +383,11 @@ static int kernfs_link_sibling(struct kernfs_node >>>>> *kn) >>>>> rb_insert_color(&kn->rb, &kn->parent->dir.children); >>>>> /* successfully added, account subdir number */ >>>>> + down_write(&kernfs_root(kn)->kernfs_iattr_rwsem); >>>>> if (kernfs_type(kn) == KERNFS_DIR) >>>>> kn->parent->dir.subdirs++; >>>>> kernfs_inc_rev(kn->parent); >>>>> + up_write(&kernfs_root(kn)->kernfs_iattr_rwsem); >>>>> return 0; >>>>> } >>>>> @@ -408,9 +410,11 @@ static bool kernfs_unlink_sibling(struct >>>>> kernfs_node *kn) >>>>> if (RB_EMPTY_NODE(&kn->rb)) >>>>> return false; >>>>> + down_write(&kernfs_root(kn)->kernfs_iattr_rwsem); >>>>> if (kernfs_type(kn) == KERNFS_DIR) >>>>> kn->parent->dir.subdirs--; >>>>> kernfs_inc_rev(kn->parent); >>>>> + up_write(&kernfs_root(kn)->kernfs_iattr_rwsem); >>>>> rb_erase(&kn->rb, &kn->parent->dir.children); >>>>> RB_CLEAR_NODE(&kn->rb); >>>>>