Jeff Layton <jlayton@xxxxxxxxxx> writes:

> On Wed, 2020-11-11 at 11:08 +0000, Luis Henriques wrote:
>> Jeff Layton <jlayton@xxxxxxxxxx> writes:
>> 
>> > On Sat, 2019-12-14 at 10:46 +0800, Yan, Zheng wrote:
>> > > On Fri, Dec 13, 2019 at 1:32 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>> > > > I believe it's possible that we could end up with racing calls to
>> > > > __ceph_remove_cap for the same cap. If that happens, the cap->ci
>> > > > pointer will be zeroed out and we can hit a NULL pointer dereference.
>> > > > 
>> > > > Once we acquire the s_cap_lock, check for the ci pointer being NULL,
>> > > > and just return without doing anything if it is.
>> > > > 
>> > > > URL: https://tracker.ceph.com/issues/43272
>> > > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
>> > > > ---
>> > > >  fs/ceph/caps.c | 21 ++++++++++++++++-----
>> > > >  1 file changed, 16 insertions(+), 5 deletions(-)
>> > > > 
>> > > > This is the only scenario that made sense to me in light of Ilya's
>> > > > analysis on the tracker above. I could be off here though -- the
>> > > > locking around this code is horrifically complex, and I could be
>> > > > missing what should guard against this scenario.
>> > > 
>> > > I think the simpler fix is, in trim_caps_cb, check if cap->ci is
>> > > non-NULL before calling __ceph_remove_cap(). This should work because
>> > > __ceph_remove_cap() is always called inside i_ceph_lock.
>> > 
>> > Is that sufficient though? The stack trace in the bug shows it being
>> > called by ceph_trim_caps, but I think we could hit the same problem
>> > with other __ceph_remove_cap callers, if they happen to race in at the
>> > right time.
>> 
>> Sorry for resurrecting this old thread, but we just got a report with
>> this issue on a kernel that includes commit d6e47819721a ("ceph: hold
>> i_ceph_lock when removing caps for freeing inode").
>> 
>> Looking at the code, I believe Zheng's suggestion should work, as I
>> don't see any __ceph_remove_cap callers that don't hold the
>> i_ceph_lock. So, would something like the diff below be acceptable?
>> 
>> Cheers,
>
> I'm still not convinced that's the correct fix.
>
> Why would trim_caps_cb be subject to this race when other
> __ceph_remove_cap callers are not? Maybe the right fix is to test for a
> NULL cap->ci in __ceph_remove_cap and just return early if it is?

I see, you're probably right. Looking again at the code, I see that there
are two possible places where this race could occur, and they're both
used as callbacks in ceph_iterate_session_caps: trim_caps_cb and
remove_session_caps_cb. These callbacks get the struct ceph_cap as an
argument and only then do they acquire i_ceph_lock. Since this isn't
protected by session->s_cap_lock, I guess this is where the race window
is, where cap->ci can be set to NULL.
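To make that window concrete, here's a quick userspace sketch of the
interleaving (all names are made up, and a pthread mutex stands in for
i_ceph_lock -- this is just the pattern, not the real fs/ceph code). Both
racers grab the same cap pointer before taking the lock, mimicking the
ceph_iterate_session_caps callbacks, so whichever thread locks second has
to notice that cap->ci was already cleared:

#include <pthread.h>
#include <stdio.h>

struct fake_inode {
	int caps_issued;
};

struct fake_cap {
	struct fake_inode *ci;	/* zeroed once the cap has been removed */
};

static pthread_mutex_t i_lock = PTHREAD_MUTEX_INITIALIZER; /* "i_ceph_lock" */
static struct fake_inode inode = { .caps_issued = 1 };
static struct fake_cap cap = { .ci = &inode };

/* Analogue of __ceph_remove_cap() with the early NULL return. */
static void remove_cap(struct fake_cap *c)
{
	pthread_mutex_lock(&i_lock);
	if (!c->ci) {			/* lost the race: already removed */
		pthread_mutex_unlock(&i_lock);
		return;
	}
	c->ci->caps_issued = 0;		/* would oops here without the check */
	c->ci = NULL;
	pthread_mutex_unlock(&i_lock);
}

/* Each racer already holds the cap pointer before it takes the lock. */
static void *racer(void *arg)
{
	(void)arg;
	remove_cap(&cap);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, racer, NULL);
	pthread_create(&t2, NULL, racer, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	printf("cap.ci = %p (losing racer was a no-op)\n", (void *)cap.ci);
	return 0;
}

Built with -pthread, whichever thread takes the lock second simply
returns early instead of dereferencing a NULL ci.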
Below is the patch you suggested. If you think it's acceptable, I can
resend it with a proper commit message.

Cheers,
--
Luis

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index ded4229c314a..917dfaf0bd01 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -1140,12 +1140,17 @@ void __ceph_remove_cap(struct ceph_cap *cap, bool queue_release)
 {
 	struct ceph_mds_session *session = cap->session;
 	struct ceph_inode_info *ci = cap->ci;
-	struct ceph_mds_client *mdsc =
-		ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
+	struct ceph_mds_client *mdsc;
+
 	int removed = 0;
 
+	if (!ci)
+		return;
+
 	dout("__ceph_remove_cap %p from %p\n", cap, &ci->vfs_inode);
 
+	mdsc = ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
+
 	/* remove from inode's cap rbtree, and clear auth cap */
 	rb_erase(&cap->ci_node, &ci->i_caps);
 	if (ci->i_auth_cap == cap) {