On Tue, Jun 6, 2023 at 3:30 PM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
>
>
> On 6/6/23 18:21, Ilya Dryomov wrote:
> > On Tue, Jun 6, 2023 at 2:55 AM <xiubli@xxxxxxxxxx> wrote:
> >> From: Xiubo Li <xiubli@xxxxxxxxxx>
> >>
> >> There is a race between capsnaps flush and removing the inode from
> >> the 'mdsc->snap_flush_list' list:
> >>
> >>      == Thread A ==                      == Thread B ==
> >> ceph_queue_cap_snap()
> >>  -> allocate 'capsnapA'
> >>  -> ihold('&ci->vfs_inode')
> >>  -> add 'capsnapA' to 'ci->i_cap_snaps'
> >>  -> add 'ci' to 'mdsc->snap_flush_list'
> >> ...
> >>                                          == Thread C ==
> >>                                          ceph_flush_snaps()
> >>                                           -> __ceph_flush_snaps()
> >>                                            -> __send_flush_snap()
> >>                                          handle_cap_flushsnap_ack()
> >>                                           -> iput('&ci->vfs_inode')
> >>                                              this also will release 'ci'
> >> ...
> >>                                          == Thread D ==
> >>                                          ceph_handle_snap()
> >>                                           -> flush_snaps()
> >>                                            -> iterate 'mdsc->snap_flush_list'
> >>                                            -> get the stale 'ci'
> >>  -> remove 'ci' from                        -> ihold(&ci->vfs_inode) this
> >>     'mdsc->snap_flush_list'                    will WARNING
> >>
> >> To fix this we will increase the inode's i_count ref when adding 'ci'
> >> to the 'mdsc->snap_flush_list' list.
> >>
> >> Cc: stable@xxxxxxxxxxxxxxx
> >> URL: https://bugzilla.redhat.com/show_bug.cgi?id=2209299
> >> Reviewed-by: Milind Changire <mchangir@xxxxxxxxxx>
> >> Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
> >> ---
> >>
> >> V3:
> >> - Fix two minor typos in commit comments.
> >>
> >>
> >>
> >>  fs/ceph/caps.c | 6 ++++++
> >>  fs/ceph/snap.c | 4 +++-
> >>  2 files changed, 9 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> >> index feabf4cc0c4f..7c2cb813aba4 100644
> >> --- a/fs/ceph/caps.c
> >> +++ b/fs/ceph/caps.c
> >> @@ -1684,6 +1684,7 @@ void ceph_flush_snaps(struct ceph_inode_info *ci,
> >>         struct inode *inode = &ci->netfs.inode;
> >>         struct ceph_mds_client *mdsc = ceph_inode_to_client(inode)->mdsc;
> >>         struct ceph_mds_session *session = NULL;
> >> +       int put = 0;
> >
> > Hi Xiubo,
> >
> > Nit: renaming this variable to need_put and making it a bool would
> > communicate the intent better.
>
> Hi Ilya
>
> Sure, will update it.
>
> >>         int mds;
> >>
> >>         dout("ceph_flush_snaps %p\n", inode);
> >> @@ -1728,8 +1729,13 @@ void ceph_flush_snaps(struct ceph_inode_info *ci,
> >>                 ceph_put_mds_session(session);
> >>         /* we flushed them all; remove this inode from the queue */
> >>         spin_lock(&mdsc->snap_flush_lock);
> >> +       if (!list_empty(&ci->i_snap_flush_item))
> >> +               put++;
> >
> > What are the cases when ci is expected to not be on snap_flush_list
> > list (and therefore there is no corresponding reference to put)?
> >
> > The reason I'm asking is that ceph_flush_snaps() is called from two
> > other places directly (i.e. without iterating snap_flush_list list) and
> > then __ceph_flush_snaps() is called from two yet other places.  The
> > problem that we are presented with here is that __ceph_flush_snaps()
> > effectively consumes a reference on ci.  Is ci protected from being
> > freed by handle_cap_flushsnap_ack() very soon after __send_flush_snap()
> > returns in all these other places?
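For reference, the pattern the patch is going for ("list membership owns one
inode reference; decide whether a put is owed under the list lock") can be
sketched as a standalone userspace model. All names below are illustrative
stand-ins for the fs/ceph structures, not the real kernel API:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

struct obj {
	int refcount;       /* models the inode's i_count */
	bool on_flush_list; /* models !list_empty(&ci->i_snap_flush_item) */
};

static pthread_mutex_t flush_lock = PTHREAD_MUTEX_INITIALIZER;

static void obj_hold(struct obj *o) /* models ihold() */
{
	o->refcount++;
}

static bool obj_put(struct obj *o) /* models iput(); true if freed */
{
	if (--o->refcount == 0) {
		free(o);
		return true;
	}
	return false;
}

/* Models queuing ci on mdsc->snap_flush_list: the list takes its own ref. */
static void queue_for_flush(struct obj *o)
{
	pthread_mutex_lock(&flush_lock);
	if (!o->on_flush_list) {
		obj_hold(o); /* the fix: list membership pins the object */
		o->on_flush_list = true;
	}
	pthread_mutex_unlock(&flush_lock);
}

/* Models the tail of ceph_flush_snaps(): dequeue under the lock,
 * remember whether a put is owed, and drop the ref outside the lock. */
static void flush_done(struct obj *o)
{
	bool need_put = false;

	pthread_mutex_lock(&flush_lock);
	if (o->on_flush_list) { /* mirrors the list_empty() check */
		o->on_flush_list = false;
		need_put = true;
	}
	pthread_mutex_unlock(&flush_lock);

	if (need_put)
		obj_put(o); /* drop only the list's reference */
}
```

Calling flush_done() on an object that was never queued is then a no-op with
respect to the refcount, which is exactly what the list_empty() check buys.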
>
> There are 4 places that will call 'ceph_flush_snaps()':
>
> Cscope tag: ceph_flush_snaps
>    #   line  filename / context / line
>    1   3221  fs/ceph/caps.c <<__ceph_put_cap_refs>>
>              ceph_flush_snaps(ci, NULL);
>    2   3336  fs/ceph/caps.c <<ceph_put_wrbuffer_cap_refs>>
>              ceph_flush_snaps(ci, NULL);
>    3   2243  fs/ceph/inode.c <<ceph_inode_work>>
>              ceph_flush_snaps(ci, NULL);
>    4    941  fs/ceph/snap.c <<flush_snaps>>
>              ceph_flush_snaps(ci, &session);
>
> For #1 it will add the 'ci' to the 'mdsc->snap_flush_list' list by
> calling '__ceph_finish_cap_snap()' and then either call
> 'ceph_flush_snaps()' directly or defer the call to the queued work
> in #3.
>
> The #3 case is the reason why we need the 'mdsc->snap_flush_list' list.
>
> For #2 it won't add the 'ci' to the list because it will always call
> 'ceph_flush_snaps()' directly.
>
> For #4 it will call 'ceph_flush_snaps()' by iterating the
> 'mdsc->snap_flush_list' list, just before #3 is triggered.
>
> The problem only exists in the #1 --> #4 case, which leaves the stale
> 'ci' on the 'mdsc->snap_flush_list' list after the 'capsnap' and 'ci'
> have been freed.  All the other cases are okay because the 'ci' is
> protected by the ref taken when allocating the 'capsnap', which is
> dropped in 'handle_cap_flushsnap_ack()' when freeing the 'capsnap'.
>
> Note: '__ceph_flush_snaps()' won't increase the ref.
> 'handle_cap_flushsnap_ack()' will just decrease the ref, and only when
> the ref reaches '0' will the 'ci' be freed.

So my question is: are all __ceph_flush_snaps() callers guaranteed to
hold an extra (i.e. one that is not tied to capsnap) reference on ci so
that when handle_cap_flushsnap_ack() drops the one that is tied to
capsnap the reference count doesn't reach 0?
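To make the reference ledger in that question concrete, here is a minimal
userspace model (illustrative names only; ihold()/iput() are modeled by a
plain counter, and "the capsnap's ref" is the one taken at allocation and
dropped by the ack):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct inode_model {
	int i_count; /* models the inode reference count */
};

/* Models ceph_queue_cap_snap(): allocating the capsnap takes one ref. */
static struct inode_model *queue_cap_snap(void)
{
	struct inode_model *ci = malloc(sizeof(*ci));

	ci->i_count = 1; /* the capsnap's reference */
	return ci;
}

static void hold(struct inode_model *ci) /* models ihold() */
{
	ci->i_count++;
}

/* Models handle_cap_flushsnap_ack(): drops exactly the capsnap's ref.
 * Returns true if that drop was the last one and freed the inode. */
static bool flushsnap_ack(struct inode_model *ci)
{
	if (--ci->i_count == 0) {
		free(ci);
		return true;
	}
	return false;
}
```

If no other path holds its own reference, the ack's put reaches zero and
anything still pointing at ci afterward (e.g. an entry left on
snap_flush_list) is stale, which is the crux of the question above.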
It sounds like you are confident that there is no issue with
ceph_flush_snaps() callers, but it would be nice if you could confirm
the same for the bare __ceph_flush_snaps() call sites in caps.c.

Thanks,

                Ilya