On Sat, 2020-02-29 at 10:34 +0800, Yan, Zheng wrote:
> On 2/29/20 5:39 AM, Jeff Layton wrote:
> > Hi Zheng,
> >
> > I've been going over the cap handling code, and noticed some
> > sketchy-looking locking in __kick_flushing_caps that was added by
> > this patch:
> >
> > commit e4500b5e35c213e0f97be7cb69328c0877203a79
> > Author: Yan, Zheng <zyan@xxxxxxxxxx>
> > Date:   Wed Jul 6 11:12:56 2016 +0800
> >
> >     ceph: use list instead of rbtree to track cap flushes
> >
> >     We don't have requirement of searching cap flush by TID. In most
> >     cases, we just need to know TID of the oldest cap flush. List is
> >     ideal for this usage.
> >
> >     Signed-off-by: Yan, Zheng <zyan@xxxxxxxxxx>
> >
> > While walking ci->i_cap_flush_list, __kick_flushing_caps drops and
> > reacquires the i_ceph_lock on each iteration. It looks like
> > __kick_flushing_caps could (e.g.) race with a reply from the MDS that
> > removes the entry from the list. Is there something that prevents
> > that race that I'm not seeing?
> >
>
> I think it's protected by s_mutex. I checked the code; only
> ceph_kick_flushing_caps() is not protected by s_mutex. It should be
> easy to fix.
>

What does the i_ceph_lock actually protect here, then? Does this mean
that this code doesn't actually need the i_ceph_lock?

That sounds OK as a possible stopgap fix, but longer term I'd really
like to see us reduce the coverage of the session and mdsc mutexes.
Maybe we can rework this code to queue up the cap flushes in some
fashion while holding the spinlock, and then send them off later once
we've released it?

Looking at ceph_kick_flushing_caps... AFAICT, the s_cap_flushing list
should be protected by mdsc->cap_dirty_lock, but that lock is not held
while walking the list in ceph_kick_flushing_caps or
ceph_early_kick_flushing_caps. There is some effort to track certain
tids in there, but it's not clear to me that that's enough to protect
against concurrent list manipulation. Does it somehow?
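To make the "queue then send" idea a bit more concrete, here's a rough
standalone sketch (plain userspace C, not actual kernel code). All of
the names in it are made up for illustration: collect_flushes() and
send_flushes() don't exist in the client, the pthread spinlock just
stands in for i_ceph_lock, and the singly-linked list stands in for
i_cap_flush_list.

	#include <pthread.h>
	#include <stdio.h>
	#include <stdlib.h>

	struct cap_flush {
		unsigned long tid;
		struct cap_flush *next;
	};

	static pthread_spinlock_t list_lock;	/* i_ceph_lock stand-in */
	static struct cap_flush *flush_list;	/* i_cap_flush_list stand-in */

	/*
	 * Phase 1: with the lock held, move the entries we want to kick
	 * onto a private list. The walk never drops the lock, so a
	 * concurrent "MDS reply" can't remove entries out from under us
	 * mid-iteration.
	 */
	static struct cap_flush *collect_flushes(void)
	{
		struct cap_flush *pending;

		pthread_spin_lock(&list_lock);
		pending = flush_list;
		flush_list = NULL;
		pthread_spin_unlock(&list_lock);
		return pending;
	}

	/*
	 * Phase 2: the lock is no longer held, so "sending" (which in
	 * the real code could sleep) is safe. Here we just print the
	 * tids.
	 */
	static void send_flushes(struct cap_flush *pending)
	{
		while (pending) {
			struct cap_flush *cf = pending;

			pending = cf->next;
			printf("sending cap flush, tid %lu\n", cf->tid);
			free(cf);
		}
	}

	int main(void)
	{
		pthread_spin_init(&list_lock, PTHREAD_PROCESS_PRIVATE);

		/* queue a few fake flushes */
		for (unsigned long tid = 1; tid <= 3; tid++) {
			struct cap_flush *cf = malloc(sizeof(*cf));

			cf->tid = tid;
			cf->next = flush_list;
			flush_list = cf;
		}

		send_flushes(collect_flushes());
		return 0;
	}

The point being that the list walk itself never drops the lock, so
nothing can remove entries out from under us mid-iteration, and the
(potentially sleeping) send work only happens after the lock has been
released.

--
Jeff Layton <jlayton@xxxxxxxxxx>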