On Fri, 2024-03-01 at 11:49 -0500, Josef Bacik wrote: > In production we have been hitting the following warning consistently > > ------------[ cut here ]------------ > refcount_t: underflow; use-after-free. > WARNING: CPU: 17 PID: 1800359 at lib/refcount.c:28 > refcount_warn_saturate+0x9c/0xe0 > Workqueue: nfsiod nfs_direct_write_schedule_work [nfs] > RIP: 0010:refcount_warn_saturate+0x9c/0xe0 > PKRU: 55555554 > Call Trace: > <TASK> > ? __warn+0x9f/0x130 > ? refcount_warn_saturate+0x9c/0xe0 > ? report_bug+0xcc/0x150 > ? handle_bug+0x3d/0x70 > ? exc_invalid_op+0x16/0x40 > ? asm_exc_invalid_op+0x16/0x20 > ? refcount_warn_saturate+0x9c/0xe0 > nfs_direct_write_schedule_work+0x237/0x250 [nfs] > process_one_work+0x12f/0x4a0 > worker_thread+0x14e/0x3b0 > ? ZSTD_getCParams_internal+0x220/0x220 > kthread+0xdc/0x120 > ? __btf_name_valid+0xa0/0xa0 > ret_from_fork+0x1f/0x30 > > This is because we're completing the nfs_direct_request twice in a > row. > > The source of this is when we have our commit requests to submit, we > process them and send them off, and then in the completion path for > the > commit requests we have > > if (nfs_commit_end(cinfo.mds)) > nfs_direct_write_complete(dreq); > > However since we're submitting asynchronous requests we sometimes > have > one that completes before we submit the next one, so we end up > calling > complete on the nfs_direct_request twice. > > The only other place we use nfs_generic_commit_list() is in > __nfs_commit_inode, which wraps this call in a > > nfs_commit_begin(); > nfs_commit_end(); > > Which is a common pattern for this style of completion handling, one > that is also repeated in the direct code with get_dreq()/put_dreq() > calls around where we process events as well as in the completion > paths. > > Fix this by using the same pattern for the commit requests. > > Before with my 200 node rocksdb stress running this warning would pop > every 10ish minutes. With my patch the stress test has been running > for > several hours without popping. > > Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx> I think this deserves to be a stable patch. Thanks for finding it! > --- > fs/nfs/direct.c | 11 +++++++++-- > fs/nfs/write.c | 2 +- > include/linux/nfs_fs.h | 1 + > 3 files changed, 11 insertions(+), 3 deletions(-) > > diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c > index befcc167e25f..6b8798d01e3a 100644 > --- a/fs/nfs/direct.c > +++ b/fs/nfs/direct.c > @@ -672,10 +672,17 @@ static void nfs_direct_commit_schedule(struct > nfs_direct_req *dreq) > LIST_HEAD(mds_list); > > nfs_init_cinfo_from_dreq(&cinfo, dreq); > + nfs_commit_begin(cinfo.mds); > nfs_scan_commit(dreq->inode, &mds_list, &cinfo); > res = nfs_generic_commit_list(dreq->inode, &mds_list, 0, > &cinfo); > - if (res < 0) /* res == -ENOMEM */ > - nfs_direct_write_reschedule(dreq); > + if (res < 0) { /* res == -ENOMEM */ > + spin_lock(&dreq->lock); > + if (dreq->flags == 0) > + dreq->flags = NFS_ODIRECT_RESCHED_WRITES; > + spin_unlock(&dreq->lock); > + } > + if (nfs_commit_end(cinfo.mds)) > + nfs_direct_write_complete(dreq); > } > > static void nfs_direct_write_clear_reqs(struct nfs_direct_req *dreq) > diff --git a/fs/nfs/write.c b/fs/nfs/write.c > index bb79d3a886ae..5d9dc6c05325 100644 > --- a/fs/nfs/write.c > +++ b/fs/nfs/write.c > @@ -1650,7 +1650,7 @@ static int wait_on_commit(struct > nfs_mds_commit_info *cinfo) > !atomic_read(&cinfo- > >rpcs_out)); > } > > -static void nfs_commit_begin(struct nfs_mds_commit_info *cinfo) > +void nfs_commit_begin(struct nfs_mds_commit_info *cinfo) > { > atomic_inc(&cinfo->rpcs_out); > } > diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h > index f5ce7b101146..d59116ac8209 100644 > --- a/include/linux/nfs_fs.h > +++ b/include/linux/nfs_fs.h > @@ -611,6 +611,7 @@ int nfs_wb_folio_cancel(struct inode *inode, > struct folio *folio); > extern int nfs_commit_inode(struct inode *, int); > extern struct nfs_commit_data *nfs_commitdata_alloc(void); > extern void nfs_commit_free(struct nfs_commit_data *data); > +void nfs_commit_begin(struct nfs_mds_commit_info *cinfo); > bool nfs_commit_end(struct nfs_mds_commit_info *cinfo); > > static inline bool nfs_have_writebacks(const struct inode *inode) -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@xxxxxxxxxxxxxxx