On Thu, 2017-07-20 at 16:14 -0400, Olga Kornievskaia wrote: > > On Jul 20, 2017, at 3:56 PM, Trond Myklebust <trondmy@primarydata.c > > om> wrote: > > > > Hi Olga, > > > > Apologies for missing this patch. It was hiding in my 'linux- > > fsdevel' > > mailbox, so I didn't recognise it as a NFS patch. > > > > Yeah after I mailed it out I realized I cc-ed fsdevel incorrectly. > > > On Fri, 2017-06-30 at 15:52 -0400, Olga Kornievskaia wrote: > > > There is a regression by commit 8d40b0f14846 ("NFS > > > filelayout:call > > > GETDEVICEINFO after pnfs_layout_process completes"). It leaves > > > the > > > DS mount dangling. > > > > > > Previously, filelayout_alloc_sec() would call > > > filelayout_check_layout() > > > which would call nfs4_find_get_deviceid which ups the count on > > > the > > > device_id. It's only called once and it's matched by the > > > filelayout_free_lseg() that calls nfs4_fl_put_deviceid(). > > > > > > After that patch, each read/write ends up calling > > > nfs4_find_get_deviceid > > > and there is no balance for that. Instead, do > > > nfs4_fl_put_deviceid() > > > in the filelayout's .pg_cleanup and remove it from > > > filelayout_free_lseg. > > > > > > But we still need a reference to hold over the lifetime of the > > > segment. > > > For every new lseg that's created we need to take a reference on > > > deviceid > > > that uses it. It will be released in the "free_lseg" routine. > > > > This is what I'm not understanding. If you have a reference in the > > layout segment, then why do you need to call > > nfs4_find_get_deviceid() > > in the read/write code? > > I think I’m probably misunderstanding the question. It sounds to me > that you asking me for why the commit 8d40b0f14846 was done the way > it was done (I’d would say it was done as per your suggestion). > > I would say the call to nfs4_find_get_deviceid() has always been in > the read/write code. It was a part of the pnfs_update_layout() but > the commit 8d40b0f14846 moved it out of it (so that the layoutget was > complete) and then the call to the getdeviceinfo would be done. > > > Isn't it sufficient to change the "pg_init" calls to check whether > > or > > not the struct nfs4_filelayout_segment has set a value for dsaddr > > (that > > needs to be done with care to avoid races - cmpxchg() is your > > friend), > > and then rely on that reference being set for the remainder of the > > layout segment lifetime? > > Are you suggesting to change when getdeviceinfo is suppose to be > called? > No. Now that I'm looking at filelayout_check_deviceid(), here is what I suggest: we need to ensure that filelayout_check_deviceid() sets the deviceid once, and only once! How about something like the following (untested) patch? 8<------------------------------------------------------- >From 1e40ee13950d03ad2a54a0d1dba35f2a04c28ca0 Mon Sep 17 00:00:00 2001 From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> Date: Thu, 20 Jul 2017 17:00:02 -0400 Subject: [PATCH] NFS/filelayout: Fix racy setting of fl->dsaddr in filelayout_check_deviceid() We must set fl->dsaddr once, and once only, even if there are multiple processes calling filelayout_check_deviceid() for the same layout segment. Reported-by: Olga Kornievskaia <kolga@xxxxxxxxxx> Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> --- fs/nfs/filelayout/filelayout.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c index 080fc6b278bd..44c638b7876c 100644 --- a/fs/nfs/filelayout/filelayout.c +++ b/fs/nfs/filelayout/filelayout.c @@ -542,6 +542,10 @@ filelayout_check_deviceid(struct pnfs_layout_hdr *lo, struct nfs4_file_layout_dsaddr *dsaddr; int status = -EINVAL; + /* Is the deviceid already set? If so, we're good. */ + if (fl->dsaddr != NULL) + return 0; + /* find and reference the deviceid */ d = nfs4_find_get_deviceid(NFS_SERVER(lo->plh_inode), &fl->deviceid, lo->plh_lc_cred, gfp_flags); @@ -553,8 +557,6 @@ filelayout_check_deviceid(struct pnfs_layout_hdr *lo, if (filelayout_test_devid_unavailable(&dsaddr->id_node)) goto out_put; - fl->dsaddr = dsaddr; - if (fl->first_stripe_index >= dsaddr->stripe_count) { dprintk("%s Bad first_stripe_index %u\n", __func__, fl->first_stripe_index); @@ -570,6 +572,13 @@ filelayout_check_deviceid(struct pnfs_layout_hdr *lo, goto out_put; } status = 0; + + /* + * Atomic compare and xchange to ensure we don't scribble + * over a non-NULL pointer. + */ + if (cmpxchg(&fl->dsaddr, NULL, dsaddr) != NULL) + goto out_put; out: return status; out_put: -- 2.13.3 -- Trond Myklebust Linux NFS client maintainer, PrimaryData trond.myklebust@xxxxxxxxxxxxxxx ��.n��������+%������w��{.n�����{��w���jg��������ݢj����G�������j:+v���w�m������w�������h�����٥