Re: [PATCH v3 1/1] PNFS fix dangling DS mount

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Thu, 20 Jul 2017 19:56:24 +0000

Hi Olga,

Apologies for missing this patch. It was hiding in my 'linux-fsdevel'
mailbox, so I didn't recognise it as an NFS patch.

On Fri, 2017-06-30 at 15:52 -0400, Olga Kornievskaia wrote:
> There is a regression caused by commit 8d40b0f14846 ("NFS filelayout:call
> GETDEVICEINFO after pnfs_layout_process completes"). It leaves the
> DS mount dangling.
> 
> Previously, filelayout_alloc_lseg() would call filelayout_check_layout(),
> which would call nfs4_find_get_deviceid(), which takes a reference on
> the deviceid. That happened only once per segment and was matched by
> the nfs4_fl_put_deviceid() call in filelayout_free_lseg().
> 
> After that patch, each read/write ends up calling
> nfs4_find_get_deviceid(), and nothing balances that reference. This
> patch therefore calls nfs4_fl_put_deviceid() in the filelayout's
> .pg_cleanup and removes it from filelayout_free_lseg().
> 
> However, we still need a reference held over the lifetime of the
> segment: every new lseg that's created must take a reference on the
> deviceid it uses, and that reference is released in the "free_lseg"
> routine.

This is what I'm not understanding. If you have a reference in the
layout segment, then why do you need to call nfs4_find_get_deviceid()
in the read/write code?

Isn't it sufficient to change the "pg_init" calls to check whether the
struct nfs4_filelayout_segment already has dsaddr set (this needs to be
done with care to avoid races - cmpxchg() is your friend), and then rely
on that single reference for the remainder of the layout segment's
lifetime?
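A rough userspace sketch of the set-once idea, for illustration only. All
names here (struct ds_node, struct fl_segment, find_get_ds(), put_ds(),
fl_segment_get_ds(), fl_segment_free()) are hypothetical stand-ins loosely
modelled on the deviceid node, struct nfs4_filelayout_segment,
nfs4_find_get_deviceid() and nfs4_fl_put_deviceid(); C11
atomic_compare_exchange_strong() plays the role of the kernel's cmpxchg().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the refcounted deviceid node. */
struct ds_node {
	atomic_int refcount;
};

/* Hypothetical stand-in for struct nfs4_filelayout_segment. */
struct fl_segment {
	struct ds_node *_Atomic dsaddr;	/* NULL until the first pg_init */
};

/* Stand-in for nfs4_find_get_deviceid(): caller gets an extra reference. */
static struct ds_node *find_get_ds(struct ds_node *cache)
{
	atomic_fetch_add(&cache->refcount, 1);
	return cache;
}

/* Stand-in for nfs4_fl_put_deviceid(). */
static void put_ds(struct ds_node *ds)
{
	atomic_fetch_sub(&ds->refcount, 1);
}

/*
 * Called from every pg_init. The segment ends up owning exactly one
 * reference, however many reads/writes (or racing threads) get here.
 */
static struct ds_node *fl_segment_get_ds(struct fl_segment *seg,
					 struct ds_node *cache)
{
	struct ds_node *ds = atomic_load(&seg->dsaddr);
	struct ds_node *expected = NULL;

	if (ds != NULL)
		return ds;	/* fast path: seg already holds the reference */

	ds = find_get_ds(cache);
	if (!atomic_compare_exchange_strong(&seg->dsaddr, &expected, ds)) {
		/* Lost the race: another caller installed its reference
		 * first, so drop ours and use the one now in the segment. */
		put_ds(ds);
		ds = expected;
	}
	return ds;
}

/* Stand-in for filelayout_free_lseg(): drops the single segment reference. */
static void fl_segment_free(struct fl_segment *seg)
{
	struct ds_node *ds = atomic_exchange(&seg->dsaddr, NULL);

	if (ds != NULL)
		put_ds(ds);
}
```

The loser of the compare-and-exchange drops its own reference, so the
segment holds exactly one, matched by the single put at free time, and no
per-I/O put in .pg_cleanup is needed.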

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@xxxxxxxxxxxxxxx