Re: [RFC PATCH 3/4] nfs/blocklayout: Fix premature PR key unregistration

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 20 Jun 2024, at 11:56, Benjamin Coddington wrote:

> On 20 Jun 2024, at 11:46, Chuck Lever wrote:
>
>> On Thu, Jun 20, 2024 at 11:30:54AM -0400, Benjamin Coddington wrote:
>>> On 20 Jun 2024, at 10:34, Chuck Lever wrote:
>>>
>>>> On Thu, Jun 20, 2024 at 09:51:46AM -0400, Benjamin Coddington wrote:
>>>>> On 19 Jun 2024, at 13:39, cel@xxxxxxxxxx wrote:
>>>>>
>>>>>> From: Chuck Lever <chuck.lever@xxxxxxxxxx>
>>>>>>
>>>>>> During generic/069 runs with pNFS SCSI layouts, the NFS client emits
>>>>>> the following in the system journal:
>>>>>>
>>>>>> kernel: pNFS: failed to open device /dev/disk/by-id/dm-uuid-mpath-0x6001405e3366f045b7949eb8e4540b51 (-2)
>>>>>> kernel: pNFS: using block device sdb (reservation key 0x666b60901e7b26b3)
>>>>>> kernel: pNFS: failed to open device /dev/disk/by-id/dm-uuid-mpath-0x6001405e3366f045b7949eb8e4540b51 (-2)
>>>>>> kernel: pNFS: using block device sdb (reservation key 0x666b60901e7b26b3)
>>>>>> kernel: sd 6:0:0:1: reservation conflict
>>>>>> kernel: sd 6:0:0:1: [sdb] tag#16 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
>>>>>> kernel: sd 6:0:0:1: [sdb] tag#16 CDB: Write(10) 2a 00 00 00 00 50 00 00 08 00
>>>>>> kernel: reservation conflict error, dev sdb, sector 80 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 2
>>>>>> kernel: sd 6:0:0:1: reservation conflict
>>>>>> kernel: sd 6:0:0:1: reservation conflict
>>>>>> kernel: sd 6:0:0:1: [sdb] tag#18 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
>>>>>> kernel: sd 6:0:0:1: [sdb] tag#17 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
>>>>>> kernel: sd 6:0:0:1: [sdb] tag#18 CDB: Write(10) 2a 00 00 00 00 60 00 00 08 00
>>>>>> kernel: sd 6:0:0:1: [sdb] tag#17 CDB: Write(10) 2a 00 00 00 00 58 00 00 08 00
>>>>>> kernel: reservation conflict error, dev sdb, sector 96 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
>>>>>> kernel: reservation conflict error, dev sdb, sector 88 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
>>>>>> systemd[1]: fstests-generic-069.scope: Deactivated successfully.
>>>>>> systemd[1]: fstests-generic-069.scope: Consumed 5.092s CPU time.
>>>>>> systemd[1]: media-test.mount: Deactivated successfully.
>>>>>> systemd[1]: media-scratch.mount: Deactivated successfully.
>>>>>> kernel: sd 6:0:0:1: reservation conflict
>>>>>> kernel: failed to unregister PR key.
>>>>>>
>>>>>> This appears to be due to a race. bl_alloc_lseg() calls this:
>>>>>>
>>>>>> 561 static struct nfs4_deviceid_node *
>>>>>> 562 bl_find_get_deviceid(struct nfs_server *server,
>>>>>> 563                 const struct nfs4_deviceid *id, const struct cred *cred,
>>>>>> 564                 gfp_t gfp_mask)
>>>>>> 565 {
>>>>>> 566         struct nfs4_deviceid_node *node;
>>>>>> 567         unsigned long start, end;
>>>>>> 568
>>>>>> 569 retry:
>>>>>> 570         node = nfs4_find_get_deviceid(server, id, cred, gfp_mask);
>>>>>> 571         if (!node)
>>>>>> 572                 return ERR_PTR(-ENODEV);
>>>>>>
>>>>>> nfs4_find_get_deviceid() does a lookup without the spin lock first.
>>>>>> If it can't find a matching deviceid, it creates a new device_info
>>>>>> (which calls bl_alloc_deviceid_node, and that registers the device's
>>>>>> PR key).
>>>>>>
>>>>>> Then it takes the nfs4_deviceid_lock and looks up the deviceid again.
>>>>>> If it finds it this time, bl_find_get_deviceid() frees the spare
>>>>>> (new) device_info, which unregisters the PR key for the same device.
>>>>>>
>>>>>> Any subsequent I/O from this client on that device gets EBADE.
>>>>>>
>>>>>> The umount later unregisters the device's PR key again.
>>>>>>
>>>>>> To prevent this problem, register the PR key after the deviceid_node
>>>>>> lookup.
>>>>>
>>>>> Hi Chuck - nice catch, but I'm not seeing how we don't have the same problem
>>>>> after this patch, instead it just seems like it moves the race.  What
>>>>> prevents another process waiting to take the nfs4_deviceid_lock from
>>>>> unregistering the same device?  I think we need another way to signal
>>>>> bl_free_device that we don't want to unregister for the case where the new
>>>>> device isn't added to nfs4_deviceid_cache.
>>>>
>>>> That's a (related but) somewhat different issue. I haven't seen
>>>> that in practice so far.
>>>>
>>>>
>>>>> No good ideas yet - maybe we can use a flag set within the
>>>>> nfs4_deviceid_lock?
>>>>
>>>> Well this smells like a use for a reference count on the block
>>>> device, but fs/nfs doesn't control the definition of that data
>>>> structure.
>>>
>>> I think we need two things to fix this race:
>>>  - a way to determine if the current thread is the one
>>>    that added the device to the to the cache, if so do the register
>>>  - a way to determine if we're freeing because we lost the race to the
>>>    cache, if so don't un-register.
>>
>> My patch is supposed to deal with all of that already. Can you show
>> me specifically what is not getting handled by my proposed change?
>
> Well - I may be missing something, but it looks like with this patch you can
> still have:
>
> Thread
> A                           B
>
> nfs4_find_get_deviceid
> new{a} = nfs4_get_device_info
> locks nfs4_deviceid_lock
>                             nfs4_find_get_deviceid
>                             new{b} = nfs4_get_device_info
>                             spins on nfs4_deviceid_lock
> adds new{a} to the cache
> unlocks nfs4_deviceid_lock
> pr_register
>                             locks nfs4_deviceid_lock
>                             finds new{a}
>                             pr_UNregisters new{b}
>
> In this case, you end up with an unregistered device.

Oh jeez Chuck, nevermind - I am missing something, that we check the the
new{b} pnfs_block_dev->pr_registred before unregistering it.

So, actually - I think this patch does solve the problem.  I apologize for
the noise here.

Ben






[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux