Re: [PATCH v2 0/3] ata,sd: Fix reading concurrent positioning ranges

Damien Le Moal <damien.lemoal@xxxxxxxxxxxxxxxxxx> · Fri, 3 Jun 2022 10:30:04 +0900

On 6/3/22 07:51, Tyler Erickson wrote:
> This patch series fixes reading the concurrent positioning ranges.
> 
> The first patch fixes reading this in libata, where it was reading
> more data than a drive necessarily supports, resulting in a 
> command abort. 
> 
> The second patch fixes the SCSI translated data to put the VPD page
> length in the correct starting byte.
> 
> The third patch in sd, the fix is adding 4 instead of 3 for the header
> length. Using 3 will always result in an error and was likely used
> incorrectly as T10 specifications list all tables starting at byte 0,
> and byte 3 is the page length, which would mean there are 4 total
> bytes before the remaining data that contains the ranges and other
> information.
> 
> Tyler Erickson (3):
>   libata: fix reading concurrent positioning ranges log
>   libata: fix translation of concurrent positioning ranges
>   scsi: sd: Fix interpretation of VPD B9h length
> 
>  drivers/ata/libata-core.c | 21 +++++++++++++--------
>  drivers/ata/libata-scsi.c |  2 +-
>  drivers/scsi/sd.c         |  2 +-
>  3 files changed, 15 insertions(+), 10 deletions(-)
> 
> 
> base-commit: 700170bf6b4d773e328fa54ebb70ba444007c702

Looks all good to me. I tested this and really wonder how I did not catch
these mistakes earlier :)

Using a tcmu emulator for various concurrent positioning range configs to
test, I got a lockdep splat when unplugging the drive:

[  760.702859] ======================================================
[  760.702863] WARNING: possible circular locking dependency detected
[  760.702868] 5.18.0+ #1509 Not tainted
[  760.702875] ------------------------------------------------------
[...]
[  760.702966] the existing dependency chain (in reverse order) is:
[  760.702969]
[  760.702969] -> #1 (&q->sysfs_lock){+.+.}-{3:3}:
[  760.702982]        __mutex_lock+0x15b/0x1480
[  760.702998]        blk_ia_range_sysfs_show+0x41/0xc0
[  760.703010]        sysfs_kf_seq_show+0x1f2/0x360
[  760.703022]        seq_read_iter+0x40f/0x1130
[  760.703036]        new_sync_read+0x2d8/0x520
[  760.703049]        vfs_read+0x31a/0x450
[  760.703060]        ksys_read+0xf7/0x1d0
[  760.703070]        do_syscall_64+0x34/0x80
[  760.703081]        entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  760.703093]
[  760.703093] -> #0 (kn->active#385){++++}-{0:0}:
[  760.703108]        __lock_acquire+0x2ba6/0x6a20
[  760.703125]        lock_acquire+0x19f/0x510
[  760.703136]        __kernfs_remove+0x739/0x940
[  760.703145]        kernfs_remove_by_name_ns+0x90/0xe0
[  760.703154]        remove_files+0x8c/0x1a0
[  760.703165]        sysfs_remove_group+0x77/0x150
[  760.703175]        sysfs_remove_groups+0x4f/0x90
[  760.703186]        __kobject_del+0x7d/0x1b0
[  760.703196]        kobject_del+0x31/0x50
[  760.703203]        disk_unregister_independent_access_ranges+0x153/0x290
[  760.703214]        blk_unregister_queue+0x166/0x210
[  760.703226]        del_gendisk+0x2f8/0x7c0
[  760.703233]        sd_remove+0x5e/0xb0 [sd_mod]
[  760.703252]        device_release_driver_internal+0x3ad/0x750
[  760.703262]        bus_remove_device+0x2a6/0x570
[  760.703269]        device_del+0x48f/0xb50
[  760.703280]        __scsi_remove_device+0x21b/0x2b0 [scsi_mod]
[  760.703339]        scsi_remove_device+0x3a/0x50 [scsi_mod]
[  760.703391]        tcm_loop_port_unlink+0xca/0x160 [tcm_loop]
[  760.703407]        target_fabric_port_unlink+0xd5/0x120 [target_core_mod]
[  760.703494]        configfs_unlink+0x37f/0x7a0
[  760.703502]        vfs_unlink+0x295/0x800
[  760.703514]        do_unlinkat+0x2d9/0x560
[  760.703520]        __x64_sys_unlink+0xa5/0xf0
[  760.703528]        do_syscall_64+0x34/0x80
[  760.703537]        entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  760.703548]
[  760.703548] other info that might help us debug this:
[  760.703548]
[  760.703551]  Possible unsafe locking scenario:
[  760.703551]
[  760.703554]        CPU0                    CPU1
[  760.703556]        ----                    ----
[  760.703558]   lock(&q->sysfs_lock);
[  760.703565]                                lock(kn->active#385);
[  760.703573]                                lock(&q->sysfs_lock);
[  760.703579]   lock(kn->active#385);
[  760.703587]
[  760.703587]  *** DEADLOCK ***

This needs to be checked too, but that is not related to your fixes.

I will queue the libata patches for rc1 update.

Martin,

Do you want to take patch 3 or should I just take it ?

-- 
Damien Le Moal
Western Digital Research