On 5/23/23 18:23, Jason Yan wrote: > On 2023/5/23 16:29, John Garry wrote: >> On 23/05/2023 09:04, Damien Le Moal wrote: >>> For devices not attached to a port multiplier and managed directly by >>> libata, the device number passed to ata_find_dev() must always be lower >>> than the maximum number of devices returned by ata_link_max_devices(). >>> That is 1 for SATA devices or 2 for an IDE link with master+slave >>> devices. This device number is the scsi device ID which matches these >>> constraint as the ID are generated per port and so never exceed the >>> link maximum. >>> >>> However, for libsas managed devices, scsi device IDs are assigned per >>> scsi host, leading to device IDs for SATA devices that can be well in >>> excess of libata per-link maximum number of devices. This results in >>> ata_find_dev() always returning NULL for libsas managed devices except >>> for the first device of the host with ID (device number) 0. This issue >>> is visible by executing hdparm command, which fails: >>> >>> hdparm -i /dev/sdX >>> /dev/sdX: >>> HDIO_GET_IDENTITY failed: No message of desired type >>> >>> Fix this by rewriting ata_find_dev() to ignore the device number for >>> non-pmp attached devices with a link with at most 1 device, that is SATA >>> devices on SATA ports. For these, device number 0 is always used to >>> return the correct ata_device struct of the port link. This change >>> excludes IDE master/slave setups (maximum number of devices per link >>> is 2) and port-multiplier attached devices. Also, to be consistant with >>> the fact that scsi device IDs and channel numbers used as device numbers >>> are both unsigned int, change the devno argument of ata_find_dev() to >>> unsinged int. >>> >>> Reported-by: Xingui Yang <yangxingui@xxxxxxxxxx> >>> Fixes: 41bda9c98035 ("libata-link: update hotplug to handle PMP links") >>> Cc: stable@xxxxxxxxxxxxxxx >>> Signed-off-by: Damien Le Moal <dlemoal@xxxxxxxxxx> >> >> Reviewed-by: John Garry <john.g.garry@xxxxxxxxxx> > > Hi Damien & John, > > I think we may missed something. What about if we do this: > > echo "scsi add-single-device 0 0 100 0" >/proc/scsi/scsi > > Then in ata_scsi_user_scan() we will return device "0 0 0 0 " and rescan > this device, which is wrong? I did some more digging into this. And I do not think there are any issues (I tested and it works). The reason is that the "shost" passed to ata_scsi_user_scan() corresponds to the scsi host for the ata_port of the ata_device in libsas (dev->sata_dev.ap). It is not the scsi_host representing the HBA itself (which has multiple devices). So changing the devno to 0 has no effect and deleting+rescanning particular devices leads to the correct IDs being used. E.g., on my system, I have 4 drives connected to the pm80xx: # lsscsi -g [0:0:0:0] disk ATA WDC WUH721818AL W232 /dev/sdc /dev/sg3 [0:0:1:0] disk ATA WDC WUH721818AL WTW2 /dev/sdb /dev/sg1 [0:0:2:0] disk ATA WDC WUH722222AL Wf86 /dev/sdd /dev/sg2 [0:0:3:0] zbc ATA WDC WSH722020AL W803 /dev/sda /dev/sg0 If I remove the first 0 and 2: # echo 1 > /sys/class/scsi_device/0:0:0:0/device/delete # echo 1 > /sys/class/scsi_device/0:0:2:0/device/delete # lsscsi -g [0:0:1:0] disk ATA WDC WUH721818AL WTW2 /dev/sdb /dev/sg1 [0:0:3:0] zbc ATA WDC WSH722020AL W803 /dev/sda /dev/sg0 And then manually rescan in reverse order of the removal: # echo "scsi add-single-device 0 0 2 0" > /proc/scsi/scsi # lsscsi -g [0:0:1:0] disk ATA WDC WUH721818AL WTW2 /dev/sdb /dev/sg1 [0:0:2:0] disk ATA WDC WUH722222AL Wf86 /dev/sdc /dev/sg2 [0:0:3:0] zbc ATA WDC WSH722020AL W803 /dev/sda /dev/sg0 # echo "scsi add-single-device 0 0 0 0" > /proc/scsi/scsi # lsscsi -g [0:0:0:0] disk ATA WDC WUH721818AL W232 /dev/sdd /dev/sg3 [0:0:1:0] disk ATA WDC WUH721818AL WTW2 /dev/sdb /dev/sg1 [0:0:2:0] disk ATA WDC WUH722222AL Wf86 /dev/sdc /dev/sg2 [0:0:3:0] zbc ATA WDC WSH722020AL W803 /dev/sda /dev/sg0 I get back all devices with the correct IDs. Note that I tried John's suggestion as well using an ata_for_each_dev() loop. That does work well for libsas as we do have ata_dev->sdev set to the scsi device already when scanning, but does not work for AHCI as we do not. I will keep doing some more tests with v3 (and correct typos and suggested) but I think this is all good. -- Damien Le Moal Western Digital Research