On 2021/10/21 20:23, Zheyu Ma wrote:
> On Thu, Oct 21, 2021 at 6:38 PM Damien Le Moal
> <damien.lemoal@xxxxxxxxxxxxxxxxxx> wrote:
>>
>> On 2021/10/21 17:37, Sergey Shtylyov wrote:
>>> On 21.10.2021 8:57, Zheyu Ma wrote:
>>>
>>>> mv_init_host() propagates the value returned by mv_chip_id() which in turn
>>>> gets propagated by mv_pci_init_one() and hits local_pci_probe().
>>>>
>>>> During the process of driver probing, the probe function should return < 0
>>>> for failure, otherwise, the kernel will treat value > 0 as success.
>>>>
>>>> Signed-off-by: Zheyu Ma <zheyuma97@xxxxxxxxx>
>>>> ---
>>>>  drivers/ata/sata_mv.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/ata/sata_mv.c b/drivers/ata/sata_mv.c
>>>> index 9d86203e1e7a..7461fe078dd1 100644
>>>> --- a/drivers/ata/sata_mv.c
>>>> +++ b/drivers/ata/sata_mv.c
>>>> @@ -3897,7 +3897,7 @@ static int mv_chip_id(struct ata_host *host, unsigned int board_idx)
>>>>
>>>>       default:
>>>>           dev_err(host->dev, "BUG: invalid board index %u\n", board_idx);
>>>> -         return 1;
>>>> +         return -ENODEV;
>>>
>>>     Doesn't -EINVAL fit better here?
>>
>> If the error message is correct and this can only happen if there is a bug
>> somewhere, I do not think the error code really matters much. The dev_err()
>> should probably be changed to dev_alert() or even dev_crit() for this case.
>>
>
> I don't think so, the error code does matter. If mv_chip_id() returns
> 1 which eventually causes the probe function to return 1, then the
> kernel will assume that the driver and the hardware match successfully
> (even if that is not the case), which will cause the following error
> if modprobe is called to remove the driver.
>
> [ 21.944486] general protection fault, probably for non-canonical
> address 0xdffffc000000001b: 0000 [#1] PREEMPT SMP KASAN PTI
> [ 21.945317] KASAN: null-ptr-deref in range
> [0x00000000000000d8-0x00000000000000df]
> [ 21.954442] Call Trace:
> [ 21.954624] ? scsi_remove_host+0x32/0x660
> [ 21.954923] ? lockdep_hardirqs_on+0x7e/0x110
> [ 21.955240] ? _raw_spin_unlock_irqrestore+0x30/0x60
> [ 21.955634] ? mutex_lock_io_nested+0x60/0x60
> [ 21.956027] ? _raw_spin_unlock_irqrestore+0x41/0x60
> [ 21.956395] ? async_synchronize_cookie_domain+0x35f/0x4a0
> [ 21.956802] ? async_synchronize_full_domain+0x20/0x20
> [ 21.957179] ? lock_release+0x63f/0x8f0
> [ 21.957468] mutex_lock_nested+0x1b/0x30
> [ 21.957761] scsi_remove_host+0x32/0x660
> [ 21.958054] ata_host_detach+0x75d/0x830
> [ 21.958349] ata_pci_remove_one+0x3b/0x40
> [ 21.958649] pci_device_remove+0xa9/0x250
> [ 21.958949] ? pci_device_probe+0x7d0/0x7d0
> [ 21.959261] device_release_driver_internal+0x4f7/0x7a0
> [ 21.959647] driver_detach+0x1e8/0x2c0
> [ 21.959929] bus_remove_driver+0x134/0x290
> [ 21.960234] ? sysfs_remove_groups+0x97/0xb0
> [ 21.960552] driver_unregister+0x77/0xa0
> [ 21.960859] pci_unregister_driver+0x2c/0x1c0
> [ 21.961178] cleanup_module+0x15/0x28 [sata_mv]

How do you trigger this ? A bad device tree or something like that ?

>
> This is not the case if the correct error code is returned.
>
>>>
>>> [...]
>>>
>>> MBR, Sergey
>>>
>>
>>
>> --
>> Damien Le Moal
>> Western Digital Research
>
> Regards,
> Zheyu Ma
>

-- 
Damien Le Moal
Western Digital Research
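
For reference, the return-value convention under discussion can be sketched in user space. This is only an illustrative model of the policy stated in the patch description (< 0 is failure, 0 is success, > 0 is treated as success), not the kernel's code. The names model_bus_probe(), probe_returns_one(), probe_returns_enodev() and probe_ok() are hypothetical and exist only for this example.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for a driver's probe callback. */
typedef int (*probe_fn)(void);

/*
 * Simplified model of how a bus layer might interpret a probe result:
 *   rc < 0  -> failure, device is left unbound, rc is propagated
 *   rc == 0 -> success, device is bound
 *   rc > 0  -> unexpected, but treated as success (with a warning)
 * "bound" is set to 1 if the device ends up bound, 0 otherwise.
 */
int model_bus_probe(probe_fn probe, int *bound)
{
    int rc = probe();

    if (rc < 0) {
        *bound = 0;
        return rc;
    }

    if (rc > 0)
        fprintf(stderr,
                "probe unexpectedly returned %d, treating as success\n", rc);

    *bound = 1;
    return 0;
}

/* Mimics the old "return 1" path: an error reported as a positive value. */
int probe_returns_one(void)
{
    return 1;
}

/* Mimics the patched path: a proper negative errno. */
int probe_returns_enodev(void)
{
    return -ENODEV;
}

/* A probe that genuinely succeeds. */
int probe_ok(void)
{
    return 0;
}
```

In this model, passing probe_returns_one() leaves the device marked as bound even though the probe meant to signal an error, which matches Zheyu's description of the driver later being removed as if it had been set up. With probe_returns_enodev() the device stays unbound and the error is propagated.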