On Fri, Oct 22, 2021 at 9:41 AM Damien Le Moal
<damien.lemoal@xxxxxxxxxxxxxxxxxx> wrote:
>
> On 2021/10/21 20:23, Zheyu Ma wrote:
> > On Thu, Oct 21, 2021 at 6:38 PM Damien Le Moal
> > <damien.lemoal@xxxxxxxxxxxxxxxxxx> wrote:
> >>
> >> On 2021/10/21 17:37, Sergey Shtylyov wrote:
> >>> On 21.10.2021 8:57, Zheyu Ma wrote:
> >>>
> >>>> mv_init_host() propagates the value returned by mv_chip_id() which in turn
> >>>> gets propagated by mv_pci_init_one() and hits local_pci_probe().
> >>>>
> >>>> During the process of driver probing, the probe function should return < 0
> >>>> for failure, otherwise, the kernel will treat value > 0 as success.
> >>>>
> >>>> Signed-off-by: Zheyu Ma <zheyuma97@xxxxxxxxx>
> >>>> ---
> >>>>  drivers/ata/sata_mv.c | 2 +-
> >>>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/drivers/ata/sata_mv.c b/drivers/ata/sata_mv.c
> >>>> index 9d86203e1e7a..7461fe078dd1 100644
> >>>> --- a/drivers/ata/sata_mv.c
> >>>> +++ b/drivers/ata/sata_mv.c
> >>>> @@ -3897,7 +3897,7 @@ static int mv_chip_id(struct ata_host *host, unsigned int board_idx)
> >>>>
> >>>>  	default:
> >>>>  		dev_err(host->dev, "BUG: invalid board index %u\n", board_idx);
> >>>> -		return 1;
> >>>> +		return -ENODEV;
> >>>
> >>> Doesn't -EINVAL fit better here?
> >>
> >> If the error message is correct and this can only happen if there is a bug
> >> somewhere, I do not think the error code really matters much. The dev_err()
> >> should probably be changed to dev_alert() or even dev_crit() for this case.
> >>
> >
> > I don't think so, the error code does matter. If mv_chip_id() returns
> > 1 which eventually causes the probe function to return 1, then the
> > kernel will assume that the driver and the hardware match successfully
> > (even if that is not the case), which will cause the following error
> > if modprobe is called to remove the driver.
> >
> > [ 21.944486] general protection fault, probably for non-canonical
> > address 0xdffffc000000001b: 0000 [#1] PREEMPT SMP KASAN PTI
> > [ 21.945317] KASAN: null-ptr-deref in range
> > [0x00000000000000d8-0x00000000000000df]
> > [ 21.954442] Call Trace:
> > [ 21.954624] ? scsi_remove_host+0x32/0x660
> > [ 21.954923] ? lockdep_hardirqs_on+0x7e/0x110
> > [ 21.955240] ? _raw_spin_unlock_irqrestore+0x30/0x60
> > [ 21.955634] ? mutex_lock_io_nested+0x60/0x60
> > [ 21.956027] ? _raw_spin_unlock_irqrestore+0x41/0x60
> > [ 21.956395] ? async_synchronize_cookie_domain+0x35f/0x4a0
> > [ 21.956802] ? async_synchronize_full_domain+0x20/0x20
> > [ 21.957179] ? lock_release+0x63f/0x8f0
> > [ 21.957468] mutex_lock_nested+0x1b/0x30
> > [ 21.957761] scsi_remove_host+0x32/0x660
> > [ 21.958054] ata_host_detach+0x75d/0x830
> > [ 21.958349] ata_pci_remove_one+0x3b/0x40
> > [ 21.958649] pci_device_remove+0xa9/0x250
> > [ 21.958949] ? pci_device_probe+0x7d0/0x7d0
> > [ 21.959261] device_release_driver_internal+0x4f7/0x7a0
> > [ 21.959647] driver_detach+0x1e8/0x2c0
> > [ 21.959929] bus_remove_driver+0x134/0x290
> > [ 21.960234] ? sysfs_remove_groups+0x97/0xb0
> > [ 21.960552] driver_unregister+0x77/0xa0
> > [ 21.960859] pci_unregister_driver+0x2c/0x1c0
> > [ 21.961178] cleanup_module+0x15/0x28 [sata_mv]
>
> How do you trigger this ? A bad device tree or something like that ?

Pretty much, I was testing on qemu and used fault injection to force
mv_chip_id() to fail, even though this rarely happens.

Regards,
Zheyu Ma
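
P.S. For anyone following the thread, the failure mode can be modelled in
user space. This is only a rough sketch, not kernel code: struct fake_dev,
fake_bus_probe() and the other names below are hypothetical stand-ins, and
the NULL drvdata check is a simplification of what the remove path trips
over in the trace above. The only thing taken from the kernel is the
convention that a probe return value < 0 means failure while 0 and > 0 are
both treated as success.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device; not a real kernel structure. */
struct fake_dev {
	bool bound;     /* true once the core considers a driver attached */
	void *drvdata;  /* only set when probe really finished its setup */
};

typedef int (*probe_fn)(struct fake_dev *dev);

/*
 * Models the probe return convention: < 0 is failure,
 * 0 is success, and > 0 is also treated as success.
 */
static int fake_bus_probe(struct fake_dev *dev, probe_fn probe)
{
	int rc = probe(dev);

	if (rc < 0) {
		dev->bound = false;
		return rc;
	}
	/* rc == 0 or rc > 0: the device is considered bound. */
	dev->bound = true;
	return 0;
}

/* Buggy probe: reports failure with a positive value, as "return 1" did. */
static int probe_returns_one(struct fake_dev *dev)
{
	(void)dev;  /* setup never ran, drvdata stays NULL */
	return 1;
}

/* Fixed probe: reports failure with a negative errno. */
static int probe_returns_enodev(struct fake_dev *dev)
{
	(void)dev;
	return -ENODEV;
}

/* True when remove would run on a device whose probe never set it up. */
static bool remove_would_touch_null(const struct fake_dev *dev)
{
	return dev->bound && dev->drvdata == NULL;
}

/* Exercises both probes and checks the resulting binding state. */
static void run_demo(void)
{
	struct fake_dev a = { false, NULL };
	struct fake_dev b = { false, NULL };

	/* Positive return: the core reports success and binds the device. */
	assert(fake_bus_probe(&a, probe_returns_one) == 0);
	assert(remove_would_touch_null(&a));

	/* Negative errno: the device stays unbound, remove never runs. */
	assert(fake_bus_probe(&b, probe_returns_enodev) == -ENODEV);
	assert(!remove_would_touch_null(&b));
}
```

Calling run_demo() from a main() shows the difference: with "return 1" the
device ends up bound with nothing behind it, while any negative errno
(-ENODEV or -EINVAL alike) leaves it unbound.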