Re: [PATCH] ata: sata_mv: Fix the return value of the probe function

Zheyu Ma <zheyuma97@xxxxxxxxx> · Fri, 22 Oct 2021 17:18:24 +0800

On Fri, Oct 22, 2021 at 9:41 AM Damien Le Moal
<damien.lemoal@xxxxxxxxxxxxxxxxxx> wrote:
>
> On 2021/10/21 20:23, Zheyu Ma wrote:
> > On Thu, Oct 21, 2021 at 6:38 PM Damien Le Moal
> > <damien.lemoal@xxxxxxxxxxxxxxxxxx> wrote:
> >>
> >> On 2021/10/21 17:37, Sergey Shtylyov wrote:
> >>> On 21.10.2021 8:57, Zheyu Ma wrote:
> >>>
> >>>> mv_init_host() propagates the value returned by mv_chip_id() which in turn
> >>>> gets propagated by mv_pci_init_one() and hits local_pci_probe().
> >>>>
> >>>> During the process of driver probing, the probe function should return < 0
> >>>> for failure, otherwise, the kernel will treat value > 0 as success.
> >>>>
> >>>> Signed-off-by: Zheyu Ma <zheyuma97@xxxxxxxxx>
> >>>> ---
> >>>>   drivers/ata/sata_mv.c | 2 +-
> >>>>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/drivers/ata/sata_mv.c b/drivers/ata/sata_mv.c
> >>>> index 9d86203e1e7a..7461fe078dd1 100644
> >>>> --- a/drivers/ata/sata_mv.c
> >>>> +++ b/drivers/ata/sata_mv.c
> >>>> @@ -3897,7 +3897,7 @@ static int mv_chip_id(struct ata_host *host, unsigned int board_idx)
> >>>>
> >>>>      default:
> >>>>              dev_err(host->dev, "BUG: invalid board index %u\n", board_idx);
> >>>> -            return 1;
> >>>> +            return -ENODEV;
> >>>
> >>>     Doesn't -EINVAL fit better here?
> >>
> >> If the error message is correct and this can only happen if there is a bug
> >> somewhere, I do not think the error code really matters much. The dev_err()
> >> should probably be changed to dev_alert() or even dev_crit() for this case.
> >>
> >
> > I don't think so, the error code does matter. If mv_chip_id() returns
> > 1 which eventually causes the probe function to return 1, then the
> > kernel will assume that the driver and the hardware match successfully
> > (even if that is not the case), which will cause the following error
> > if modprobe is called to remove the driver.
> >
> > [   21.944486] general protection fault, probably for non-canonical
> > address 0xdffffc000000001b: 0000 [#1] PREEMPT SMP KASAN PTI
> > [   21.945317] KASAN: null-ptr-deref in range
> > [0x00000000000000d8-0x00000000000000df]
> > [   21.954442] Call Trace:
> > [   21.954624]  ? scsi_remove_host+0x32/0x660
> > [   21.954923]  ? lockdep_hardirqs_on+0x7e/0x110
> > [   21.955240]  ? _raw_spin_unlock_irqrestore+0x30/0x60
> > [   21.955634]  ? mutex_lock_io_nested+0x60/0x60
> > [   21.956027]  ? _raw_spin_unlock_irqrestore+0x41/0x60
> > [   21.956395]  ? async_synchronize_cookie_domain+0x35f/0x4a0
> > [   21.956802]  ? async_synchronize_full_domain+0x20/0x20
> > [   21.957179]  ? lock_release+0x63f/0x8f0
> > [   21.957468]  mutex_lock_nested+0x1b/0x30
> > [   21.957761]  scsi_remove_host+0x32/0x660
> > [   21.958054]  ata_host_detach+0x75d/0x830
> > [   21.958349]  ata_pci_remove_one+0x3b/0x40
> > [   21.958649]  pci_device_remove+0xa9/0x250
> > [   21.958949]  ? pci_device_probe+0x7d0/0x7d0
> > [   21.959261]  device_release_driver_internal+0x4f7/0x7a0
> > [   21.959647]  driver_detach+0x1e8/0x2c0
> > [   21.959929]  bus_remove_driver+0x134/0x290
> > [   21.960234]  ? sysfs_remove_groups+0x97/0xb0
> > [   21.960552]  driver_unregister+0x77/0xa0
> > [   21.960859]  pci_unregister_driver+0x2c/0x1c0
> > [   21.961178]  cleanup_module+0x15/0x28 [sata_mv]
>
> How do you trigger this ? A bad device tree or something like that ?

Pretty much. I was testing on QEMU and used fault injection to force
mv_chip_id() to fail, even though this rarely happens.
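
To illustrate why the positive return value matters, here is a minimal
userspace sketch of how the probe result gets interpreted. It is only a
model and not the kernel code: probe_old(), probe_new(), model_bind() and
model_selfcheck() are hypothetical names made up for this example, and the
only assumption carried over from the kernel is that a negative value means
failure while a value > 0 is treated as success.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for a driver probe callback. */
typedef int (*probe_fn)(void);

/* Models the pre-patch default case: the error path returns 1. */
static int probe_old(void)
{
	return 1;
}

/* Models the patched default case: the error path returns -ENODEV. */
static int probe_new(void)
{
	return -ENODEV;
}

/*
 * Simplified model of how a bus core interprets a probe result:
 * negative means failure (device not bound), zero or positive is
 * treated as success. Returns 1 if the device ends up bound, else 0.
 */
static int model_bind(probe_fn probe)
{
	int rc = probe();

	if (rc < 0)
		return 0;
	if (rc > 0)
		fprintf(stderr, "probe unexpectedly returned %d, treating as success\n", rc);
	return 1;
}

/* Documents the expected outcome for both variants. */
static void model_selfcheck(void)
{
	assert(model_bind(probe_old) == 1);	/* bound despite the failure */
	assert(model_bind(probe_new) == 0);	/* correctly left unbound */
}
```

In this model the old return value leaves the device bound to a driver
whose initialization never completed, which matches the situation where the
remove path later runs on a half-initialized host.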

Regards,
Zheyu Ma