Re: [PATCH] pci-driver: Add driver load messages

On Tue, Jan 26, 2021 at 08:42:12AM -0500, Prarit Bhargava wrote:
>
>
> On 1/26/21 8:14 AM, Leon Romanovsky wrote:
> > On Tue, Jan 26, 2021 at 07:54:46AM -0500, Prarit Bhargava wrote:
> >>   Leon Romanovsky <leon@xxxxxxxxxx> wrote:
> >>> On Mon, Jan 25, 2021 at 02:41:38PM -0500, Prarit Bhargava wrote:
> >>>> There are two situations where driver load messages are helpful.
> >>>>
> >>>> 1) Some drivers silently load on devices and debugging driver or system
> >>>> failures in these cases is difficult.  While some drivers (networking
> >>>> for example) may not completely initialize when the PCI driver probe() function
> >>>> has returned, it is still useful to have some idea of driver completion.
> >>>
> >>> Sorry, probably it is me, but I don't understand this use case.
> >>> Are you adding a global, kernel-wide command-line boot argument? To debug
> >>> what, and when?
> >>>
> >>> During boot:
> >>> If the device succeeds, you will see it in /sys/bus/pci/[drivers|devices]/*.
> >>> If the device fails, you should get an error from that device (fix the
> >>> device to return an error), or something immediately won't work and
> >>> you won't see it in sysfs.
> >>>
> >>
> >> What if there is a panic during boot?  There's no way to get to sysfs.
> >> That's the case where this is helpful.
> >
> > How? If you have a kernel panic, it means you have a much worse problem
> > than an unsupported device. If the kernel panic was caused by the driver, you
> > will see a call trace related to it. If the kernel panic was caused by
> > something else, supported/not supported won't help here.
>
> I still have no idea *WHICH* device it was that the panic occurred on.

The kernel panic is printed from the driver. There is one driver loaded
for all identical PCI devices, and they are probed regardless of how many
of them there are.

If you have a host with ten identical cards, you will see one driver, and
that is where the problem is, not in whether the device is supported.

> >
> >>
> >>> During run:
> >>> We have many other solutions to get debug prints during run, for example
> >>> tracing, which is possible to toggle dynamically.
> >>>
> >>> Right now, my laptop would print 34 messages on boot and an endless amount
> >>> during day-to-day usage.
> >>>
> >>> ➜  kernel git:(rdma-next) ✗ lspci |wc -l
> >>> 34
> >>>
> >>>>
> >>>> 2) Storage and Network device vendors have relatively short lives for
> >>>> some of their hardware.  Some devices may continue to function but are
> >>>> problematic due to out-of-date firmware or other issues.  Maintaining
> >>>> a database of the hardware is out-of-the-question in the kernel as it would
> >>>> require constant updating.  Outputting a message in the log would allow
> >>>> different OSes to determine if the problem hardware was truly supported or not.
> >>>
> >>> And rely on dmesg output as the true source of supported/not supported,
> >>> making it an ABI that needs a knob on the command line?
> >>
> >> Yes.  The console log being saved would work as a true source of load
> >> messages to be interpreted by an OS tool.  But I see your point about the
> >> knob below...
> >
> > You will need a much stronger claim than the above if you want to go
> > down the ABI path with dmesg prints.
> >
>
> See my answer below.  I agree with you on the ABI statement.
>
> >>
> >>>
> >>>>
> >>>> Add optional driver load messages from the PCI core that indicate which
> >>>> driver was loaded, on which slot, and on which device.
> >>>
> >>> Why don't you add simple pr_debug(..) without any knob? You will be able
> >>> to enable/disable it through dynamic prints facility.
> >>
> >> Good point.  I'll wait for more feedback and submit a v2 with pr_debug.
> >
> > Just to be clear, none of this can be ABI and any kernel print can
> > be changed or removed any minute without any announcement.
>
> Yes, that's absolutely the case, and I agree with you that nothing can guarantee
> the ABI stability of those pr_debug() statements.  They are *debug* after all.

You missed the point. No pr*() print is ABI, regardless of its level.

Thanks

>
> P.
>
> >
> > Thanks
> >
> >>
> >> P.
> >>
> >>>
> >>> Thanks
> >>
> >
>
