Re: [PATCH v4 0/8] Make fw_devlink=on more forgiving

Hi Saravana,

On Fri, Feb 12, 2021 at 4:00 AM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> On Thu, Feb 11, 2021 at 5:00 AM Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote:
> >   1. R-Car Gen2 (Koelsch), R-Car Gen3 (Salvator-X(S), Ebisu).
> >
> >       - Commit 2dfc564bda4a31bc ("soc: renesas: rcar-sysc: Mark device
> >         node OF_POPULATED after init") is no longer needed (but already
> >         queued for v5.12 anyway)
>
> Rob doesn't like the proliferation of OF_POPULATED and we don't need
> it anymore, so maybe work it out with him? It's a balance between some
> wasted memory (struct device(s)) vs not proliferating OF_POPULATED.

Rob: should it be reverted?  For v5.13?
I guess other similar "fixes" went in in the meantime.

> >       - Some devices are reprobed, despite their drivers returning
> >         a real error code, and not -EPROBE_DEFER:
>
> Sorry, it's not obvious from the logs below where "reprobing" is
> happening. Can you give more pointers please?

My log was indeed not a full log, but just the reprobes happening.
I'll send you a full log by private email.

> Also, thinking more about this, the only way I could see this happen is:
> 1. Device fails with error that's not -EPROBE_DEFER
> 2. It somehow gets added to a device link (with AUTOPROBE_CONSUMER
> flag) where it's a consumer.
> 3. The supplier probes and the device gets added to the deferred probe
> list again.
>
> But I can't see how this sequence can happen. Device links are created
> only when a device is added. And if the supplier isn't added yet, the
> consumer wouldn't have probed in the first place.

The full log doesn't show any evidence of the device being added
to a list in between the two probes.
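
For reference, the three-step sequence quoted above can be modelled in a
few lines of userspace C. This is only a toy sketch, not driver core
code: struct toy_dev, toy_probe() and toy_run_deferred() are hypothetical
names, and the single "consumer" pointer stands in for a device link
carrying the AUTOPROBE_CONSUMER flag. It shows why a consumer that
failed with a real error would only be probed again if something queued
it after the failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EPROBE_DEFER 517   /* kernel-internal value, not in userspace errno.h */

/* Hypothetical toy device; none of this is the real struct device. */
struct toy_dev {
    const char *name;
    int probe_ret;             /* what the driver's probe() would return */
    bool bound;
    bool on_deferred_list;
    int probe_count;
    struct toy_dev *consumer;  /* stand-in for one AUTOPROBE_CONSUMER link */
};

/* Probe once; queue for a retry only when probe returns -EPROBE_DEFER. */
void toy_probe(struct toy_dev *dev)
{
    assert(dev != NULL);

    dev->probe_count++;
    dev->on_deferred_list = false;

    if (dev->probe_ret == 0) {
        dev->bound = true;
        /* Step 3: a supplier that binds re-queues its unbound consumer. */
        if (dev->consumer && !dev->consumer->bound)
            dev->consumer->on_deferred_list = true;
    } else if (dev->probe_ret == -EPROBE_DEFER) {
        dev->on_deferred_list = true;
    }
    /* Any other error: the device is left alone and not retried. */
}

/* Retry the device only if something put it on the deferred list. */
void toy_run_deferred(struct toy_dev *dev)
{
    if (dev->on_deferred_list)
        toy_probe(dev);
}
```

In this model, a consumer returning -ENODEV is probed a second time only
if the link is created after its first failure and the supplier binds
afterwards, which is the ordering that should not be possible.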

> Other than "annoying waste of time" is this causing any other problems?

Probably not.  But see below.

> >       - The PCI reprobing leads to a memory leak, for which I've sent a fix
> >         "[PATCH] PCI: Fix memory leak in pci_register_io_range()"
> >         https://lore.kernel.org/linux-pci/20210202100332.829047-1-geert+renesas@xxxxxxxxx/
>
> Wrt PCI reprobing,
> 1. Is this PCI never expected to probe, but it's being reattempted
> despite the NOT EPROBE_DEFER error? Or

There is no PCIe card present, so the failure is expected.
Later it is reprobed, which of course fails again.

> 2. The PCI was deferred probe when it should have probed and then when
> it's finally reattempted and it could succeed, we are hitting this mem
> leak issue?

I think the leak has always been there, but it was just exposed by
this unneeded reprobe.  I don't think a reprobe after that specific
error path had ever happened before.

> I'm basically trying to distinguish between "this stuff should never
> be retried" vs "this/its suppliers got probe deferred with
> fw_devlink=on but didn't get probe deferred with
> fw_devlink=permissive and that's causing issues"

There should not be a probe deferral, as no -EPROBE_DEFER was
returned.
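
To spell out the rule I am relying on: only a probe result of
-EPROBE_DEFER should ask for a retry. A minimal userspace sketch of that
check follows; probe_result_wants_retry() is a hypothetical helper made
up for illustration, not a driver core function, and EPROBE_DEFER is
defined locally because it is a kernel-internal value.

```c
#include <errno.h>
#include <stdbool.h>

/* Kernel-internal errno value (include/linux/errno.h); absent in userspace. */
#define EPROBE_DEFER 517

/* Hypothetical helper: true only when a probe result requests a retry. */
bool probe_result_wants_retry(int ret)
{
    return ret == -EPROBE_DEFER;
}
```

With this rule, the expected failure of the PCIe host without a card
(a real error code) would not qualify for a second probe.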

> >       - I2C on R-Car Gen3 does not seem to use DMA, according to
> >         /sys/kernel/debug/dmaengine/summary:
> >
> >             -dma4chan0    | e66d8000.i2c:tx
> >             -dma4chan1    | e66d8000.i2c:rx
> >             -dma5chan0    | e6510000.i2c:tx
>
> I think I need more context on the problem before I can try to fix it.
> I'm also very unfamiliar with that file. With fw_devlink=permissive,
> I2C was using DMA? If so, the next step is to see if the I2C relative
> probe order with DMA is getting changed and if so, why.

Yes, I plan to dig deeper to see what really happens...

> >       - On R-Mobile A1, I get a BUG and a memory leak:
> >
> >             BUG: spinlock bad magic on CPU#0, swapper/1
>
> Hmm... I looked at this in bits and pieces throughout the day. At
> least spent an hour looking at this. This doesn't make a lot of sense
> to me. I don't even touch anything in this code path AFAICT.  Are
> modules/kernel mixed up somehow? I need more info before I can help.
> Does reverting my pm domain change make any difference (assuming it
> boots this far without it)?

I plan to dig deeper to see what really happens...

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


