Re: [PATCH v2 3/8] kbuild: move core-y and drivers-y to ./Kbuild

Masahiro Yamada <masahiroy@xxxxxxxxxx> · Fri, 23 Sep 2022 21:09:43 +0900

On Fri, Sep 23, 2022 at 8:39 PM Marek Szyprowski
<m.szyprowski@xxxxxxxxxxx> wrote:
>
> Hi All,
>
> CCed: Christoph and Robin, although this is not strictly DMA-mapping
> related issue...
>
> On 22.09.2022 14:16, Marek Szyprowski wrote:
> > On 21.09.2022 06:39, Guenter Roeck wrote:
> >> On Tue, Sep 06, 2022 at 03:13:08PM +0900, Masahiro Yamada wrote:
> >>> Use the ordinary obj-y to list subdirectories.
> >>>
> >>> Note1:
> >>> GNU Make seems to transform './.modules.order' to '.modules.order'
> >>> before matching it against the target pattern. Split ./.modules.order
> >>> to a dedicated rule to avoid "doesn't match the target pattern"
> >>> warning. [1]
> >>>
> >>> Note2:
> >>> Previously, the link order of lib-y depended on CONFIG_MODULES; lib-y
> >>> was linked before drivers-y when CONFIG_MODULES=y, otherwise after
> >>> drivers-y. This was a bug of commit 7273ad2b08f8 ("kbuild: link lib-y
> >>> objects to vmlinux forcibly when CONFIG_MODULES=y"), but it was not a
> >>> big deal after all. Now, libs-y (all objects that come from lib/ and
> >>> arch/*/lib/) is linked last, irrespective of CONFIG_MODULES.
> >>>
> >>> Note3:
> >>> Now, the single target build in arch/*/lib/ works correctly. There was
> >>> a bug report about this. [2]
> >>>
> >>>    $ make ARCH=arm arch/arm/lib/findbit.o
> >>>      CALL    scripts/checksyscalls.sh
> >>>      AS      arch/arm/lib/findbit.o
> >>>
> >>> [1]: https://lists.gnu.org/archive/html/bug-make/2022-08/msg00059.html
> >>> [2]:
> >>> https://lore.kernel.org/linux-kbuild/YvUQOwL6lD4%2F5%2FU6@xxxxxxxxxxxxxxxxxxxxx/
> >>>
> >>> Signed-off-by: Masahiro Yamada <masahiroy@xxxxxxxxxx>
> >> With this patch in place, all parisc images crash during boot with the
> >> crash dump below. Bisect on next-20220920 points to this patch.
> >> Crash and bisect logs are attached.
> >>
> >> Loking through boot logs, the same problem (same backtrace) is seen
> >> with various boot tests on alpha. There may be more, but -next
> >> crashes all over the place right now so it is difficult to determine
> >> the platforms affected by a specicfic problem.
> >
> > I confirm this. I've observed similar issue recently (from time to
> > time) on the ARM 32bit based Exynos5250-based Arndale board. 'git
> > bisect' lead me also to this change. A short investigation revealed
> > that the issue is caused by the NULL sgp->pool in
> > lib/sg_pool.c:sg_pool_alloc() function. Sometimes it works fine,
> > sometimes not. It looks that there is a race in the sg_pools and scsi
> > initialization somehow after this patch.
> >
> > To confirm that this change is really responsible for the issue I've
> > tried to revert it on top of linux-next. This is however hard due to
> > the dependencies. Instead I've only did:
> >
> > # git revert -m1 6fb5a32d95a9b1a9d4a6e135404468a8d74cfde6
> >
> > on top of the linux next-20220921 and resolved the conflict (trivial
> > to resolve). After that the kernel boots fine and I was not able to
> > reproduce the issue.
> >
> > The issue I've observed was:
> >
> > ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> > ata1.00: HPA detected: current 117229295, native 117231408
> > ata1.00: ATA-8: Corsair CSSD-F60GB2, 1.1, max UDMA/133
> > ata1.00: 117229295 sectors, multi 1: LBA48 NCQ (depth 32)
> > ata1.00: configured for UDMA/133
> > scsi 0:0:0:0: Direct-Access     ATA      Corsair CSSD-F60 1.1  PQ: 0
> > ANSI: 5
> > sd 0:0:0:0: [sda] 117229295 512-byte logical blocks: (60.0 GB/55.9 GiB)
> > sd 0:0:0:0: Attached scsi generic sg0 type 0
> > sd 0:0:0:0: [sda] Write Protect is off
> > sd 0:0:0:0: [sda] Preferred minimum I/O size 512 bytes
> > 8<--- cut here ---
> > Unable to handle kernel NULL pointer dereference at virtual address
> > 00000034
> > usb3503 usb-hub: switched to HUB mode
> > [00000034] *pgd=00000000
> > usb3503 usb-hub: usb3503_probe: probed in hub mode
> > UDC core: g_ether: couldn't find an available UDC
> >
> > Internal error: Oops: 5 [#1] PREEMPT SMP ARM
> > Modules linked in:
> > CPU: 0 PID: 7 Comm: kworker/0:0H Not tainted
> > 6.0.0-rc5-00016-g10d1d4b75525 #12835
> > Hardware name: Samsung Exynos (Flattened Device Tree)
> > Workqueue: kblockd blk_mq_run_work_fn
> > PC is at mempool_alloc+0x48/0x14c
> > LR is at mempool_alloc+0x2c/0x14c
> > pc : [<c0267a2c>]    lr : [<c0267a10>]    psr: 40000013
> > ...
> > Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
> > Control: 10c5387d  Table: 4000406a  DAC: 00000051
> > ...
> > Process kworker/0:0H (pid: 7, stack limit = 0x(ptrval))
> > Stack: (0xf0839cc8 to 0xf083a000)
> > ...
> >  mempool_alloc from __sg_alloc_table+0x120/0x160
> >  __sg_alloc_table from sg_alloc_table_chained+0x5c/0xc0
> >  sg_alloc_table_chained from scsi_alloc_sgtables+0x5c/0x294
> >  scsi_alloc_sgtables from sd_init_command+0x118/0x908
> >  sd_init_command from scsi_queue_rq+0x360/0xbe0
> >  scsi_queue_rq from blk_mq_dispatch_rq_list+0x1d8/0x8c8
> >  blk_mq_dispatch_rq_list from blk_mq_do_dispatch_sched+0x2e0/0x338
> >  blk_mq_do_dispatch_sched from
> > __blk_mq_sched_dispatch_requests+0xa8/0x150
> >  __blk_mq_sched_dispatch_requests from
> > blk_mq_sched_dispatch_requests+0x34/0x5c
> >  blk_mq_sched_dispatch_requests from __blk_mq_run_hw_queue+0x88/0x22c
> >  __blk_mq_run_hw_queue from process_one_work+0x288/0x778
> >  process_one_work from worker_thread+0x44/0x504
> >  worker_thread from kthread+0xf0/0x124
> >  kthread from ret_from_fork+0x14/0x2c
> > Exception stack(0xf0839fb0 to 0xf0839ff8)
> > 9fa0:                                     00000000 00000000 00000000
> > 00000000
> > 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> > 00000000
> > 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> > Code: e30ea764 e3c44d11 e34ca017 e3844a92 (e5953034)
> > ---[ end trace 0000000000000000 ]---
> >
> I've analyzed this issue a bit more and it looks that the $subject patch
> just revealed a bug in the lib/sg_pool.c. This looks a bit suspicious
> for me:
>
> module_init(sg_pool_init);
>
> After changing the above to:
>
> core_initcall(sg_pool_init);
>
> the issue is gone.
>
>
> Please confirm that this is a proper way of fixing this issue, so I will
> send a final patch.

I am not sure about the choice for *_initcall.
Please feel free to override me.

Anyway, I will make my Kbuild refactoring not depend on sg_pool fix-up.

-- 
Best Regards
Masahiro Yamada