On 2/12/23 12:39, Conor Dooley wrote:
On Sun, Feb 12, 2023 at 12:27:10PM -0800, Guenter Roeck wrote:
On 2/12/23 10:45, Conor Dooley wrote:
...
However, I still see that the patch series
results in boot hangs with the sifive_u qemu emulation, where
the log ends with "Oops - illegal instruction". Is that problem
being addressed as well?
Hmm, if it died on the last commit in this series, then I am not sure.
If you meant with riscv/for-next or linux-next that's fixed by a patch
from Samuel:
https://patchwork.kernel.org/project/linux-riscv/patch/20230212021534.59121-3-samuel@xxxxxxxxxxxx/
It failed after the merge, so it looks like it may have been merge damage.
Anyway, I applied
RISC-V: Don't check text_mutex during stop_machine
That being:
https://lore.kernel.org/all/20220322022331.32136-1-palmer@xxxxxxxxxxxx/
Which handles the lockdep assertion during stop_machine...
riscv: Fix early alternative patching
riscv: Fix Zbb alternative IDs
and the sifive_u emulation no longer crashes. However, I still get
[ 0.000000] ------------[ cut here ]------------
[ 0.000000] WARNING: CPU: 0 PID: 0 at arch/riscv/kernel/patch.c:71 patch_insn_write+0x222/0x2f6
...but doesn't prevent the early "spam" of assertion failures from the
code patching for alternatives. I sent a patch to take the lock during
the alternative patching which should get rid of them for you. It did
for me at least!
https://lore.kernel.org/all/20230212194735.491785-1-conor@xxxxxxxxxx
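For readers following along, the difference between the two approaches can be illustrated with a small userspace analogy. This is only a sketch in Python, not kernel code: text_mutex here is an ordinary threading.Lock standing in for the kernel mutex, and patch_insn_write, apply_alternatives_unlocked and apply_alternatives_locked are hypothetical helpers that merely mimic the "assert the lock is held, then write" pattern behind the warning at patch.c:71.

```python
import threading

# Stand-in for the kernel's text_mutex (assumption: a plain userspace lock).
text_mutex = threading.Lock()
# Collects one message per simulated assertion failure.
warnings_seen = []


def patch_insn_write(image, offset, insn):
    """Write insn into image, recording a warning if nobody holds the lock.

    Lock.locked() only reports that *someone* holds the lock, which is a
    weaker check than lockdep's per-task ownership tracking.
    """
    if not text_mutex.locked():
        warnings_seen.append(f"WARNING: text_mutex not held at offset {offset:#x}")
    image[offset:offset + len(insn)] = insn


def apply_alternatives_unlocked(image, patches):
    """Patch without taking the lock: every write trips the assertion."""
    for offset, insn in patches:
        patch_insn_write(image, offset, insn)


def apply_alternatives_locked(image, patches):
    """Take the lock around the whole patching pass, then write."""
    with text_mutex:
        for offset, insn in patches:
            patch_insn_write(image, offset, insn)
```

In this model, deleting the check would also silence the warnings, but holding the lock around the patching pass keeps the assertion useful for other callers.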
repeated several times.
I then also tested
riscv: patch: Fixup lockdep warning in stop_machine
This one just deletes the lockdep check, so I would expect it to remove
the complaints.
riscv: Fix early alternative patching
riscv: Fix Zbb alternative IDs
which works fine (no warning backtrace) for sifive_u, but gives me
WARNING: CPU: 0 PID: 0 at kernel/trace/trace_events.c:433 trace_event_raw_init+0xde/0x642
Hmm, do you have the full splat for this one handy?
[ 0.000000] ------------[ cut here ]------------
[ 0.000000] WARNING: CPU: 0 PID: 0 at kernel/trace/trace_events.c:433 trace_event_raw_init+0xde/0x642
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 6.2.0-rc7-next-20230210 #1
[ 0.000000] Hardware name: riscv-virtio,qemu (DT)
[ 0.000000] epc : trace_event_raw_init+0xde/0x642
[ 0.000000] ra : trace_event_raw_init+0x45a/0x642
[ 0.000000] epc : ffffffff8010571a ra : ffffffff80105a96 sp : ffffffff81803e60
[ 0.000000] gp : ffffffff81a1ab78 tp : ffffffff81814f80 t0 : 0000000000000000
[ 0.000000] t1 : 5245432d3e000000 t2 : 0000000000000000 s0 : ffffffff81803f20
[ 0.000000] s1 : 000000000000045f a0 : 0000000000000000 a1 : ffffffff81331ef0
[ 0.000000] a2 : 000000000000025c a3 : 0000000000000001 a4 : ffffffff801056fa
[ 0.000000] a5 : 000000000000002c a6 : ffffffff8192e4d8 a7 : ffffffff81157a90
[ 0.000000] s2 : 0000000000000000 s3 : ffffffff81922870 s4 : ffffffff8192e4d8
[ 0.000000] s5 : ffffffff81011c30 s6 : 000000000000000a s7 : 0000000000000021
[ 0.000000] s8 : 000000000000005c s9 : ffffffff81331ee8 s10: 0000000000000001
[ 0.000000] s11: 0000000000000000 t3 : 0000000000000007 t4 : 0000000000000070
[ 0.000000] t5 : 0000000000000025 t6 : 0000000000000009
[ 0.000000] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
[ 0.000000] [<ffffffff8010571a>] trace_event_raw_init+0xde/0x642
[ 0.000000] [<ffffffff80104d32>] event_init+0x28/0x84
[ 0.000000] [<ffffffff80c0f7ca>] trace_event_init+0x9e/0x2ae
[ 0.000000] [<ffffffff80c0f3a0>] trace_init+0x10/0x18
[ 0.000000] [<ffffffff80c00bc6>] start_kernel+0x50e/0x8f8
[ 0.000000] irq event stamp: 0
[ 0.000000] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[ 0.000000] hardirqs last disabled at (0): [<0000000000000000>] 0x0
[ 0.000000] softirqs last enabled at (0): [<0000000000000000>] 0x0
[ 0.000000] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ 0.000000] ---[ end trace 0000000000000000 ]---
[ 0.000000] event btrfs_clear_extent_bit has unsafe dereference of argument 1
[ 0.000000] print_fmt: "%pU: io_tree=%s ino=%llu root=%llu start=%llu len=%llu clear_bits=%s", REC->fsid, __print_symbolic(REC->owner, {IO_TREE_FS_PINNED_EXTENTS, "PINNED_EXTENTS"}, {IO_TREE_FS_EXCLUDED_EXTENTS, "EXCLUDED_EXTENTS"}, {IO_TREE_BTREE_INODE_IO, "BTREE_INODE_IO"}, {IO_TREE_INODE_IO, "INODE_IO"}, {IO_TREE_RELOC_BLOCKS, "RELOC_BLOCKS"}, {IO_TREE_TRANS_DIRTY_PAGES, "TRANS_DIRTY_PAGES"}, {IO_TREE_ROOT_DIRTY_LOG_PAGES, "ROOT_DIRTY_LOG_PAGES"}, {IO_TREE_INODE_FILE_EXTENT, "INODE_FILE_EXTENT"}, {IO_TREE_LOG_CSUM_RANGE, "LOG_CSUM_RANGE"}, {IO_TREE_SELFTEST, "SELFTEST"}), REC->ino, REC->rootid, REC->start, REC->len, __print_flags(REC->clear_bits, "|", { EXTENT_DIRTY, "DIRTY"}, { EXTENT_UPTODATE, "UPTODATE"}, { EXTENT_LOCKED, "LOCKED"}, { EXTENT_NEW, "NEW"}, { EXTENT_DELALLOC, "DELALLOC"}, { EXTENT_DEFRAG, "DEFRAG"}, { EXTENT_BOUNDARY, "BOUNDARY"}, { EXTENT_NODATASUM, "NODATASUM"}, { EXTENT_CLEAR_META_RESV, "CLEAR_META_RESV"}, { EXTENT_NEED_WAIT, "NEED_WAIT"}, { EXTENT_NORESERVE, "NORESERVE"}, { EXTENT_QGROUP_RESERV
[ 0.000000] event btrfs_ordered_sched has unsafe dereference of argument 1
[ 0.000000] print_fmt: "%pU: work=%p (normal_work=%p) wq=%p func=%ps ordered_func=%p ordered_free=%p", REC->fsid, REC->work, REC->normal_work, REC->wq, REC->func, REC->ordered_func, REC->ordered_free
[ 0.000000] event btrfs_work_sched has unsafe dereference of argument 1
[ 0.000000] print_fmt: "%pU: work=%p (normal_work=%p) wq=%p func=%ps ordered_func=%p ordered_free=%p", REC->fsid, REC->work, REC->normal_work, REC->wq, REC->func, REC->ordered_func, REC->ordered_free
[ 0.000000] event btrfs_work_queued has unsafe dereference of argument 1
[ 0.000000] print_fmt: "%pU: work=%p (normal_work=%p) wq=%p func=%ps ordered_func=%p ordered_free=%p", REC->fsid, REC->work, REC->normal_work, REC->wq, REC->func, REC->ordered_func, REC->ordered_free
[ 0.000000] event find_free_extent_search_loop has unsafe dereference of argument 1
and so on.
It bisects to "RISC-V: add zbb support to string functions", which also seems
to cause various boot failures. Unfortunately that patch is difficult to revert,
but marking TOOLCHAIN_HAS_ZBB as broken "fixes" it. I don't know if there is
a problem with the patch or with qemu. I'll disable RISCV_ISA_ZBB in my tests
for the time being to work around it.
Guenter
and a whole lot of
event btrfs_clear_extent_bit has unsafe dereference of argument 1
and similar messages when running the "virt" emulation. That was there before,
but drowned in the noise. Ok, guess I'll need another round of bisect.
Thanks for all of your testing :)
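Since messages like the "unsafe dereference" ones were easy to miss in a noisy boot log, a small script can tally them before starting a bisect. The following is a rough sketch, not part of any existing tooling: summarize_boot_log and the two regular expressions are hypothetical, and they assume the log lines look like the ones quoted in this thread.

```python
import re
from collections import Counter

# Patterns modelled on the log lines quoted in this thread (assumption:
# other kernels or configs may format these messages differently).
UNSAFE_RE = re.compile(r"event (\S+) has unsafe dereference of argument (\d+)")
WARNING_RE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+) (\S+)")


def summarize_boot_log(text):
    """Return (unsafe, warnings) counters for a captured boot log.

    unsafe is keyed by (event name, argument index); warnings is keyed
    by (source location, symbol+offset).
    """
    unsafe = Counter()
    warnings = Counter()
    for line in text.splitlines():
        match = UNSAFE_RE.search(line)
        if match:
            unsafe[(match.group(1), int(match.group(2)))] += 1
            continue
        match = WARNING_RE.search(line)
        if match:
            warnings[(match.group(1), match.group(2))] += 1
    return unsafe, warnings
```

Running it over the console output of each bisect step and comparing the counters gives a quick good/bad signal that does not depend on spotting the messages by eye.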