Re: 64-bit userspace root file system for hppa64

Guenter Roeck <linux@xxxxxxxxxxxx> · Wed, 6 Dec 2023 19:20:54 -0800

On 12/6/23 13:43, Helge Deller wrote:
On 12/6/23 21:19, Guenter Roeck wrote:
On 12/6/23 09:00, Helge Deller wrote:
[ ... ]
Is it worth testing with multiple CPUs ? I can re-enable it and
check more closely if you think it makes sense. If so, what number
of CPUs would you recommend ?

I think 4 CPUs is realistic.
But I agree that you will probably see more issues.

Generally the assumption was that the different caches on parisc
may trigger SMP issues, but given that those issues can be seen on
qemu, it indicates that there are generic SMP issues too.

Ok, I ran some tests overnight with 2-8 CPUs. Turns out the system is quite
stable,

cool!

with the exception of SCSI controllers. Some fail completely, others
rarely. Here is a quick summary:

- am53c974 fails with "Spurious irq, sreg=00", followed by "Aborting command"
   and a hung task crash.
- megasas and megasas-gen2 fail with
   "scsi host1: scsi scan: INQUIRY result too short (5), using 36"
   followed by
   "megaraid_sas 0000:00:04.0: Unknown command completed!"
   and a hung task crash
- mptsas1068 fails completely (no kernel log message seen)
- dc390 and lsi* report random "Spurious irq, sreg=00" messages and timeouts
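
To tell these failure modes apart in a captured console log, something like the following might help. It is only a sketch: the function name scan_log is made up for illustration, and the match strings are just the message fragments quoted in the summary above, so treat them as assumptions rather than a complete list.

```shell
# Hypothetical helper: count known SCSI failure signatures in a console log.
# The signature strings are taken from the summary above; adjust as needed.
scan_log() {
    log=$1
    for sig in \
        "Spurious irq, sreg=" \
        "Aborting command" \
        "INQUIRY result too short" \
        "Unknown command completed!"
    do
        # grep -c prints the number of matching lines; it exits non-zero
        # when there are none, hence the "|| true".
        count=$(grep -c -F -- "$sig" "$log" 2>/dev/null || true)
        if [ "${count:-0}" -gt 0 ]; then
            printf '%s: %s\n' "$sig" "$count"
        fi
    done
}
```

Running scan_log on a saved log prints one line per signature that was seen, with the number of matching lines.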

I think none of those drivers have ever been tested
on physical hardware either.
So I'm astonished that it even worked that far :-)

Based on kernel sources, the "Spurious irq, sreg=%02x." error can only happen for the
am53c974 driver. Are you sure you see this message for dc390 and lsi* too?

Definitely for dc390:

qemu-system-hppa -M C3700 -kernel \
     vmlinux -no-reboot -snapshot -smp 4 -device rtl8139,netdev=net0 \
     -netdev user,id=net0 -device dc390,id=scsi -device \
     scsi-hd,bus=scsi.0,drive=d0 -drive \
     file=/var/cache/buildbot/parisc64/rootfs.ext2,format=raw,if=none,id=d0 \
     -append "root=/dev/sda rootwait console=ttyS0,115200 " \
     -nographic -monitor null
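
For reference, the sweep over CPU counts can be sketched roughly as below. This is not the actual test harness; the function name qemu_cmd and the ROOTFS variable are made up for illustration, and the function only prints the command line (dry run) instead of starting qemu. The options themselves are the ones from the dc390 command above.

```shell
# Hypothetical dry-run helper: print the dc390 command line for a CPU count.
qemu_cmd() {
    smp=$1
    # ROOTFS is an assumed override; the default is the path used above.
    rootfs=${ROOTFS:-/var/cache/buildbot/parisc64/rootfs.ext2}
    printf '%s ' qemu-system-hppa -M C3700 -kernel vmlinux -no-reboot \
        -snapshot -smp "$smp" -device rtl8139,netdev=net0 \
        -netdev user,id=net0 -device dc390,id=scsi \
        -device scsi-hd,bus=scsi.0,drive=d0 \
        -drive "file=$rootfs,format=raw,if=none,id=d0" \
        -append '"root=/dev/sda rootwait console=ttyS0,115200"' \
        -nographic -monitor null
    printf '\n'
}

# Print one command line per CPU count in the 2-8 range.
n=2
while [ "$n" -le 8 ]; do
    qemu_cmd "$n"
    n=$((n + 1))
done
```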

I'll have to re-check lsi*. My notes for lsi53c810 actually say:

    # Random crashes in sym_evaluate_dp(), called from sym_compute_residual()
    # (NULL pointer access). The problem is seen during shutdown. This is a
    # kernel bug, obviously, likely caused by timing differences. It is
    # possible if not likely that an interrupt is seen after the controller
    # was presumably disabled.

but that was for 32-bit. It turns out I don't have any notes for lsi53c895a.
I'll re-check both tonight.

For megaraid_sas I see a Seabios-hppa firmware patch is required.
Could you please give me the full command line you use to start qemu?
Esp. since the lsi scsi is still there, how do you assign a disc to the additional
megaraid_sas driver?

qemu-system-hppa -M C3700 -kernel \
     vmlinux -no-reboot -snapshot -device pcnet,netdev=net0 -netdev \
     user,id=net0 -device megasas,id=scsi -device \
     scsi-hd,bus=scsi.0,drive=d0 -drive \
     file=/var/cache/buildbot/parisc64/rootfs.ext2,format=raw,if=none,id=d0 \
     -append "root=/dev/sda rootwait console=ttyS0,115200 " \
     -nographic -monitor null

- Not sure if it is worth mentioning: There may be hung task crashes in
   usb_start_wait_urb/usb_kill_urb during shutdown when booting from usb
   or when using a usb network interface. That happens with all emulations,
   though, and is not parisc specific.

Did you report it upstream in the bug tracker?

No, because I have no idea if it is an emulation problem or a linux problem.
I never had the time to track it down. I just noticed that it seemed to be more
prevalent with 64-bit parisc especially if I boot from usb _and_ use a usb
network interface. In case you are interested to see what it looks like, here
are a couple of examples:

https://kerneltests.org/builders/qemu-riscv64-5.4/builds/46/steps/qemubuildcommand/logs/stdio
https://kerneltests.org/builders/qemu-parisc64-6.6/builds/1/steps/qemubuildcommand/logs/stdio

Ok, thanks!
But isn't that more or less expected, as the machine can't simply turn
off USB when the root disc is on USB? E.g. otherwise it wouldn't find the shutdown
executables? Maybe just the warning should be disabled after shutdown?

Not sure about that, for a number of reasons: It doesn't happen all the time,
and it is more likely to happen if the system is under load. It also seems
to be associated with OHCI (I am currently running more tests to confirm),
and sometimes the failure is with the network interface. That suggests that
some race condition may be involved.

Ok, at least it should be looked at...

I confirmed that this is _only_ seen if the host system is under load. I have not
been able to reproduce the problem on a system which is idle beyond the qemu process.
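
In case someone wants to try reproducing this, a crude way to put the host under load while a test runs might look like the sketch below. The function name with_host_load is made up for illustration, the busy loops are only a stand-in for real load, and it is an assumption that CPU load alone is enough to trigger the problem.

```shell
# Hypothetical helper: run a command while N busy loops load the host CPUs.
# Usage: with_host_load N command [args...]
with_host_load() {
    load_n=$1
    shift
    load_pids=""
    load_i=0
    while [ "$load_i" -lt "$load_n" ]; do
        # Each subshell spins until it is killed.
        ( while :; do :; done ) &
        load_pids="$load_pids $!"
        load_i=$((load_i + 1))
    done
    load_rc=0
    "$@" || load_rc=$?
    # Stop the busy loops and reap them; pass on the command's exit status.
    kill $load_pids 2>/dev/null || true
    wait $load_pids 2>/dev/null || true
    return "$load_rc"
}
```

For example, "with_host_load 8 ./run-test.sh" would run a (hypothetical) test script with eight busy loops in the background and return the script's exit status.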

Guenter