Re: 64-bit userspace root file system for hppa64

Guenter Roeck <linux@xxxxxxxxxxxx> · Fri, 8 Dec 2023 12:09:56 -0800

On 12/8/23 11:45, Mark Cave-Ayland wrote:
On 08/12/2023 19:26, Helge Deller wrote:

On 12/8/23 19:53, Mark Cave-Ayland wrote:
On 08/12/2023 14:58, Guenter Roeck wrote:

On 12/8/23 00:01, Mark Cave-Ayland wrote:
On 07/12/2023 21:47, Helge Deller wrote:

(looping in Mark Cave-Ayland, since he did some work on qemu esp driver)

Thanks for the ping!

On 12/7/23 22:08, Guenter Roeck wrote:
Hi Helge,

On 12/6/23 13:43, Helge Deller wrote:
On 12/6/23 21:19, Guenter Roeck wrote:
On 12/6/23 09:00, Helge Deller wrote:
[ ... ]
Is it worth testing with multiple CPUs ? I can re-enable it and
check more closely if you think it makes sense. If so, what number
of CPUs would you recommend ?

I think 4 CPUs is realistic.
But I agree that you will probably see more issues.

Generally the assumption was that the different caches on parisc
may trigger SMP issues, but given that those issues can be seen on
qemu, it indicates that there are generic SMP issues too.


Ok, I ran some tests overnight with 2-8 CPUs. Turns out the system is quite
stable,

cool!

with the exception of SCSI controllers. Some fail completely, others
rarely. Here is a quick summary:

- am53c974 fails with "Spurious irq, sreg=00", followed by "Aborting command"
   and a hung task crash.
- megasas and megasas-gen2 fail with
   "scsi host1: scsi scan: INQUIRY result too short (5), using 36"
   followed by
   "megaraid_sas 0000:00:04.0: Unknown command completed!"
   and a hung task crash
- mptsas1068 fails completely (no kernel log message seen)
- dc390 and lsi* report random "Spurious irq, sreg=00" messages and timeouts

I think none of those drivers have ever been tested
on physical hardware either.
So I'm astonished that it even worked that far :-)

I actually do have a dc390 board somewhere. I used it some time ago to improve
the emulation.

Do you have a physical hppa box too?

Based on kernel sources, the "Spurious irq, sreg=%02x." error can only happen for the
am53c974 driver. Are you sure you see this message for dc390 and lsi* too?

am53c974 and dc390 use the same driver. lsi* doesn't, and doesn't have a problem
either. Sorry, I confused that with some old notes.

In either case, I think I found the problem. After handling an interrupt, the Linux
driver checks if another interrupt is pending. It does that by checking the
DMA_DONE bit in the DMA status register. If that bit is set, it re-enters the
interrupt handler. The problem is that the emulation sets DMA_DONE
prematurely, before it sets the command done bit in the interrupt status register
and before it sets the interrupt pending bit in the status register. As a result,
DMA_DONE is set but IRQ_PENDING isn't, and the spurious interrupt is reported.
I fixed that up in my code and will test it for some time and with various
architectures before I send a patch.
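
To illustrate the ordering problem described above, here is a minimal stand-alone
model. It is only a sketch: the struct, the bit values and every model_* name are
made up for illustration and are not taken from the Linux driver or from QEMU.

```c
#include <stdint.h>

/* Hypothetical bit values, chosen for illustration only. */
#define MODEL_DMA_DONE    0x10u  /* DMA status register: transfer finished */
#define MODEL_IRQ_PENDING 0x80u  /* status register: interrupt pending */

struct model_dev {
    uint8_t dma_stat;   /* stands in for the DMA status register */
    uint8_t scsi_stat;  /* stands in for the status register */
};

enum model_poll { MODEL_IDLE, MODEL_HANDLED, MODEL_SPURIOUS };

/*
 * Driver side: after servicing one interrupt, look at DMA_DONE to decide
 * whether to re-enter the handler. If DMA_DONE is set but no interrupt is
 * pending, the handler has nothing to do and reports a spurious interrupt.
 */
static enum model_poll model_driver_recheck(struct model_dev *d)
{
    if (!(d->dma_stat & MODEL_DMA_DONE))
        return MODEL_IDLE;
    if (!(d->scsi_stat & MODEL_IRQ_PENDING))
        return MODEL_SPURIOUS;
    d->dma_stat &= (uint8_t)~MODEL_DMA_DONE;
    d->scsi_stat &= (uint8_t)~MODEL_IRQ_PENDING;
    return MODEL_HANDLED;
}

/* Emulation side, first step: the DMA byte counter reaches zero. */
static void model_dma_counter_zero(struct model_dev *d)
{
    d->dma_stat |= MODEL_DMA_DONE;
}

/* Emulation side, later step: command completion marks the interrupt pending. */
static void model_command_done(struct model_dev *d)
{
    d->scsi_stat |= MODEL_IRQ_PENDING;
}
```

If the driver's recheck runs between model_dma_counter_zero() and
model_command_done(), it returns MODEL_SPURIOUS; once both steps have happened
it returns MODEL_HANDLED.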

I'm actually in the process of putting the finishing touches to a large rewrite of QEMU's core ESP emulation since there are a number of known issues with the existing version. In particular there are problems with the SCSI phase being set incorrectly after reading ESP_INTR and ESP_RSTAT's STAT_TC not being correct. Note that this is just the ESP core rather than the ESP PCI device.

If you are interested, I could try and find a few minutes to tidy it up a bit more and push a testing branch to GitHub?


Sure, I'll be happy to give your changes a try.

FWIW, the change I made to fix the spurious interrupt problem is

diff --git a/hw/scsi/esp-pci.c b/hw/scsi/esp-pci.c
index 6794acaebc..f624398c55 100644
--- a/hw/scsi/esp-pci.c
+++ b/hw/scsi/esp-pci.c
@@ -286,9 +286,6 @@ static void esp_pci_dma_memory_rw(PCIESPState *pci, uint8_t *buf, int len,
      /* update status registers */
      pci->dma_regs[DMA_WBC] -= len;
      pci->dma_regs[DMA_WAC] += len;
-    if (pci->dma_regs[DMA_WBC] == 0) {
-        pci->dma_regs[DMA_STAT] |= DMA_STAT_DONE;
-    }
  }

I tested that with several platforms. There are no more spurious interrupts
after that change, and no other errors either.

I suspect that this is papering over the real issue, since it appears the code being removed sets the DMA completion bit when the PCI DMA transfer counter reaches zero.

Regarding TC after reading the interrupt register, I carry the following
patch locally.

diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
index 9b11d8c573..f0cd8705a7 100644
--- a/hw/scsi/esp.c
+++ b/hw/scsi/esp.c
@@ -986,7 +986,7 @@ uint64_t esp_reg_read(ESPState *s, uint32_t saddr)
           */
          val = s->rregs[ESP_RINTR];
          s->rregs[ESP_RINTR] = 0;
-        s->rregs[ESP_RSTAT] &= ~STAT_TC;
+        // s->rregs[ESP_RSTAT] &= ~STAT_TC;

The comment above that code says "Clear sequence step, interrupt register
and all status bits except TC", which is quite the opposite of what the code
is doing because it clears TC and nothing else. I never spent the time
trying to figure out how to fix that properly; clearing the other bits
like the comment suggests doesn't work (STAT_INT needs to be set for
esp_lower_irq() to work, and clearing the other bits results in transfer
failures).
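
The mismatch between the comment and the code can be shown with a small
stand-alone sketch. The MODEL_* bit values and both function names are
assumptions made up for illustration; this is not the QEMU code, and neither
variant is claimed to be the correct behaviour.

```c
#include <stdint.h>

/* Illustrative bit values only; assumptions, not copied from QEMU headers. */
#define MODEL_STAT_TC  0x10u  /* transfer count reached zero */
#define MODEL_STAT_INT 0x80u  /* interrupt pending */

/* What the code does on an interrupt register read: clear TC, keep the rest. */
static uint8_t model_rstat_as_coded(uint8_t rstat)
{
    return (uint8_t)(rstat & ~MODEL_STAT_TC);
}

/* What the comment describes: clear all status bits except TC. */
static uint8_t model_rstat_as_commented(uint8_t rstat)
{
    return (uint8_t)(rstat & MODEL_STAT_TC);
}
```

With a starting value of MODEL_STAT_INT | MODEL_STAT_TC plus some phase bits,
the first variant keeps the interrupt and phase bits and drops TC, while the
second keeps only TC. Under these assumed bit values, the second variant also
clears the interrupt bit, which matches the observation above that STAT_INT has
to stay set for esp_lower_irq() to work.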

Yeah that's one of the many bugs which should be fixed by my latest
series. I've pushed the current version of my branch with the ESP
rewrite to https://github.com/mcayland/qemu/tree/esp-rework-testing
if you would both like to give it a test.

Tried it with qemu-hppa:

[    1.062381] sym53c8xx 0000:00:00.0: enabling SERR and PARITY (0107 -> 0147)
[    1.066381] sym0: <895a> rev 0x0 at pci 0000:00:00.0 irq 66
[    1.073919] sym0: No NVRAM, ID 7, Fast-40, LVD, parity checking
[    1.080618] sym0: SCSI BUS has been reset.
[    1.085325] scsi host0: sym-2.2.3
[    4.257547] am53c974 0000:00:04.0: enabling SERR and PARITY (0107 -> 0147)
[    4.917824] am53c974 0000:00:04.0: esp0: regs[(ptrval):(ptrval)] irq[70]
[    4.918704] am53c974 0000:00:04.0: esp0: is a AM53C974, 40 MHz (ccf=0), SCSI ID 15
[    8.010626] scsi host1: esp
[    8.026345] scsi 1:0:0:0: Direct-Access     QEMU     QEMU HARDDISK    2.5+ PQ: 0 ANSI: 5
[    8.032066] scsi target1:0:0: Beginning Domain Validation
[    8.043254] scsi target1:0:0: Domain Validation skipping write tests
[    8.044284] scsi target1:0:0: Ending Domain Validation
[    8.094991] megasas: 07.727.03.00-rc1
[    8.097635] mpt3sas version 43.100.00.00 loaded
[    8.109417] st: Version 20160209, fixed bufsize 32768, s/g segs 256
[    8.123681] sd 1:0:0:0: Power-on or device reset occurred
[    8.134707] sd 1:0:0:0: [sda] 209715200 512-byte logical blocks: (107 GB/100 GiB)
[    8.140043] sd 1:0:0:0: [sda] Write Protect is off
[    8.144759] sd 1:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    8.205316]  sda: sda1 sda2 sda3 < sda5 sda6 >
[    8.222763] sd 1:0:0:0: [sda] Attached SCSI disk
[    8.231170] sd 1:0:0:0: Attached scsi generic sg0 type 0
[    8.237107] LASI 82596 driver - Revision: 1.30
[    8.238440] Fusion MPT base driver 3.04.20
[    8.239024] Copyright (c) 1999-2008 LSI Corporation
[    8.240965] Fusion MPT SPI Host driver 3.04.20
[    8.243040] Fusion MPT SAS Host driver 3.04.20
[    8.245172] Fusion MPT misc device (ioctl) driver 3.04.20
[    8.247849] mptctl: Registered with Fusion MPT base driver
[    8.248791] mptctl: /dev/mptctl @ (major,minor=10,220)
[    8.258484] HP SDC: No SDC found.
[    8.271496] rtc-generic rtc-generic: registered as rtc0
[    8.274606] rtc-generic rtc-generic: setting system clock to 2023-12-08T19:25:10 UTC (1702063510)
[    8.278926] device-mapper: uevent: version 1.0.3
[    8.284893] device-mapper: ioctl: 4.48.0-ioctl (2023-03-01) initialised: dm-devel@xxxxxxxxxx
[    8.288890] hid: raw HID events driver (C) Jiri Kosina
[    8.302272] usbcore: registered new interface driver usbhid
[    8.303494] usbhid: USB HID core driver
[    8.308155] NET: Registered PF_INET6 protocol family
[    8.337076] Segment Routing with IPv6
[    8.338476] In-situ OAM (IOAM) with IPv6
[    8.340887] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
[    8.351957] NET: Registered PF_PACKET protocol family
[    8.596153] EXT4-fs (sda5): mounted filesystem f035d934-31b6-430e-b23d-a818f9baaf2e ro with ordered data mode. Quota mode: none.
[    8.599184] VFS: Mounted root (ext4 filesystem) readonly on device 8:5.
[    8.609270] devtmpfs: mounted
[    8.679666] Freeing unused kernel image (initmem) memory: 3072K
[    8.680679] Write protected read-only-after-init data: 2k
[    8.681338] Run /sbin/init as init process
[    8.731576] EXT4-fs error (device sda5): ext4_lookup:1855: inode #787975: comm swapper/0: iget: checksum invalid
[    8.736664] scsi host1: Spurious irq, sreg=10.
[    8.760106] Starting init: /sbin/init exists but couldn't execute it (error -67)
[    8.760773] Run /etc/init as init process
[    8.768268] Run /bin/init as init process
[    8.775050] Run /bin/sh as init process
[    8.777917] EXT4-fs error (device sda5): ext4_lookup:1855: inode #787980: comm swapper/0: iget: checksum invalid
[    8.779882] scsi host1: Spurious irq, sreg=10.
[    8.780532] scsi host1: Spurious irq, sreg=13.
[    8.781094] Starting init: /bin/sh exists but couldn't execute it (error -67)
[    8.781934] Kernel panic - not syncing: No working init found.  Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance.
[    8.782740] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.7.0-rc4-32bit #2434
[    8.782740] Hardware name: 9000/785/C3700
[    8.782740] Backtrace:
[    8.782740]  [<104080f0>] show_stack+0x54/0x6c
[    8.782740]  [<10c09498>] dump_stack_lvl+0x58/0x7c
[    8.782740]  [<10c094d8>] dump_stack+0x1c/0x2c
[    8.782740]  [<10bf5698>] panic+0x130/0x2d4
[    8.782740]  [<10c0a024>] kernel_init+0x14c/0x150
[    8.782740]  [<1040201c>] ret_from_kernel_thread+0x1c/0x24

Ah that's a shame, I was really hoping that would solve the issue. Unless there is something amiss with the esp-pci device? I haven't really spent any time looking at the PCI DMA implementation.


The "technical manual" for AM53C974 from AMD states that an interrupt is supposed
to be generated when the DMA DONE bit is set. The esp-pci code does not do that.

Guenter