Re: 64-bit userspace root file system for hppa64

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 12/7/23 13:47, Helge Deller wrote:
(looping in Mark Cave-Ayland, since he did some work on qemu esp driver)

On 12/7/23 22:08, Guenter Roeck wrote:
Hi Helge,

On 12/6/23 13:43, Helge Deller wrote:
On 12/6/23 21:19, Guenter Roeck wrote:
On 12/6/23 09:00, Helge Deller wrote:
[ ... ]
Is it worth testing with multiple CPUs ? I can re-enable it and
check more closely if you think it makes sense. If so, what number
of CPUs would you recommend ?

I think 4 CPUs is realistic.
But I agree, that you probably see more issues.

Generally the assumption was, that the different caches on parisc
may trigger SMP issues, but given that those issues can be seen on
qemu, it indicates that there are generic SMP issues too.


Ok, I ran some tests overnight with 2-8 CPUs. Turns out the system is quite
stable,

cool!

with the exception of SCSI controllers. Some fail completely, others
rarely. Here is a quick summary:

- am53c974 fails with "Spurious irq, sreg=00", followed by "Aborting command"
   and a hung task crash.
- megasas and megasas-gen2 fail with
   "scsi host1: scsi scan: INQUIRY result too short (5), using 36"
   followed by
   "megaraid_sas 0000:00:04.0: Unknown command completed!"
   and a hung task crash
- mptsas1068 fails completely (no kernel log message seen)
- dc390 and lsi* report random "Spurious irq, sreg=00" messages and timeouts

I think none of those drivers have ever been tested
on physical hardware either.
So I'm astonished that it even worked that far :-)

I actually do have a dc390 board somewhere. I used it some time ago to improve
the emulation.

Do you have a physical hppa box too?


No, I used that on an old PC with "real" PCI slots.

Based on kernel sources, the "Spurious irq, sreg=%02x." error can only happen for the
am53c974 driver. Are you sure you see this message for dc390 and lsi* too?

am53c974 and dc390 use the same driver. lsi* doesn't, and doesn't have a problem
either. Sorry, I confused that with some old notes.

Either case, I think I found the problem. After handling an interrupt, the Linux
driver checks if another interrupt is pending. It does that by checking the
DMA_DONE bit in the DMA status register. If that bit is set, it re-enters the
interrupt handler. Problem with that is that the emulation sets DMA_DONE
prematurely, before it sets the command done bit in the interrupt status register
and before it sets the interrupt pending bit in the status register. As result,
DMA_DONE is set but IRQ_PENDING isn't, and the spurious interrupt is reported.
I fixed that up in my code and will test it for some time and with various
architectures before I send a patch.

Thanks for testing.
But I wonder if the Linux kernel driver needs (on physical hardware!) some more
cache flushing too. I see it uses dma_alloc_coherent(), but I don't see
dma_sync_single_for_device() or dma_sync_sg_for_cpu().
Those are needed for dma on hppa...


Ah, testing that is beyond my capabilities. All I know is that it worked fine on
an old PC running Linux.

Guenter





[Index of Archives]     [Linux SoC]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux