On 2019-05-31 8:23 a.m., John David Anglin wrote:
> On 2019-05-30 4:59 p.m., Sven Schnelle wrote:
>> Hi,
>>
>> On Thu, May 30, 2019 at 09:55:43PM +0200, Sven Schnelle wrote:
>>> Hi,
>>>
>>> On Wed, May 29, 2019 at 04:15:03PM +0200, Helge Deller wrote:
>>>>>> Exactly. And as:
>>>>>>
>>>>>> a) All C3600 PDC versions clear the NP bit
>>>>>> b) All C37XX/J5000 PDC versions set the NP bit
>>>>>>
>>>>>> I don't think there's some bug in the PDC. I would guess that the patch Carlo
>>>>>> reported to fix issues is just hiding the real problem. It would be interesting
>>>>>> to run Carlo's test on a C37XX.
>>>>> Probably, hardware cache coherent I/O is not implemented correctly for Elroy based systems.
>>>>> https://www.hpl.hp.com/hpjournal/96feb/feb96a6.pdf
>>>>> Does it work on C360?
>>>> I slowly start to get confused...
>>>> Just thinking about another possibility: Maybe we can rely on the value of the
>>>> NP iopdir_fdc bit only on machines with >= PA8700 CPUs?
>>>> For older machines (which would need iopdir_fdc), HP-UX or other operating
>>>> systems decide based on the CPU found.
>>>> This would explain why it's not set on Carlo's C3600, and if Sven's C240
>>>> (with a PA8200 CPU) doesn't have the bit set either, that would support this theory.
>>> I just re-tested my kexec branch, and the HPMC I was seeing when kexec'ing a new
>>> kernel on my J5000 is now gone with Helge's patch. J5000 also has PCX-W. It was
>>> only triggered when I had SMP enabled, but this is not really surprising given
>>> the fact that a cache flush was missing.
>> Looks like I'm also confused now. My J5000 crashed with the kexec stuff again.
>> It happens much less often than before, only 1 out of 10 times.
>>
>> The patch does:
>>
>>     if ((cond & ALT_COND_NO_IOC_FDC) &&
>>         ((boot_cpu_data.cpu_type < pcxw) ||
>>          (boot_cpu_data.cpu_type == pcxw_) ||
>>          (boot_cpu_data.pdc.capabilities & PDC_MODEL_IOPDIR_FDC)))
>>             continue;
>>
>> So there should be no change for PCX-W, and my statement that this fixes anything
>> on my J5000 is wrong. I think I'll disable the patching and see whether the problem
>> disappears.
> Is it possible that we are running in a mode where the cache/TLB does not issue coherent
> operations? There is a PDC_CACHE call to set the coherence state.

I checked the machines that I have, and they all have coherent caches and TLBs.

I think flush and sync are required on all machines with write-back caches. This makes
writes visible to the I/O adapter (memory). The C3600 has a write-back data cache. See
"PDC Procedures", page 4-21.

This might be affected by the TLB U bit. Possibly, the U bit is not set for pages in the
I/O address region (IO-PDIR), and we need flush/sync as a result.

-- 
John David Anglin  dave.anglin@xxxxxxxx
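The write-back argument above can be illustrated with a toy model. The sketch below is hypothetical C. None of these names exist in the kernel or in PDC, and it does not model the real fdc/sync instructions, the TLB U bit, or the IO-PDIR format. It assumes a CPU with a write-back data cache and an I/O adapter that reads main memory without snooping the CPU cache. A CPU store stays invisible to the adapter until the dirty line is written back. The model runs strictly in order, so the `sync` that real hardware also needs to complete the flush before the device access is not represented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy model only (hypothetical names, not PA-RISC or kernel code):
 * a write-back data cache plus a non-snooping I/O adapter. */
#define NLINES 4

struct toy_system {
    uint64_t memory[NLINES]; /* main memory: what the I/O adapter reads */
    uint64_t cache[NLINES];  /* CPU data cache contents                 */
    bool     dirty[NLINES];  /* modified in cache, not yet in memory    */
};

/* Start with memory, cache and dirty bits all cleared. */
static void toy_init(struct toy_system *s)
{
    memset(s, 0, sizeof *s);
}

/* CPU store: a write-back cache keeps the new value in the cache
 * and marks the line dirty; memory is NOT updated here. */
static void cpu_store(struct toy_system *s, int line, uint64_t val)
{
    assert(line >= 0 && line < NLINES);
    s->cache[line] = val;
    s->dirty[line] = true;
}

/* Analogue of flushing one data-cache line: copy a dirty line back
 * to memory and mark it clean. Clean lines are left untouched. */
static void cpu_flush_line(struct toy_system *s, int line)
{
    assert(line >= 0 && line < NLINES);
    if (s->dirty[line]) {
        s->memory[line] = s->cache[line];
        s->dirty[line] = false;
    }
}

/* Non-coherent I/O adapter read: goes straight to memory and never
 * looks at the CPU cache. */
static uint64_t io_read(const struct toy_system *s, int line)
{
    assert(line >= 0 && line < NLINES);
    return s->memory[line];
}
```

For example, after `toy_init(&s)` and `cpu_store(&s, 1, 0xabcd)`, `io_read(&s, 1)` still returns the stale value 0. Only after `cpu_flush_line(&s, 1)` does it return 0xabcd. If the IO-PDIR pages are not kept coherent, as suggested above, the stored values play the role of IO-PDIR entries and `io_read` plays the role of the I/O adapter fetching them.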