On 7/19/19 1:09 PM, Jon Masters wrote:
> On 7/4/19 6:12 PM, Jon Masters wrote:
>
>> I think we have identified the root cause of the 32-bit builder issue.
>> Many thanks to Paul and Peter for assistance in debugging. Here's my
>> write-up, and we'll work with the vendor on a suitable mitigation to
>> workaround any errata:
>>
>> https://medium.com/@jonmasters_84473/debugging-a-32-bit-fedora-arm-builder-issue-73295d7d673d
>
> The hardware vendor have reproduced what I believe to be an errata.
> Meanwhile, I've made a test kernel that forces CONFIG_HIGHPTE to off:
>
> https://koji.fedoraproject.org/koji/taskinfo?taskID=36328838
>
> With this kernel, you still get LPAE but leaf level PTEs are not
> allocated from high memory any longer. This is because I believe the
> errata to be caused by stage 1 page table walks in the guest trapping to
> stage 2 (hypervisor) for e.g. Access bit updates on the host. When those
> occur, I believe there is a truncation of the guest IPA (guest memory)
> address to 32-bits, but only for page table entry walks. Normal
> translation faults I think are unaffected by this (TBC).
>
> Normally, we don't allocate PGDs (high level page table pieces) from
> high memory (we allocate those from kernel memory caches) but we DO
> allocate PTEs specifically from what might be high memory. Except when
> we force CONFIG_HIGHPTE to off. The patch I'm using is attached.
>
> It's currently being tested. If it works, I'm curious for input on
> temporarily carrying this in Fedora. In theory it means an LPAE system
> could starve for PTEs if it has many many processes running, but in
> practice I'm willing to bet LPAE is mostly used by Fedora for the 32-bit
> builders and that few people would actually complain if we did this.

This stayed up for 3+ days. Eventually, there were a couple of faults
that I thought were a problem but it turns out that they weren't and
just generated noise on the host kernel log.
So it looks good to go with the hack that I proposed and that's going
to be in Fedora's 5.2 kernel.

Detail:

The host saw a couple of exits due to speculative page walks in the
guest. It hit my previous logic due to S1 PTW but this time the HPFAR
was correct vs what we would expect due to the 32-bit range limit.

[359524.820107] JCM: WARNING: Mismatched FIPA and PA translation detected!
[359524.899630] JCM: Hyper faulting far: 0x40163000
[359524.955044] JCM: Guest faulting far: 0xb6dbbf48 (gfn: 0x4016)
[359525.025047] JCM: Guest TTBCR: 0xb5023500, TTBR0: 0x4c99ca80
[359525.092963] JCM: Guest PGD address: 0x4c99ca90
[359525.147312] JCM: Guest PGD: 0x58bf7003
[359525.193319] JCM: Guest PMD address: 0x58bf7db0
[359525.247671] JCM: Guest PMD: 0x40163003
[359525.293678] JCM: Guest PTE address: 0x40163dd8
[359525.348030] JCM: Guest PTE: 0x420000367508fdf
[359525.401338] JCM: Manually translated as: 0xb6dbbf48->0x367508000
[359525.474465] JCM: Faulting IPA page: 0x40163000
[359525.528814] JCM: Faulting PTE page: 0x40163000
[359525.583166] JCM: *** debugging data ***
[359525.630215] JCM: FAR_EL2: 0xb6dbbf48
[359525.674133] JCM: HPFAR_EL2: 0x401630
[359525.718052] JCM: ESR_EL2: 0x8200008b
[359525.761972] JCM: FAR_EL1: 0x4f2e50005b89b4
[359525.812149] JCM: ESR_EL1: 0x20b
[359525.850852] JCM: *** debugging data ***
[359525.897899] JCM: Fault occurred while performing S1 PTW -fixing
[359525.969985] JCM: corrected fault_ipa: 0x40163000
[359526.026423] JCM: Corrected gfn: 0x4016
[359526.072427] JCM: handle access fault
[359526.116347] JCM: ret: 0x1

You can see the FAR reported pfn 4016 and that's what we expected, so
the above was just noise in my test kernel on the host monitoring a bit
too carefully and not needing to actually fix anything this time.

Jon.
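For anyone following along, the addresses in the log above can be re-derived by hand. The following is only a sketch, not the debug patch itself: it assumes the guest uses the ARMv7 LPAE long-descriptor format with T0SZ = 0 (a three-level walk with 4 KiB pages), and the function names, along with the `looks_truncated` check, are hypothetical illustrations of the suspected 32-bit truncation rather than anything taken from the test kernel.

```python
# Sketch only: re-derive the log's addresses from its raw register values.
# Assumes ARMv7 LPAE long descriptors, T0SZ = 0, 4 KiB pages.

DESC_SIZE = 8                  # bytes per long-format descriptor
OA_MASK = 0xFF_FFFF_F000       # descriptor bits [39:12]: next table / output page
TTBR_MASK = 0xFF_FFFF_FFE0     # TTBR0 base address bits [39:5] when T0SZ = 0


def lpae_walk_addresses(ttbr0, va, pgd, pmd):
    """Return the (PGD, PMD, PTE) descriptor addresses used to translate va.

    pgd and pmd are the descriptor values read at the first two levels;
    here they come from the log, since there is no guest memory to read.
    """
    l1 = (va >> 30) & 0x3      # VA[31:30] indexes the 4-entry first level
    l2 = (va >> 21) & 0x1FF    # VA[29:21]
    l3 = (va >> 12) & 0x1FF    # VA[20:12]
    pgd_addr = (ttbr0 & TTBR_MASK) + l1 * DESC_SIZE
    pmd_addr = (pgd & OA_MASK) + l2 * DESC_SIZE
    pte_addr = (pmd & OA_MASK) + l3 * DESC_SIZE
    return pgd_addr, pmd_addr, pte_addr


def hpfar_to_ipa(hpfar):
    """HPFAR_EL2 bits [39:4] hold FIPA[47:12]; rebuild the faulting IPA page."""
    return (hpfar & ~0xF) << 8


def esr_fields(esr):
    """Pull out the ESR_EL2 fields that matter for this fault."""
    return {
        "ec": (esr >> 26) & 0x3F,       # exception class
        "s1ptw": bool((esr >> 7) & 1),  # fault taken on a stage 1 table walk
        "fsc": esr & 0x3F,              # fault status code
    }


def looks_truncated(reported_ipa, walked_page):
    """Hypothetical check: reported IPA equals the walked page cut to 32 bits."""
    return walked_page != reported_ipa and (walked_page & 0xFFFF_FFFF) == reported_ipa


if __name__ == "__main__":
    pgd_addr, pmd_addr, pte_addr = lpae_walk_addresses(
        0x4C99CA80, 0xB6DBBF48, 0x58BF7003, 0x40163003)
    print(hex(pgd_addr), hex(pmd_addr), hex(pte_addr))
    print(hex(0x420000367508FDF & OA_MASK))   # output page of the leaf PTE
    print(hex(hpfar_to_ipa(0x401630)))        # faulting IPA page
    print(esr_fields(0x8200008B))
    print(looks_truncated(hpfar_to_ipa(0x401630), pte_addr & OA_MASK))
```

With the values from the log, the walk lands on 0x4c99ca90, 0x58bf7db0 and 0x40163dd8, HPFAR_EL2 decodes to 0x40163000, and ESR_EL2 decodes to an access flag fault at level 3 with S1PTW set. The PTE page and the reported IPA page agree, so the truncation check does not fire, which is consistent with this event being noise.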
--
Computer Architect | Sent with my Fedora powered laptop