On Wed, 21 Aug 2019, Thomas Gleixner wrote: > On Wed, 21 Aug 2019, Song Liu wrote: > > > On Aug 20, 2019, at 1:23 PM, Song Liu <songliubraving@xxxxxx> wrote: > > > > > > Before 32-bit support, pti_clone_pmds() always adds PMD_SIZE to addr. > > > This behavior changes after the 32-bit support: pti_clone_pgtable() > > > increases addr by PUD_SIZE for pud_none(*pud) case, and increases addr by > > > PMD_SIZE for pmd_none(*pmd) case. However, this is not accurate because > > > addr may not be PUD_SIZE/PMD_SIZE aligned. > > > > > > Fix this issue by properly rounding up addr to next PUD_SIZE/PMD_SIZE > > > in these two cases. > > > > After poking around more, I found the following doesn't really make > > sense. > > I'm glad you figured that out yourself. Was about to write up something to > that effect. > > Still interesting questions remain: > > 1) How did you end up feeding an unaligned address into that which points > to a 0 PUD? > > 2) Is this related to Facebook specific changes and unlikely to affect any > regular kernel? I can't come up with a way to trigger that in mainline > > 3) As this is a user page table and the missing mapping is related to > mappings required by PTI, how is the machine going in/out of user > space in the first place? Or did I just trip over what you called > nonsense? And just because this ended in silence I looked at it myself after Peter told me that this was on a kernel with PTI disabled. Aside of that my built in distrust for debug war stories combined with fairy tale changelogs triggered my curiousity anyway. So that cannot be a kernel with PTI disabled compile time because in that case the functions are not available, unless it's FB hackery which I do not care about. So the only way this can be reached is when PTI is configured but disabled at boot time via pti=off or nopti. For some silly reason and that goes back to before the 32bit support and Joern did not notice either when he introduced pti_finalize() this results in the following functions being called unconditionallY: pti_clone_entry_text() pti_clone_kernel_text() pti_clone_kernel_text() was called unconditionally before the 32bit support already and the only reason why it did not have any effect in that situation is that it invokes pti_kernel_image_global_ok() and that returns false when PTI is disabled on the kernel command line. Oh well. It obviously never checked whether X86_FEATURE_PTI was disabled or enabled in the first place. Now 32bit moved that around into pti_finalize() and added the call to pti_clone_entry_text() which just runs unconditionally. Now there is still the interesting question why this matters. The to be cloned page table entries are mapped and the start address even if unaligned never points to something unmapped. The unmapped case only covers holes and holes are obviously aligned at the upper levels even if the address of the hole is unaligned. So colour me still confused what's wrong there but the proper fix is the trivial: --- a/arch/x86/mm/pti.c +++ b/arch/x86/mm/pti.c @@ -666,6 +666,8 @@ void __init pti_init(void) */ void pti_finalize(void) { + if (!boot_cpu_has(X86_FEATURE_PTI)) + return; /* * We need to clone everything (again) that maps parts of the * kernel image. Hmm? I'm going to look whether that makes any difference in the page tables tomorrow with brain awake, but I wanted to share this before the .us vanishes into the weekend :) Thanks, tglx