On Tue, Jan 31, 2023 at 01:19:59PM -0800, Linus Torvalds wrote: > On Tue, Jan 31, 2023 at 1:10 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > > > > Umm... What about the semantics of get_user() of unmapped address? > > Some architectures do quiet EFAULT; some (including alpha) hit > > the sucker with SIGBUS, no matter what. > > I think we should strive to just make this all common. > > The reason alpha is different is almost certainly not intentional, but > a combination of "pure accident" and "nobody actually cares". BTW, speaking of alpha page faults - maybe I'm misreading the manual, but it seems to imply that interrupts are *not* disabled when entering page fault handler: Table 24–2 Entry Point Address Registers Entry Point Value in a0 Value in a1 Value in a2 PS<IPL> entArith Exception summary Register mask UNPREDICTABLE Unchanged entIF Fault or trap type code UNPREDICTABLE UNPREDICTABLE Unchanged entInt Interrupt type Vector Interrupt parameter Priority of interrupt entMM VA MMCSR Cause Unchanged entSys p0 p1 p2 Unchanged entUna VA Opcode Src/Dst Unchanged So there's nothing to prevent an interrupt hitting just as we reach entMM, with interrupt handler stepping on a vmalloc'ed area and triggering another page fault. If that is correct, this /* Synchronize this task's top level page-table with the "reference" page table from init. */ long index = pgd_index(address); pgd_t *pgd, *pgd_k; pgd = current->active_mm->pgd + index; pgd_k = swapper_pg_dir + index; if (!pgd_present(*pgd) && pgd_present(*pgd_k)) { pgd_val(*pgd) = pgd_val(*pgd_k); return; } goto no_context; is not just missing local_irq_save()/local_irq_restore() around that fragment - if it finds pgd already present, it needs to check pte before deciding to proceed to no_context. Suppose we access vmalloc area and corresponding pgd is not present in current->active_mm. Just as we get to entMM, an interrupt arrives and proceeds to access something covered by the same pgd. OK, current->active_mm is still not present, we get another page fault and do_page_fault() gets to the quoted code. pgd is copied from the swapper_pg_dir, do_page_fault() returns and we get back to the instruction in interrupt handler that had triggered the second #PF. This time around it succeeds. Once the interrupt handler completes we are back to entMM. Once *that* gets to do_page_fault() we hit the quoted code again. Only this time around pgd *is* present and instead of returning we get to no_context. And since it's been a normal access to vmalloc'ed memory, there's nothing to be found in exception table. Oops... AFAICS, one way to deal with that is to treat (unlikely) pgd_present(*pgd) as "get to pte, return if it looks legitimate, proceed to no_context if it isn't". Other bugs in the same area: * we ought to compare address with VMALLOC_START, not TASK_SIZE. * we ought to do that *before* checking for kernel threads/pagefault_disable() being in effect. Wait a minute - pgd_present() on alpha has become constant 1 since a73c948952cc "alpha: use pgtable-nopud instead of 4level-fixup" So that thing had been completely broken for 3 years and nobody had noticed. And that's really completely broken - it stopped copying top-level entries since that commit. That's also not hard to fix, but... * CONFIG_ALPHA_LARGE_VMALLOC had been racy all along * it had very limited use (need for >8Gb of vmalloc space) * it had stopped working in late 2019 and nobody cared. How about removing that kludge? Richard?