On Tue, Apr 26, 2022, Paolo Bonzini wrote:
> On 3/28/22 20:15, Sean Christopherson wrote:
> > > lookup_address_in_mm() walks the host page table as if it is a
> > > sequence of _static_ memory chunks. This is clearly dangerous.
> > Yeah, it's broken. The proper fix is to do something like what perf uses,
> > or maybe just genericize and reuse the code from commit 8af26be06272
> > ("perf/core: Fix arch_perf_get_page_size()").
>
> Indeed, KVM could use perf_get_pgtable_size(). The conversion from the
> result of *_leaf_size() to level is basically (ctz(size) - 12) / 9.
>
> Alternatively, there are three differences between perf_get_page_size()
> and lookup_address_in_pgd():
>
> * the *_offset_lockless() macros, which are unnecessary on x86
>
> * READ_ONCE, which is important but in practice unlikely to make a
>   difference

It can make a difference for this specific case. I can't find the bug/patch,
but a year or two back there was a bug in a similar mm/ path where the lack of
READ_ONCE() led to dereferencing garbage due to re-reading an upper-level
entry. IIRC, it was a page promotion (to huge page) case, where the
p*d_large() check came back false (saw the old value) and then p*d_offset()
walked into the weeds because it used the new value (huge page offset).

> * local_irq_{save,restore} around the walk
>
> The last is the important one and it should be added to
> lookup_address_in_pgd().

I don't think so. The issue is that, similar to adding a lockdep here, simply
disabling IRQs is not sufficient to ensure the resolved pfn is valid. And
again, like this case, disabling IRQs is not actually required when
sufficient protections are in place, e.g. in KVM's page fault case, the
mmu_notifier invalidate_start event must occur before the primary MMU
modifies its PTEs. In other words, disabling IRQs is both unnecessary and
gives a false sense of security.
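To make the READ_ONCE() hazard described above concrete, here is a minimal user-space sketch, not kernel code. Everything in it is hypothetical: the toy three-level table (9 index bits per level, 4 KiB base pages), the E_PRESENT/E_LEAF bit layout, the helpers walk_leaf_size(), size_to_level() and build_toy_tables(), and the one-line READ_ONCE() stand-in, which is only a simplified version of the kernel macro. It illustrates a single discipline: each entry is loaded exactly once, and both the "is this a leaf?" check and the next-level table pointer are derived from that one snapshot. It does not model the IRQ-disable or mmu_notifier protections discussed in this thread. size_to_level() applies the (ctz(size) - 12) / 9 conversion quoted above.

```c
#include <stdint.h>

/* Simplified stand-in for the kernel's READ_ONCE(): one volatile load
 * (relies on the GCC/Clang __typeof__ extension). */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

#define PAGE_SHIFT  12
#define LEVEL_BITS  9
#define ENTRIES     (1u << LEVEL_BITS)
#define E_PRESENT   0x1ull
#define E_LEAF      0x80ull          /* hypothetical leaf bit, modeled on x86 PSE */
#define E_ADDR_MASK (~0xfffull)

typedef uint64_t entry_t;

/* Hypothetical toy tables; 4 KiB alignment keeps the low 12 bits free for flags. */
static _Alignas(4096) entry_t top_tbl[ENTRIES];
static _Alignas(4096) entry_t pmd_tbl[ENTRIES];
static _Alignas(4096) entry_t pte_tbl[ENTRIES];

/* Return the size of the mapping covering addr, or 0 if it is not mapped. */
static uint64_t walk_leaf_size(entry_t *top, uint64_t addr)
{
	entry_t *table = top;
	unsigned int shift = PAGE_SHIFT + 2 * LEVEL_BITS;	/* 30: 1 GiB per top-level entry */

	for (;;) {
		entry_t *slot = &table[(addr >> shift) & (ENTRIES - 1)];
		/*
		 * Load the entry exactly once. The buggy pattern re-reads *slot:
		 *   if (!(READ_ONCE(*slot) & E_LEAF))
		 *           table = (entry_t *)(uintptr_t)(*slot & E_ADDR_MASK);
		 * A concurrent promotion between the two loads would make the walk
		 * treat a huge-page address as a table pointer.
		 */
		entry_t e = READ_ONCE(*slot);

		if (!(e & E_PRESENT))
			return 0;
		if ((e & E_LEAF) || shift == PAGE_SHIFT)
			return 1ull << shift;
		/* Descend using the snapshot, never *slot. */
		table = (entry_t *)(uintptr_t)(e & E_ADDR_MASK);
		shift -= LEVEL_BITS;
	}
}

/* (ctz(size) - 12) / 9: 4 KiB -> 0, 2 MiB -> 1, 1 GiB -> 2; -1 for size 0. */
static int size_to_level(uint64_t size)
{
	if (!size)
		return -1;	/* ctz of 0 is undefined */
	return (__builtin_ctzll(size) - PAGE_SHIFT) / LEVEL_BITS;
}

/* Populate: 4 KiB leaf at 0x0, 2 MiB leaf at 0x200000, 1 GiB leaf at 0x40000000. */
static entry_t *build_toy_tables(void)
{
	pte_tbl[0] = 0x5000ull | E_PRESENT;
	pmd_tbl[0] = (entry_t)(uintptr_t)pte_tbl | E_PRESENT;
	pmd_tbl[1] = 0x200000ull | E_PRESENT | E_LEAF;
	top_tbl[0] = (entry_t)(uintptr_t)pmd_tbl | E_PRESENT;
	top_tbl[1] = 0x40000000ull | E_PRESENT | E_LEAF;
	return top_tbl;
}
```

For example, walk_leaf_size(build_toy_tables(), 0x200000) returns 0x200000, which size_to_level() maps to 1. Note that x86's enum pg_level numbers PG_LEVEL_4K as 1, so mapping this 0-based result onto KVM's levels would presumably need a +1 offset.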
I completely agree that lookup_address() and friends are unnecessarily fragile, but I think that attempting to harden them to fix this KVM bug will open a can of worms and end up delaying getting KVM fixed.