On 13/02/2024 13:13, David Hildenbrand wrote: > On 13.02.24 14:06, Ryan Roberts wrote: >> On 13/02/2024 12:19, David Hildenbrand wrote: >>> On 13.02.24 13:06, Ryan Roberts wrote: >>>> On 12/02/2024 20:38, Ryan Roberts wrote: >>>>> [...] >>>>> >>>>>>>>> +static inline bool mm_is_user(struct mm_struct *mm) >>>>>>>>> +{ >>>>>>>>> + /* >>>>>>>>> + * Don't attempt to apply the contig bit to kernel mappings, because >>>>>>>>> + * dynamically adding/removing the contig bit can cause page faults. >>>>>>>>> + * These racing faults are ok for user space, since they get >>>>>>>>> serialized >>>>>>>>> + * on the PTL. But kernel mappings can't tolerate faults. >>>>>>>>> + */ >>>>>>>>> + return mm != &init_mm; >>>>>>>>> +} >>>>>>>> >>>>>>>> We also have the efi_mm as a non-user mm, though I don't think we >>>>>>>> manipulate >>>>>>>> that while it is live, and I'm not sure if that needs any special handling. >>>>>>> >>>>>>> Well we never need this function in the hot (order-0 folio) path, so I >>>>>>> think I >>>>>>> could add a check for efi_mm here with performance implication. It's >>>>>>> probably >>>>>>> safest to explicitly exclude it? What do you think? >>>>>> >>>>>> Oops: This should have read "I think I could add a check for efi_mm here >>>>>> *without* performance implication" >>>>> >>>>> It turns out that efi_mm is only defined when CONFIG_EFI is enabled. I can do >>>>> this: >>>>> >>>>> return mm != &init_mm && (!IS_ENABLED(CONFIG_EFI) || mm != &efi_mm); >>>>> >>>>> Is that acceptable? This is my preference, but nothing else outside of efi >>>>> references this symbol currently. >>>>> >>>>> Or perhaps I can convince myself that its safe to treat efi_mm like userspace. >>>>> There are a couple of things that need to be garanteed for it to be safe: >>>>> >>>>> - The PFNs of present ptes either need to have an associated struct >>>>> page or >>>>> need to have the PTE_SPECIAL bit set (either pte_mkspecial() or >>>>> pte_mkdevmap()) >>>>> >>>>> - Live mappings must either be static (no changes that could cause >>>>> fold/unfold >>>>> while live) or the system must be able to tolerate a temporary fault >>>>> >>>>> Mark suggests efi_mm is not manipulated while live, so that meets the latter >>>>> requirement, but I'm not sure about the former? >>>> >>>> I've gone through all the efi code, and conclude that, as Mark suggests, the >>>> mappings are indeed static. And additionally, the ptes are populated using only >>>> the _private_ ptep API, so there is no issue here. As just discussed with Mark, >>>> my prefereence is to not make any changes to code, and just add a comment >>>> describing why efi_mm is safe. >>>> >>>> Details: >>>> >>>> * Registered with ptdump >>>> * ptep_get_lockless() >>>> * efi_create_mapping -> create_pgd_mapping … -> init_pte: >>>> * __ptep_get() >>>> * __set_pte() >>>> * efi_memattr_apply_permissions -> efi_set_mapping_permissions … -> >>>> set_permissions >>>> * __ptep_get() >>>> * __set_pte() >>> >>> Sound good. We could add some VM_WARN_ON if we ever get the efi_mm via the >>> "official" APIs. >> >> We could, but that would lead to the same linkage issue, which I'm trying to >> avoid in the first place: >> >> VM_WARN_ON(IS_ENABLED(CONFIG_EFI) && mm == efi_mm); >> >> This creates new source code dependencies, which I would rather avoid if >> possible. > > Just a thought, you could have a is_efi_mm() function that abstracts all that. > > diff --git a/include/linux/efi.h b/include/linux/efi.h > index c74f47711f0b..152f5fa66a2a 100644 > --- a/include/linux/efi.h > +++ b/include/linux/efi.h > @@ -692,6 +692,15 @@ extern struct efi { > > extern struct mm_struct efi_mm; > > +static inline void is_efi_mm(struct mm_struct *mm) > +{ > +#ifdef CONFIG_EFI > + return mm == &efi_mm; > +#else > + return false; > +#endif > +} > + > static inline int > efi_guidcmp (efi_guid_t left, efi_guid_t right) > { > > That would definitely work, but in that case, I might as well just check for it in mm_is_user() (and personally I would change the name to mm_is_efi()): static inline bool mm_is_user(struct mm_struct *mm) { return mm != &init_mm && !mm_is_efi(mm); } Any objections?