On 08/04/2024 11:24, David Hildenbrand wrote: > On 08.04.24 12:07, Ryan Roberts wrote: >> On 08/04/2024 10:43, David Hildenbrand wrote: >>> >>>>>> + >>>>>> +/** >>>>>> + * swap_pte_batch - detect a PTE batch for a set of contiguous swap entries >>>>>> + * @start_ptep: Page table pointer for the first entry. >>>>>> + * @max_nr: The maximum number of table entries to consider. >>>>>> + * @entry: Swap entry recovered from the first table entry. >>>>>> + * >>>>>> + * Detect a batch of contiguous swap entries: consecutive (non-present) PTEs >>>>>> + * containing swap entries all with consecutive offsets and targeting the >>>>>> same >>>>>> + * swap type. >>>>>> + * >>>>> >>>>> Likely you should document that any swp pte bits are ignored? () >>>> >>>> Sorry I don't understand this comment. I thought any non-none, non-present PTE >>>> was always considered to contain only a "swap entry" and a swap entry consists >>>> of a "type" and an "offset" only. (and its a special "non-swap" swap entry if >>>> type > SOME_CONSTANT) Are you saying there are additional fields in the PTE >>>> that >>>> are not part of the swap entry? >>> >>> >>> pte_swp_soft_dirty() >>> pte_swp_clear_exclusive() >>> pte_swp_uffd_wp() >>> >>> Are PTE bits used for swp PTE. >> >> Ahh wow. mind blown. Looks like a massive hack... why not store them in the >> arch-independent swap entry, rather than have them squat independently in the >> PTE? > > I think that was discussed at some point, but it not only requires quite some > churn to change it (all that swp entry code is a mess), these bits are > conceptually really per PTE and not something you would want to pass into actual > swap handling code that couldn't care less about all of these. > > I looked at this when I added SWP exclusive, but accidentally losing the SWP > exclusive marker when converting back and forth made me go the PTE route instead. > > Then, the available PTE bits are a bit scattered on some architectures, and > converting entry<->PTE gets even uglier if we don't want to "lose" available bits. > > Probably the whole "unsigned long swp_entry" stuff should be replaced by a > proper struct where we could more easily add flags and have the arch code handle > the conversion to the PTE just once. The arch-specific swp_entry stuff is > another nasty thing IMHO. Yep understood. I'll file this under "there be dragons". Thanks for the explanation. > >> >> OK, my implementation is buggy. I'll re-spin to fix this. >> >> >>> >>> There is also dirty/young for migration entries, but that's not of a concern >>> here, because we stop for non_swap_entry(). >> >> Looks like these are part of the offset field in the arch-independent swap entry >> - much cleaner ;-). > > Note that it only applies to migration entries, and only when we have some spare > bits due to PFN < offset. Yep got it. Thanks!