On Tue, 15 Mar 2022 15:18:35 +0100 David Hildenbrand <david@xxxxxxxxxx> wrote: > Let's steal one bit from the offset. While at it, document the meaning > of bit 62 for swap ptes. You define _PAGE_SWP_EXCLUSIVE as _PAGE_LARGE, which is bit 52, and this is not part of the swap pte offset IIUC. So stealing any bit might actually not be necessary, see below. Also, bit 62 should be the soft dirty bit for normal PTEs, and this doesn't seem to be used for swap PTEs at all. But I might be missing some use case where softdirty also needs to be preserved in swap PTEs. > > Signed-off-by: David Hildenbrand <david@xxxxxxxxxx> > --- > arch/s390/include/asm/pgtable.h | 37 ++++++++++++++++++++++++++------- > 1 file changed, 30 insertions(+), 7 deletions(-) > > diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h > index 008a6c856fa4..c182212a2b44 100644 > --- a/arch/s390/include/asm/pgtable.h > +++ b/arch/s390/include/asm/pgtable.h > @@ -181,6 +181,8 @@ static inline int is_module_addr(void *addr) > #define _PAGE_SOFT_DIRTY 0x000 > #endif > > +#define _PAGE_SWP_EXCLUSIVE _PAGE_LARGE /* SW pte exclusive swap bit */ > + > /* Set of bits not changed in pte_modify */ > #define _PAGE_CHG_MASK (PAGE_MASK | _PAGE_SPECIAL | _PAGE_DIRTY | \ > _PAGE_YOUNG | _PAGE_SOFT_DIRTY) > @@ -796,6 +798,24 @@ static inline int pmd_protnone(pmd_t pmd) > } > #endif > > +#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE > +static inline pte_t pte_swp_mkexclusive(pte_t pte) > +{ > + pte_val(pte) |= _PAGE_SWP_EXCLUSIVE; > + return pte; > +} > + > +static inline int pte_swp_exclusive(pte_t pte) > +{ > + return pte_val(pte) & _PAGE_SWP_EXCLUSIVE; > +} > + > +static inline pte_t pte_swp_clear_exclusive(pte_t pte) > +{ > + pte_val(pte) &= ~_PAGE_SWP_EXCLUSIVE; > + return pte; > +} > + > static inline int pte_soft_dirty(pte_t pte) > { > return pte_val(pte) & _PAGE_SOFT_DIRTY; > @@ -1675,16 +1695,19 @@ static inline int has_transparent_hugepage(void) > * information in the lowcore. > * Bits 54 and 63 are used to indicate the page type. > * A swap pte is indicated by bit pattern (pte & 0x201) == 0x200 > - * This leaves the bits 0-51 and bits 56-62 to store type and offset. > - * We use the 5 bits from 57-61 for the type and the 52 bits from 0-51 > + * This leaves the bits 0-50 and bits 56-61 to store type and offset. > + * We use the 5 bits from 57-61 for the type and the 51 bits from 0-50 > * for the offset. > - * | offset |01100|type |00| > - * |0000000000111111111122222222223333333333444444444455|55555|55566|66| > - * |0123456789012345678901234567890123456789012345678901|23456|78901|23| > + * | offset |E|01100|type |S0| > + * |000000000011111111112222222222333333333344444444445|5|55555|55566|66| > + * |012345678901234567890123456789012345678901234567890|1|23456|78901|23| > + * > + * S (bit 62) is used for softdirty tracking. Unless there is some use for softdirty tracking in swap PTEs, I think this description does not belong here, to the swap PTE layout. > + * E (bit 51) is used to remember PG_anon_exclusive. It is bit 52, at least with this patch, so I guess this could all be done w/o stealing anything. That is, of course, only if it is allowed to use bit 52 in this case. The POP says bit 52 has to be 0, or else a "translation-specification exception" is recognized. However, I think it could be OK for PTEs marked as invalid, like it is the case for swap PTEs. The comment here says at the beginning: /* * 64 bit swap entry format: * A page-table entry has some bits we have to treat in a special way. * Bits 52 and bit 55 have to be zero, otherwise a specification * exception will occur instead of a page translation exception. The * specification exception has the bad habit not to store necessary * information in the lowcore. This would mean that it is not OK to have bit 52 not zero for swap PTEs. But if I read the POP correctly, all bits except for the DAT-protection would be ignored for invalid PTEs, so maybe this comment needs some update (for both bits 52 and also 55). Heiko might also have some more insight. Anyway, stealing bit 51 might still be an option, but then _PAGE_SWP_EXCLUSIVE would need to be defined appropriately.