On 9/14/22 09:42, Matthew Wilcox wrote:
> On Wed, Sep 14, 2022 at 03:33:50PM +0900, Hyeonggon Yoo wrote:
>> On Fri, Sep 09, 2022 at 11:16:51PM +0200, Vlastimil Babka wrote:
>>> On 9/9/22 16:32, Hyeonggon Yoo wrote:
>>>> On Fri, Sep 09, 2022 at 03:44:19PM +0200, Vlastimil Babka wrote:
>>>>> On 9/9/22 13:05, Hyeonggon Yoo wrote:
>>>>>>> ----8<----
>>>>>>> From d6f9fbb33b908eb8162cc1f6ce7f7c970d0f285f Mon Sep 17 00:00:00 2001
>>>>>>> From: Vlastimil Babka <vbabka@xxxxxxx>
>>>>>>> Date: Fri, 9 Sep 2022 12:03:10 +0200
>>>>>>> Subject: [PATCH 2/3] mm/migrate: make isolate_movable_page() skip slab pages
>>>>>>>
>>>>>>> In the next commit we want to rearrange struct slab fields to allow a
>>>>>>> larger rcu_head. Afterwards, the page->mapping field will overlap with
>>>>>>> SLUB's "struct list_head slab_list", where the value of the prev
>>>>>>> pointer can become LIST_POISON2, which is 0x122 + POISON_POINTER_DELTA.
>>>>>>> Unfortunately, bit 1 being set can confuse PageMovable() into a false
>>>>>>> positive and cause a GPF, as reported by lkp [1].
>>>>>>>
>>>>>>> To fix this, make isolate_movable_page() skip pages with the PageSlab
>>>>>>> flag set. This is a bit tricky, as we need to add memory barriers to
>>>>>>> SLAB's and SLUB's page allocation and freeing, and their counterparts
>>>>>>> to isolate_movable_page().
>>>>>>
>>>>>> Hello, I just took a quick look.
>>>>>> Is this approach okay with folio_test_anon()?
>>>>>
>>>>> Not if used on a completely random page as the compaction scanners can,
>>>>> but it relies on those pages first being tested for PageLRU, or coming
>>>>> from a page table lookup, etc.
>>>>> Not ideal, huh. Well, I could also improve this by switching the 'next'
>>>>> and 'slabs' fields and relying on the fact that the value of
>>>>> LIST_POISON2 doesn't include 0x1, just 0x2.
>>>>
>>>> What about swapping counters and freelist?
>>>> freelist should always be aligned.
>>>
>>> Great suggestion, thanks!
>>>
>>> Had to deal with SLAB too, as there list_head.prev was also aliasing
>>> page->mapping. Wanted to use freelist as well, but it turns out it's
>>> not aligned, so had to use s_mem instead.
>>>
>>> The patch that makes isolate_movable_page() skip slab pages was thus
>>> dropped. The result is in slab.git below, and if nothing blows up, I
>>> will restore it to -next:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git/log/?h=for-6.1/fit_rcu_head
>>
>> I realized that there is also a relevant comment in
>> include/linux/mm_types.h:
>>
>>> 62  * SLUB uses cmpxchg_double() to atomically update its freelist and counters.
>>> 63  * That requires that freelist & counters in struct slab be adjacent and
>>> 64  * double-word aligned. Because struct slab currently just reinterprets the
>>> 65  * bits of struct page, we align all struct pages to double-word boundaries,
>>> 66  * and ensure that 'freelist' is aligned within struct slab.
>>> 67  */
>>
>> Also, we may add a comment, something like this?
>>
>> --- a/include/linux/mm_types.h
>> +++ b/include/linux/mm_types.h
>> @@ -79,6 +79,9 @@ struct page {
>>  	 * WARNING: bit 0 of the first word is used for PageTail(). That
>>  	 * means the other users of this union MUST NOT use the bit to
>>  	 * avoid collision and false-positive PageTail().
>> +	 *
>> +	 * WARNING: the lower two bits of the third word are used for PAGE_MAPPING_FLAGS.
>> +	 * Using those bits can lead the compaction code to a general protection fault.
>
> I'm really not comfortable with adding that documentation. I feel the
> compaction code should be fixed.
Any suggestions how, exactly? Using a true page flag for __PageMovable is off limits :)
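
For reference, a minimal userspace sketch of why the aliasing bites: it
mirrors the __PageMovable() test on page->mapping, with the constants
taken from include/linux/poison.h and include/linux/page-flags.h, and
POISON_POINTER_DELTA assumed to be 0 (i.e. no CONFIG_ILLEGAL_POINTER_VALUE):

/*
 * Standalone sketch (not kernel code). A freed slab page whose
 * slab_list.prev aliases page->mapping ends up holding LIST_POISON2
 * after list_del(), and that value passes the movable-page check.
 */
#include <stdio.h>

#define POISON_POINTER_DELTA	0UL
#define LIST_POISON2		(0x122UL + POISON_POINTER_DELTA)

#define PAGE_MAPPING_ANON	0x1UL
#define PAGE_MAPPING_MOVABLE	0x2UL
#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)

int main(void)
{
	/* what page->mapping reads as when it aliases a poisoned prev */
	unsigned long mapping = LIST_POISON2;

	/* same test as __PageMovable(): low two bits must equal 0b10 */
	if ((mapping & PAGE_MAPPING_FLAGS) == PAGE_MAPPING_MOVABLE)
		printf("false positive: 0x%lx looks movable\n", mapping);
	return 0;
}

LIST_POISON2 & PAGE_MAPPING_FLAGS is 0x2, which is exactly
PAGE_MAPPING_MOVABLE, so the poisoned pointer is indistinguishable from
a genuine movable mapping by the low bits alone.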