This patch series introduces a new memory descriptor for zswap.zpool that
currently overlaps with struct page. This is part of the effort to reduce
the size of struct page and to enable dynamic allocation of memory
descriptors [1].

This series does not bloat anything for zsmalloc and no functional change
is intended (except for using zpdesc and folios).

In the near future, the removal of page->index from struct page [2] will
be addressed, and that project also depends on this patch series.

I think this series is now ready to be included in mm-unstable if there's
no objection. Sergey thankfully added Reviewed-by and Tested-by tags on v8.
But as I updated the patchset, could you please explicitly add them for v9
as well? A range-diff output is included at the end of this cover letter
to help review.

Thanks to everyone who got involved in this series, especially Alex, who's
been pushing it forward this year.

v8: https://lore.kernel.org/linux-mm/20241205175000.3187069-1-willy@xxxxxxxxxxxxx
[1] https://lore.kernel.org/linux-mm/ZvRKzKizOfEWBtJp@xxxxxxxxxxxxxxxxxxxx
[2] https://lore.kernel.org/linux-mm/Z09hOy-UY9KC8WMb@xxxxxxxxxxxxxxxxxxxx

v8 -> v9:
  Functionally there is very little change; most of the changes are
  comment/changelog updates.

  - (patch 1) Added comments for basic zpdesc helper functions,
    some bits copied from struct slab.

  - (patch 4) Changed 'f_zpdesc->next = NULL' to 'f_zpdesc->handle = 0'
    as f_zpdesc here is for a huge zspage.

  - (patch 17) Fixed a mistake in a previous rebase from v6 to v7.

  - (patch 11, 19) Changed reset_zpdesc() to use struct page for robustness
    against re-organizing zpdesc fields.

  - Dropped patch 20 in v8 as it does not make re-implementing
    zsdesc management easier in the glorious future. We can just
    re-implement the whole reset_zpdesc().

  - Dropped patch 21 in v8 and folded some comments of the patch into
    patch 2 that introduces zpdesc_{un,}lock().

  The rest of the changes are changelog/comment cleanups.

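For readers who have not followed the memdesc work, the overlay idea can be
illustrated with a small userspace sketch. This is not code from the series:
struct fake_page, struct fake_zpdesc, FAKE_MATCH() and the two conversion
macros are hypothetical stand-ins, and the field layout is invented (it
assumes an LP64 or ILP32 ABI, as on Linux). Only the general technique,
compile-time offset checks plus const-preserving _Generic casts, mirrors
what mm/zpdesc.h does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; NOT the real kernel layout. */
struct fake_page {
	unsigned long flags;
	void *lru_next;
	void *lru_prev;
	void *mapping;
	unsigned long index;
	unsigned long private;
	unsigned int page_type;
	int refcount;
	unsigned long memcg_data;
};

/* Hypothetical descriptor that names the same memory differently. */
struct fake_zpdesc {
	unsigned long flags;
	void *lru_next;
	void *lru_prev;
	void *movable_ops;
	union {
		struct fake_zpdesc *next;	/* normal zspage: next in chain */
		unsigned long handle;		/* huge zspage: object handle */
	};
	void *zspage;
	unsigned int first_obj_offset;
	int refcount;
};

/* Fail the build if an overlaid field drifts away from its counterpart. */
#define FAKE_MATCH(pg, zp)						\
	static_assert(offsetof(struct fake_page, pg) ==			\
		      offsetof(struct fake_zpdesc, zp),			\
		      "overlaid fields must share an offset")
FAKE_MATCH(flags, flags);
FAKE_MATCH(mapping, movable_ops);
FAKE_MATCH(index, next);
FAKE_MATCH(index, handle);
FAKE_MATCH(private, zspage);
FAKE_MATCH(page_type, first_obj_offset);
FAKE_MATCH(refcount, refcount);
#undef FAKE_MATCH

/* The descriptor must not grow past the structure it overlays. */
static_assert(sizeof(struct fake_zpdesc) <= sizeof(struct fake_page),
	      "descriptor must fit inside the overlaid structure");

/* Const-preserving conversions, in the style of zpdesc_page()/page_zpdesc(). */
#define fake_zpdesc_page(zp)	(_Generic((zp),				\
	const struct fake_zpdesc *:	(const struct fake_page *)(zp),	\
	struct fake_zpdesc *:		(struct fake_page *)(zp)))

#define fake_page_zpdesc(p)	(_Generic((p),				\
	const struct fake_page *:	(const struct fake_zpdesc *)(p),	\
	struct fake_page *:		(struct fake_zpdesc *)(p)))
```

The sketch only converts pointers and compares offsets; it never reads one
type through a pointer to the other. The kernel can do that because it is
built with -fno-strict-aliasing, which plain ISO C does not guarantee.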
Cheers,
Hyeonggon

Alex Shi (7):
  mm/zsmalloc: add zpdesc memory descriptor for zswap.zpool
  mm/zsmalloc: use zpdesc in trylock_zspage()/lock_zspage()
  mm/zsmalloc: convert create_page_chain() and its users to use zpdesc
  mm/zsmalloc: convert reset_page to reset_zpdesc
  mm/zsmalloc: convert SetZsPageMovable and remove unused funcs
  mm/zsmalloc: convert get/set_first_obj_offset() to take zpdesc
  mm/zsmalloc: introduce __zpdesc_clear/set_zsmalloc()

Hyeonggon Yoo (11):
  mm/zsmalloc: convert __zs_map_object/__zs_unmap_object to use zpdesc
  mm/zsmalloc: add and use pfn/zpdesc seeking funcs
  mm/zsmalloc: convert obj_malloc() to use zpdesc
  mm/zsmalloc: convert obj_allocated() and related helpers to use zpdesc
  mm/zsmalloc: convert init_zspage() to use zpdesc
  mm/zsmalloc: convert obj_to_page() and zs_free() to use zpdesc
  mm/zsmalloc: add two helpers for zs_page_migrate() and make it use zpdesc
  mm/zsmalloc: convert __free_zspage() to use zpdesc
  mm/zsmalloc: convert location_to_obj() to take zpdesc
  mm/zsmalloc: convert migrate_zspage() to use zpdesc
  mm/zsmalloc: convert get_zspage() to take zpdesc

 mm/zpdesc.h   | 182 +++++++++++++++++++++
 mm/zsmalloc.c | 436 ++++++++++++++++++++++++++------------------------
 2 files changed, 408 insertions(+), 210 deletions(-)
 create mode 100644 mm/zpdesc.h

--
2.43.5

For ease of review, here I add range-diff output showing differences
between v8 and v9:

$ git range-diff zpdesc-v8...zpdesc-v9
 1:  3d74794250ab !  1:  9809a405a425 mm/zsmalloc: add zpdesc memory descriptor for zswap.zpool
    @@ Metadata
      ## Commit message ##
         mm/zsmalloc: add zpdesc memory descriptor for zswap.zpool

    -    The 1st patch introduces new memory descriptor zpdesc and rename
    -    zspage.first_page to zspage.first_zpdesc, no functional change.
    +    The 1st patch introduces new memory descriptor zpdesc and renames
    +    zspage.first_page to zspage.first_zpdesc, with no functional change.
    -    We removed PG_owner_priv_1 since it was moved to zspage after
    -    commit a41ec880aa7b ("zsmalloc: move huge compressed obj from
    -    page to zspage").
    +    We removed the comment about PG_owner_priv_1 since it is no longer used
    +    after commit a41ec880aa7b ("zsmalloc: move huge compressed obj from page
    +    to zspage").
    +    [42.hyeyoo: rework comments a little bit]
         Originally-by: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>
         Signed-off-by: Alex Shi <alexs@xxxxxxxxxx>
    +    Signed-off-by: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>

      ## mm/zpdesc.h (new) ##
     @@
    @@ mm/zpdesc.h (new)
     +#define __MM_ZPDESC_H__
     +
     +/*
    -+ * struct zpdesc - Memory descriptor for zpool memory
    -+ * @flags: Page flags, mostly unused.
    -+ * @lru: Indirectly used by page migration
    -+ * @movable_ops: Used by page migration
    -+ * @next: Next zpdesc in a zspage in zsmalloc zpool
    -+ * @handle: For huge zspage in zsmalloc zpool
    -+ * @zspage: Points to the zspage this zpdesc is a part of
    -+ * @first_obj_offset: First object offset in zsmalloc zpool
    -+ * @_refcount: Indirectly used by page migration
    -+ * @memcg_data: Memory Control Group data.
    ++ * struct zpdesc - Memory descriptor for zpool memory.
    ++ * @flags: Page flags, mostly unused by zsmalloc.
    ++ * @lru: Indirectly used by page migration.
    ++ * @movable_ops: Used by page migration.
    ++ * @next: Next zpdesc in a zspage in zsmalloc zpool.
    ++ * @handle: For huge zspage in zsmalloc zpool.
    ++ * @zspage: Points to the zspage this zpdesc is a part of.
    ++ * @first_obj_offset: First object offset in zsmalloc zpool.
    ++ * @_refcount: The number of references to this zpdesc.
     + *
     + * This struct overlays struct page for now. Do not modify without a good
    -+ * understanding of the issues. In particular, do not expand into
    -+ * the overlap with memcg_data.
    ++ * understanding of the issues. In particular, do not expand into the overlap
    ++ * with memcg_data.
     + *
     + * Page flags used:
    -+ * * PG_private identifies the first component page
    -+ * * PG_locked is used by page migration code
    ++ * * PG_private identifies the first component page.
    ++ * * PG_locked is used by page migration code.
     + */
     +struct zpdesc {
     +	unsigned long flags;
    @@ mm/zpdesc.h (new)
     +		unsigned long handle;
     +	};
     +	struct zspage *zspage;
    ++	/*
    ++	 * Only the lower 24 bits are available for offset, limiting a page
    ++	 * to 16 MiB. The upper 8 bits are reserved for PGTY_zsmalloc.
    ++	 *
    ++	 * Do not access this field directly.
    ++	 * Instead, use {get,set}_first_obj_offset() helpers.
    ++	 */
     +	unsigned int first_obj_offset;
     +	atomic_t _refcount;
     +};
    @@ mm/zpdesc.h (new)
     +#undef ZPDESC_MATCH
     +static_assert(sizeof(struct zpdesc) <= sizeof(struct page));
     +
    ++/*
    ++ * zpdesc_page - The first struct page allocated for a zpdesc
    ++ * @zp: The zpdesc.
    ++ *
    ++ * A convenience wrapper for converting zpdesc to the first struct page of the
    ++ * underlying folio, to communicate with code not yet converted to folio or
    ++ * struct zpdesc.
    ++ *
    ++ */
     +#define zpdesc_page(zp)			(_Generic((zp),				\
     +	const struct zpdesc *:		(const struct page *)(zp),		\
     +	struct zpdesc *:		(struct page *)(zp)))
     +
    -+/* Using folio conversion to skip compound_head checking */
    ++/**
    ++ * zpdesc_folio - The folio allocated for a zpdesc
    ++ * @zpdesc: The zpdesc.
    ++ *
    ++ * Zpdescs are descriptors for zpool memory. The zpool memory itself is
    ++ * allocated as folios that contain the zpool objects, and zpdesc uses specific
    ++ * fields in the first struct page of the folio - those fields are now accessed
    ++ * by struct zpdesc.
    ++ *
    ++ * It is occasionally necessary convert to back to a folio in order to
    ++ * communicate with the rest of the mm. Please use this helper function
    ++ * instead of casting yourself, as the implementation may change in the future.
    ++ */
     +#define zpdesc_folio(zp)		(_Generic((zp),				\
     +	const struct zpdesc *:		(const struct folio *)(zp),		\
     +	struct zpdesc *:		(struct folio *)(zp)))
    -+
    ++/**
    ++ * page_zpdesc - Converts from first struct page to zpdesc.
    ++ * @p: The first (either head of compound or single) page of zpdesc.
    ++ *
    ++ * A temporary wrapper to convert struct page to struct zpdesc in situations
    ++ * where we know the page is the compound head, or single order-0 page.
    ++ *
    ++ * Long-term ideally everything would work with struct zpdesc directly or go
    ++ * through folio to struct zpdesc.
    ++ *
    ++ * Return: The zpdesc which contains this page
    ++ */
     +#define page_zpdesc(p)			(_Generic((p),				\
     +	const struct page *:		(const struct zpdesc *)(p),		\
     +	struct page *:			(struct zpdesc *)(p)))
    @@ mm/zpdesc.h (new)
     +#endif

      ## mm/zsmalloc.c ##
    +@@
    +  * Released under the terms of GNU General Public License Version 2.0
    +  */
    +
    +-/*
    +- * Following is how we use various fields and flags of underlying
    +- * struct page(s) to form a zspage.
    +- *
    +- * Usage of struct page fields:
    +- *	page->private: points to zspage
    +- *	page->index: links together all component pages of a zspage
    +- *		For the huge page, this is always 0, so we use this field
    +- *		to store handle.
    +- *	page->page_type: PGTY_zsmalloc, lower 24 bits locate the first object
    +- *		offset in a subpage of a zspage
    +- *
    +- * Usage of struct page flags:
    +- *	PG_private: identifies the first component page
    +- *	PG_owner_priv_1: identifies the huge component page
    +- *
    +- */
    +-
    + #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
    +
    + /*
     @@
      #include <linux/pagemap.h>
      #include <linux/fs.h>
    @@ mm/zsmalloc.c: static void create_page_chain(struct size_class *class, struct zs
      		if (unlikely(class->objs_per_zspage == 1 &&
      				class->pages_per_zspage == 1))
    @@ mm/zsmalloc.c: static unsigned long obj_malloc(struct zs_pool *pool,
    + 		/* record handle in the header of allocated chunk */
      		link->handle = handle | OBJ_ALLOCATED_TAG;
      	else
    - 		/* record handle to page->index */
    +-		/* record handle to page->index */
     -		zspage->first_page->index = handle | OBJ_ALLOCATED_TAG;
     +		zspage->first_zpdesc->handle = handle | OBJ_ALLOCATED_TAG;
 2:  d39d4fb6ce47 !  2:  213aacce3c28 mm/zsmalloc: use zpdesc in trylock_zspage()/lock_zspage()
    @@ Metadata
      ## Commit message ##
         mm/zsmalloc: use zpdesc in trylock_zspage()/lock_zspage()

    -    To use zpdesc in trylock_zspage()/lock_zspage() funcs, we add couple of helpers:
    -    zpdesc_lock()/zpdesc_unlock()/zpdesc_trylock()/zpdesc_wait_locked() and
    -    zpdesc_get()/zpdesc_put() for this purpose.
    +    Convert trylock_zspage() and lock_zspage() to use zpdesc. To achieve
    +    that, introduce a couple of helper functions:
    +      - zpdesc_lock()
    +      - zpdesc_unlock()
    +      - zpdesc_trylock()
    +      - zpdesc_wait_locked()
    +      - zpdesc_get()
    +      - zpdesc_put()

    -    Here we use the folio series func in guts for 2 reasons, one zswap.zpool
    -    only get single page, and use folio could save some compound_head checking;
    -    two, folio_put could bypass devmap checking that we don't need.
    +    Here we use the folio version of functions for 2 reasons.
    +    First, zswap.zpool currently only uses order-0 pages and using folio
    +    could save some compound_head checks. Second, folio_put could bypass
    +    devmap checking that we don't need.
         BTW, thanks Intel LKP found a build warning on the patch.

         Originally-by: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>
         Signed-off-by: Alex Shi <alexs@xxxxxxxxxx>
    +    Signed-off-by: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>

      ## mm/zpdesc.h ##
     @@ mm/zpdesc.h: static_assert(sizeof(struct zpdesc) <= sizeof(struct page));
 3:  eac31b98eccb =  3:  bcc63dd123f4 mm/zsmalloc: convert __zs_map_object/__zs_unmap_object to use zpdesc
 4:  88ff6685943a !  4:  44faeff2ab83 mm/zsmalloc: add and use pfn/zpdesc seeking funcs
    @@ mm/zsmalloc.c: static void obj_free(int class_size, unsigned long obj)
      		link->next = get_freeobj(zspage) << OBJ_TAG_BITS;
      	else
     -		f_page->index = 0;
    -+		f_zpdesc->next = NULL;
    ++		f_zpdesc->handle = 0;
      	set_freeobj(zspage, f_objidx);

      	kunmap_local(vaddr);
 5:  9c9e34caf1dc =  5:  c907377a9f73 mm/zsmalloc: convert obj_malloc() to use zpdesc
 6:  74470439e747 !  6:  561c74077136 mm/zsmalloc: convert create_page_chain() and its users to use zpdesc
    @@ Metadata
      ## Commit message ##
         mm/zsmalloc: convert create_page_chain() and its users to use zpdesc

    -    Introduce a few helper functions for conversion to convert create_page_chain()
    -    to use zpdesc, then use zpdesc in replace_sub_page() too.
    +    Introduce a few helper functions for conversion to convert
    +    create_page_chain() to use zpdesc, then use zpdesc in replace_sub_page().

         Originally-by: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>
         Signed-off-by: Alex Shi <alexs@xxxxxxxxxx>
    +    Signed-off-by: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>

      ## mm/zpdesc.h ##
     @@ mm/zpdesc.h: static inline struct zpdesc *pfn_zpdesc(unsigned long pfn)
 7:  999ce777e513 =  7:  5b6bb539edac mm/zsmalloc: convert obj_allocated() and related helpers to use zpdesc
 8:  42cc3fe825bc =  8:  219b638a95ac mm/zsmalloc: convert init_zspage() to use zpdesc
 9:  c2fea65391d9 =  9:  ad4cd88fb89c mm/zsmalloc: convert obj_to_page() and zs_free() to use zpdesc
10:  c23d24a549dc ! 10:  931f0c1fdff8 mm/zsmalloc: add zpdesc_is_isolated()/zpdesc_zone() helper for zs_page_migrate()
    @@ Metadata
     Author: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>

      ## Commit message ##
    -    mm/zsmalloc: add zpdesc_is_isolated()/zpdesc_zone() helper for zs_page_migrate()
    +    mm/zsmalloc: add two helpers for zs_page_migrate() and make it use zpdesc

         To convert page to zpdesc in zs_page_migrate(), we added
         zpdesc_is_isolated()/zpdesc_zone() helpers. No functional change.
11:  f0fdf4f127a4 ! 11:  3a37599de2a1 mm/zsmalloc: rename reset_page to reset_zpdesc and use zpdesc in it
    @@ Metadata
     Author: Alex Shi <alexs@xxxxxxxxxx>

      ## Commit message ##
    -    mm/zsmalloc: rename reset_page to reset_zpdesc and use zpdesc in it
    +    mm/zsmalloc: convert reset_page to reset_zpdesc

         zpdesc.zspage matches with page.private, zpdesc.next matches with
         page.index. They will be reset in reset_page() which is called prior to
         free base pages of a zspage.

    -    Use zpdesc to replace page struct and rename it to reset_zpdesc(), few
    -    page helper still left since they are used too widely.
    +    Since the fields that need to be initialized are independent of the
    +    order in struct zpdesc, Keep it to use struct page to ensure robustness
    +    against potential rearrangements of struct zpdesc fields in the future.
    +
    +    [42.hyeyoo: keep reset_zpdesc() to use struct page fields]
         Signed-off-by: Alex Shi <alexs@xxxxxxxxxx>
    +    Signed-off-by: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>

      ## mm/zsmalloc.c ##
     @@ mm/zsmalloc.c: static inline bool obj_allocated(struct zpdesc *zpdesc, void *obj,
    @@ mm/zsmalloc.c: static inline bool obj_allocated(struct zpdesc *zpdesc, void *obj
     +
      	__ClearPageMovable(page);
      	ClearPagePrivate(page);
    --	set_page_private(page, 0);
    --	page->index = 0;
    -+	zpdesc->zspage = NULL;
    -+	zpdesc->next = NULL;
    - 	__ClearPageZsmalloc(page);
    - }
    -
    + 	set_page_private(page, 0);
     @@ mm/zsmalloc.c: static void __free_zspage(struct zs_pool *pool, struct size_class *class,
      	do {
      		VM_BUG_ON_PAGE(!PageLocked(page), page);
12:  889db23882fb = 12:  26c059dc7680 mm/zsmalloc: convert __free_zspage() to use zpdesc
13:  21398ee44728 = 13:  75da524b90b6 mm/zsmalloc: convert location_to_obj() to take zpdesc
14:  fcbdb848eafe = 14:  ffbf4cdbde74 mm/zsmalloc: convert migrate_zspage() to use zpdesc
15:  abc171445571 = 15:  c78de6e45dd4 mm/zsmalloc: convert get_zspage() to take zpdesc
16:  fc6e6df18de6 ! 16:  a409db41562c mm/zsmalloc: convert SetZsPageMovable and remove unused funcs
    @@ Commit message
         Originally-by: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>
         Signed-off-by: Alex Shi <alexs@xxxxxxxxxx>
    +    Signed-off-by: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>

      ## mm/zsmalloc.c ##
     @@ mm/zsmalloc.c: static DEFINE_PER_CPU(struct mapping_area, zs_map_area) = {
17:  e3e1b9ba2739 ! 17:  a0046eec7921 mm/zsmalloc: convert get/set_first_obj_offset() to take zpdesc
    @@ Commit message
         Now that all users of get/set_first_obj_offset() are converted
         to use zpdesc, convert them to take zpdesc.
    -    Signed-off-by: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>
         Signed-off-by: Alex Shi <alexs@xxxxxxxxxx>
    +    Signed-off-by: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>

      ## mm/zsmalloc.c ##
     @@ mm/zsmalloc.c: static struct zpdesc *get_first_zpdesc(struct zspage *zspage)
    @@ mm/zsmalloc.c: static struct zpdesc *get_first_zpdesc(struct zspage *zspage)
     -static inline void set_first_obj_offset(struct page *page, unsigned int offset)
     +static inline void set_first_obj_offset(struct zpdesc *zpdesc, unsigned int offset)
      {
    --	/* With 24 bits available, we can support offsets into 16 MiB pages. */
    --	BUILD_BUG_ON(PAGE_SIZE > SZ_16M);
    + 	/* With 24 bits available, we can support offsets into 16 MiB pages. */
    + 	BUILD_BUG_ON(PAGE_SIZE > SZ_16M);
     -	VM_WARN_ON_ONCE(!PageZsmalloc(page));
    -+	/* With 16 bit available, we can support offsets into 64 KiB pages. */
    -+	BUILD_BUG_ON(PAGE_SIZE > SZ_64K);
     +	VM_WARN_ON_ONCE(!PageZsmalloc(zpdesc_page(zpdesc)));
      	VM_WARN_ON_ONCE(offset & ~FIRST_OBJ_PAGE_TYPE_MASK);
     -	page->page_type &= ~FIRST_OBJ_PAGE_TYPE_MASK;
18:  ff7376e59bfd <  -:  ------------ mm/zsmalloc: introduce __zpdesc_clear_movable
19:  3d74d287da4e ! 18:  0ac98437b837 mm/zsmalloc: introduce __zpdesc_clear/set_zsmalloc()
    @@ Commit message
         __zpdesc_set_zsmalloc() for __SetPageZsmalloc(), and use them in
         callers.
    +    [42.hyeyoo: keep reset_zpdesc() to use struct page]
         Signed-off-by: Alex Shi <alexs@xxxxxxxxxx>
    +    Signed-off-by: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>

      ## mm/zpdesc.h ##
    -@@ mm/zpdesc.h: static inline void __zpdesc_clear_movable(struct zpdesc *zpdesc)
    - 	__ClearPageMovable(zpdesc_page(zpdesc));
    +@@ mm/zpdesc.h: static inline void __zpdesc_set_movable(struct zpdesc *zpdesc,
    + 	__SetPageMovable(zpdesc_page(zpdesc), mops);
      }

     +static inline void __zpdesc_set_zsmalloc(struct zpdesc *zpdesc)
    @@ mm/zpdesc.h: static inline void __zpdesc_clear_movable(struct zpdesc *zpdesc)
      	return PageIsolated(zpdesc_page(zpdesc));

      ## mm/zsmalloc.c ##
    -@@ mm/zsmalloc.c: static void reset_zpdesc(struct zpdesc *zpdesc)
    - 	ClearPagePrivate(page);
    - 	zpdesc->zspage = NULL;
    - 	zpdesc->next = NULL;
    --	__ClearPageZsmalloc(page);
    -+	__zpdesc_clear_zsmalloc(zpdesc);
    - }
    -
    - static int trylock_zspage(struct zspage *zspage)
     @@ mm/zsmalloc.c: static struct zspage *alloc_zspage(struct zs_pool *pool,
      		if (!zpdesc) {
      			while (--i >= 0) {
20:  6e5528eb9957 <  -:  ------------ mm/zsmalloc: introduce zpdesc_clear_first() helper
21:  d9346ccf3749 <  -:  ------------ mm/zsmalloc: update comments for page->zpdesc changes
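
As a footnote to the first_obj_offset comment added in patch 1 and the
BUILD_BUG_ON() fix in patch 17, here is a minimal userspace sketch of keeping
a 24-bit offset in the low bits of a 32-bit word while leaving the upper
8 bits untouched. The SKETCH_* names and both helpers are hypothetical and
only illustrate the masking. The real helpers are get_first_obj_offset() and
set_first_obj_offset() in mm/zsmalloc.c, and they operate on a zpdesc rather
than on a bare integer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the series uses FIRST_OBJ_PAGE_TYPE_MASK instead. */
#define SKETCH_OFFSET_BITS	24u
#define SKETCH_OFFSET_MASK	((UINT32_C(1) << SKETCH_OFFSET_BITS) - 1u)

/*
 * Store @offset in the lower 24 bits of @word and return the new word.
 * The upper 8 bits (a type tag in this sketch) are preserved.
 */
static inline uint32_t sketch_set_offset(uint32_t word, uint32_t offset)
{
	/* 24 bits cover offsets 0 .. 16 MiB - 1; anything larger is a bug. */
	assert((offset & ~SKETCH_OFFSET_MASK) == 0);

	word &= ~SKETCH_OFFSET_MASK;	/* clear the old offset */
	word |= offset;			/* install the new one */
	return word;
}

/* Read the offset back without exposing the upper 8 bits. */
static inline uint32_t sketch_get_offset(uint32_t word)
{
	return word & SKETCH_OFFSET_MASK;
}
```

With 24 offset bits the largest representable offset is 16 MiB - 1, which is
why the v9 code checks PAGE_SIZE against SZ_16M rather than SZ_64K.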