On Tue, May 03, 2016 at 09:43:59AM +0900, Minchan Kim wrote:
> Good morning, Chulmin
> 
> On Tue, May 03, 2016 at 08:33:16AM +0900, Chulmin Kim wrote:
> > Hello, Minchan!
> > 
> > On 2016년 04월 27일 16:48, Minchan Kim wrote:
> > >This patch introduces a run-time migration feature for zspage.
> > >
> > >For migration, the VM uses the page.lru field, so it would be better not to
> > >use the page.next field for our own purpose. For that, firstly, we can get
> > >the first object offset of the page via runtime calculation instead of
> > >page->index, so we can use page->index as the link for page chaining.
> > >In case of a huge object, it stores the handle rather than a page-chaining
> > >link. To identify a huge object, we use the PG_owner_priv_1 flag.
> > >
> > >For migration, it supports three functions:
> > >
> > >* zs_page_isolate
> > >
> > >It isolates the zspage which includes the subpage VM wants to migrate from
> > >its class, so no one can allocate a new object from the zspage, provided
> > >this is the first isolation among the zspage's subpages. Thus, further
> > >isolation of other subpages cannot isolate the zspage from the class list
> > >again.
> > >
> > >* zs_page_migrate
> > >
> > >First of all, it holds the write-side zspage->lock to prevent migration of
> > >other subpages in the zspage. Then, it locks all objects in the page VM
> > >wants to migrate. The reason we should lock all objects in the page is the
> > >race between zs_map_object and zs_page_migrate:
> > >
> > >zs_map_object				zs_page_migrate
> > >
> > >pin_tag(handle)
> > >obj = handle_to_obj(handle)
> > >obj_to_location(obj, &page, &obj_idx);
> > >
> > >					write_lock(&zspage->lock)
> > >					if (!trypin_tag(handle))
> > >						goto unpin_object
> > >
> > >zspage = get_zspage(page);
> > >read_lock(&zspage->lock);
> > >
> > >If zs_page_migrate doesn't do trypin_tag, zs_map_object's page can
> > >be stale, so it would crash.
> > >
> > >If it locks all of the objects successfully, it copies the content from the
> > >old page to the newly created one and, finally, builds a new page chain with
> > >the new page. If it's the last isolated page in the zspage, it puts the
> > >zspage back to its class.
> > >
> > >* zs_page_putback
> > >
> > >It returns an isolated zspage to the right fullness_group list if it fails
> > >to migrate a page.
> > >
> > >Lastly, this patch introduces asynchronous zspage free. The reason we need
> > >it is that we need page_lock to clear PG_movable, but unfortunately the
> > >zs_free path should be atomic, so the approach is to try to grab page_lock
> > >with preemption disabled. If it gets page_lock on all of the pages
> > >successfully, it can free the zspage in that context. Otherwise, it queues
> > >the free request and frees the zspage via a workqueue in process context.
> > >
> > >Cc: Sergey Senozhatsky <sergey.senozhatsky@xxxxxxxxx>
> > >Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx>
> > >---
> > > include/uapi/linux/magic.h |   1 +
> > > mm/zsmalloc.c              | 552 +++++++++++++++++++++++++++++++++++++++------
> > > 2 files changed, 487 insertions(+), 66 deletions(-)
> > >
> > >diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
> > >index e1fbe72c39c0..93b1affe4801 100644
> > >--- a/include/uapi/linux/magic.h
> > >+++ b/include/uapi/linux/magic.h
> > >@@ -79,5 +79,6 @@
> > > #define NSFS_MAGIC		0x6e736673
> > > #define BPF_FS_MAGIC		0xcafe4a11
> > > #define BALLOON_KVM_MAGIC	0x13661366
> > >+#define ZSMALLOC_MAGIC		0x58295829
> > >
> > > #endif /* __LINUX_MAGIC_H__ */
> > >diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> > >index 8d82e44c4644..042793015ecf 100644
> > >--- a/mm/zsmalloc.c
> > >+++ b/mm/zsmalloc.c
> > >@@ -17,15 +17,14 @@
> > > *
> > > * Usage of struct page fields:
> > > *	page->private: points to zspage
> > >- *	page->index: offset of the first object starting in this page.
> > >- *		For the first page, this is always 0, so we use this field
> > >- *		to store handle for huge object.
> > >- *	page->next: links together all component pages of a zspage
> > >+ *	page->freelist: links together all component pages of a zspage
> > >+ *		For the huge page, this is always 0, so we use this field
> > >+ *		to store handle.
> > > *
> > > * Usage of struct page flags:
> > > *	PG_private: identifies the first component page
> > > *	PG_private2: identifies the last component page
> > >- *
> > >+ *	PG_owner_priv_1: indentifies the huge component page
> > > */
> > >
> > > #include <linux/module.h>
> > >@@ -47,6 +46,10 @@
> > > #include <linux/debugfs.h>
> > > #include <linux/zsmalloc.h>
> > > #include <linux/zpool.h>
> > >+#include <linux/mount.h>
> > >+#include <linux/migrate.h>
> > >+
> > >+#define ZSPAGE_MAGIC	0x58
> > >
> > > /*
> > > * This must be power of 2 and greater than of equal to sizeof(link_free).
> > >@@ -128,8 +131,33 @@
> > > * ZS_MIN_ALLOC_SIZE and ZS_SIZE_CLASS_DELTA must be multiple of ZS_ALIGN
> > > * (reason above)
> > > */
> > >+
> > >+/*
> > >+ * A zspage's class index and fullness group
> > >+ * are encoded in its (first)page->mapping
> > >+ */
> > >+#define FULLNESS_BITS	2
> > >+#define CLASS_BITS	8
> > >+#define ISOLATED_BITS	3
> > >+#define MAGIC_VAL_BITS	8
> > >+
> > >+
> > > #define ZS_SIZE_CLASS_DELTA	(PAGE_SIZE >> CLASS_BITS)
> > >
> > >+struct zspage {
> > >+	struct {
> > >+		unsigned int fullness:FULLNESS_BITS;
> > >+		unsigned int class:CLASS_BITS;
> > >+		unsigned int isolated:ISOLATED_BITS;
> > >+		unsigned int magic:MAGIC_VAL_BITS;
> > >+	};
> > >+	unsigned int inuse;
> > >+	unsigned int freeobj;
> > >+	struct page *first_page;
> > >+	struct list_head list; /* fullness list */
> > >+	rwlock_t lock;
> > >+};
> > >+
> > > /*
> > > * We do not maintain any list for completely empty or full pages
> > > */
> > >@@ -161,6 +189,8 @@ struct zs_size_stat {
> > > static struct dentry *zs_stat_root;
> > > #endif
> > >
> > >+static struct vfsmount *zsmalloc_mnt;
> > >+
> > > /*
> > > * number of size_classes
> > > */
> > >@@ -243,24 +273,10 @@ struct zs_pool {
> > > #ifdef CONFIG_ZSMALLOC_STAT
> > > 	struct dentry *stat_dentry;
> > > #endif
> > >-};
> > >-
> > >-/*
> > >- * A zspage's class index and fullness group
> > >- * are encoded in its (first)page->mapping
> > >- */
> > >-#define FULLNESS_BITS	2
> > >-#define CLASS_BITS	8
> > >-
> > >-struct zspage {
> > >-	struct {
> > >-		unsigned int fullness:FULLNESS_BITS;
> > >-		unsigned int class:CLASS_BITS;
> > >-	};
> > >-	unsigned int inuse;
> > >-	unsigned int freeobj;
> > >-	struct page *first_page;
> > >-	struct list_head list; /* fullness list */
> > >+	struct inode *inode;
> > >+	spinlock_t free_lock;
> > >+	struct work_struct free_work;
> > >+	struct list_head free_zspage;
> > > };
> > >
> > > struct mapping_area {
> > >@@ -312,8 +328,11 @@ static struct zspage *cache_alloc_zspage(struct zs_pool *pool, gfp_t flags)
> > > 	struct zspage *zspage;
> > >
> > > 	zspage = kmem_cache_alloc(pool->zspage_cachep, flags & ~__GFP_HIGHMEM);
> > >-	if (zspage)
> > >+	if (zspage) {
> > > 		memset(zspage, 0, sizeof(struct zspage));
> > >+		zspage->magic = ZSPAGE_MAGIC;
> > >+		rwlock_init(&zspage->lock);
> >
> > 	+	INIT_LIST_HEAD(&zspage->list);
> >
> > If there is no special intention here,
> > I think we need the list initialization.
>
> Intention was that I just watned to add unncessary instruction there

I just don't want to add unnecessary instruction there.
Typo. :)
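For reference, here is roughly what the quoted cache_alloc_zspage() hunk would
look like with the suggested initialization folded in. This is only a sketch
against the hunk above, not the actual patch: the return path is filled in from
context and the comment is mine.

static struct zspage *cache_alloc_zspage(struct zs_pool *pool, gfp_t flags)
{
	struct zspage *zspage;

	zspage = kmem_cache_alloc(pool->zspage_cachep, flags & ~__GFP_HIGHMEM);
	if (zspage) {
		memset(zspage, 0, sizeof(struct zspage));
		zspage->magic = ZSPAGE_MAGIC;
		rwlock_init(&zspage->lock);
		/*
		 * Proposed addition: the memset() above already zeroes
		 * zspage->list, so this only matters if the list entry is
		 * walked or deleted before the zspage is added to a
		 * fullness list.
		 */
		INIT_LIST_HEAD(&zspage->list);
	}

	return zspage;
}

The reply above suggests the extra instruction was considered unnecessary,
presumably because a newly allocated zspage is added to a fullness list before
its list field is ever used.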
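Separately, the asynchronous zspage free described at the end of the changelog
follows a queue-then-worker pattern. The function name below is illustrative
only (it is not taken from the patch); it just shows how the free_lock,
free_zspage and free_work fields added to struct zs_pool in the diff would be
used when the atomic zs_free() path cannot take the page locks it needs.

/*
 * Illustrative sketch, not the patch code: defer freeing a zspage to
 * process context when it cannot be freed from the atomic zs_free() path.
 */
static void queue_zspage_free(struct zs_pool *pool, struct zspage *zspage)
{
	spin_lock(&pool->free_lock);
	list_add(&zspage->list, &pool->free_zspage);
	spin_unlock(&pool->free_lock);

	/* the work handler drains pool->free_zspage and frees each zspage */
	schedule_work(&pool->free_work);
}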