The patch titled
     Subject: Re: [PATCH] mm: kill frontswap
has been added to the -mm mm-unstable branch. Its filename is
     mm-kill-frontswap.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-kill-frontswap.patch

This patch will later appear in the mm-unstable branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days

------------------------------------------------------
From: Johannes Weiner <hannes@xxxxxxxxxxx>
Subject: Re: [PATCH] mm: kill frontswap
Date: Mon, 17 Jul 2023 12:02:27 -0400

The only user of frontswap is zswap, and has been for a long time. Have
swap call into zswap directly and remove the indirection.
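A minimal user-space sketch of what "call into zswap directly" means for the swap I/O paths. It is not kernel code. `struct page`, the fixed-size table, `swap_writepage_model()` and `swap_readpage_model()` are hypothetical stand-ins. Only the five hook names and their argument lists follow the new include/linux/zswap.h added by this patch. Note the changed return convention. The removed frontswap_store()/frontswap_load() wrappers returned 0 on success and -1 otherwise, while zswap_store()/zswap_load() return true when zswap handled the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * User-space model only: struct page, pgoff_t and the table below are
 * simplified stand-ins for illustration, not the kernel's definitions.
 */
typedef unsigned long pgoff_t;

#define MODEL_PAGE_SIZE 16	/* tiny "page" so the model stays small */
#define MAX_TYPES 2		/* number of swap devices ("types") */
#define MAX_OFFSETS 8		/* slots per swap device */

struct page {
	int type;		/* swap device number */
	pgoff_t offset;		/* slot on that device */
	char data[MODEL_PAGE_SIZE];
};

/* Hypothetical compressed pool: one optional copy per (type, offset). */
static bool active[MAX_TYPES];
static bool present[MAX_TYPES][MAX_OFFSETS];
static char pool[MAX_TYPES][MAX_OFFSETS][MODEL_PAGE_SIZE];

/* Hook names and argument lists mirror include/linux/zswap.h. */
void zswap_swapon(int type)
{
	assert(type >= 0 && type < MAX_TYPES);
	active[type] = true;
}

void zswap_swapoff(int type)
{
	assert(type >= 0 && type < MAX_TYPES);
	active[type] = false;
	memset(present[type], 0, sizeof(present[type])); /* drop all entries */
}

bool zswap_store(struct page *page)
{
	assert(page->type >= 0 && page->type < MAX_TYPES);
	if (!active[page->type] || page->offset >= MAX_OFFSETS)
		return false;	/* rejected: caller writes to the swap device */
	memcpy(pool[page->type][page->offset], page->data, MODEL_PAGE_SIZE);
	present[page->type][page->offset] = true;
	return true;
}

bool zswap_load(struct page *page)
{
	assert(page->type >= 0 && page->type < MAX_TYPES);
	if (page->offset >= MAX_OFFSETS || !present[page->type][page->offset])
		return false;	/* not in the pool: caller reads the swap device */
	memcpy(page->data, pool[page->type][page->offset], MODEL_PAGE_SIZE);
	return true;
}

void zswap_invalidate(int type, pgoff_t offset)
{
	assert(type >= 0 && type < MAX_TYPES);
	if (offset < MAX_OFFSETS)
		present[type][offset] = false;
}

/* Hypothetical swap I/O paths: swap now calls zswap directly. */
static int disk_writes, disk_reads;

int swap_writepage_model(struct page *page)
{
	if (zswap_store(page))	/* was: frontswap_store(page) == 0 */
		return 0;
	disk_writes++;		/* stand-in for block I/O to the swap device */
	return 0;
}

int swap_readpage_model(struct page *page)
{
	if (zswap_load(page))	/* was: frontswap_load(page) == 0 */
		return 0;
	disk_reads++;		/* stand-in for block I/O from the swap device */
	return 0;
}
```

In this model, a store issued before zswap_swapon() for that type falls back to the disk path. A stored page is read back without touching the disk. After zswap_invalidate(), the same read falls back to the disk again.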
Link: https://lkml.kernel.org/r/20230717160227.GA867137@xxxxxxxxxxx
Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Acked-by: Nhat Pham <nphamcs@xxxxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Cc: Domenico Cerasuolo <cerasuolodomenico@xxxxxxxxx>
Cc: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
Cc: Vitaly Wool <vitaly.wool@xxxxxxxxxxxx>
Cc: Vlastimil Babka <vbabka@xxxxxxx>
Cc: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 Documentation/admin-guide/mm/zswap.rst            |  14
 Documentation/mm/frontswap.rst                    | 264 -----------
 Documentation/mm/index.rst                        |   1
 Documentation/translations/zh_CN/mm/frontswap.rst | 196 --------
 Documentation/translations/zh_CN/mm/index.rst     |   1
 MAINTAINERS                                       |   7
 fs/proc/meminfo.c                                 |   1
 include/linux/frontswap.h                         |  91 ---
 include/linux/swap.h                              |   9
 include/linux/swapfile.h                          |   5
 include/linux/zswap.h                             |  37 +
 mm/Kconfig                                        |   4
 mm/Makefile                                       |   1
 mm/frontswap.c                                    | 283 ------------
 mm/page_io.c                                      |   6
 mm/swapfile.c                                     |  33 -
 mm/zswap.c                                        | 160 ++----
 17 files changed, 122 insertions(+), 991 deletions(-)

--- a/Documentation/admin-guide/mm/zswap.rst~mm-kill-frontswap
+++ a/Documentation/admin-guide/mm/zswap.rst
@@ -49,7 +49,7 @@ compressed pool.
 Design
 ======
 
-Zswap receives pages for compression through the Frontswap API and is able to
+Zswap receives pages for compression from the swap subsystem and is able to
 evict pages from its own compressed pool on an LRU basis and write them back to
 the backing swap device in the case that the compressed pool is full.
 
@@ -70,19 +70,19 @@ means the compression ratio will always
 zbud pages). The zsmalloc type zpool has a more complex compressed page
 storage method, and it can achieve greater storage densities.
 
-When a swap page is passed from frontswap to zswap, zswap maintains a mapping
+When a swap page is passed from swapout to zswap, zswap maintains a mapping
 of the swap entry, a combination of the swap type and swap offset, to the zpool
 handle that references that compressed swap page. This mapping is achieved
 with a red-black tree per swap type. The swap offset is the search key for the
 tree nodes.
 
-During a page fault on a PTE that is a swap entry, frontswap calls the zswap
-load function to decompress the page into the page allocated by the page fault
-handler.
+During a page fault on a PTE that is a swap entry, the swapin code calls the
+zswap load function to decompress the page into the page allocated by the page
+fault handler.
 
 Once there are no PTEs referencing a swap page stored in zswap (i.e. the count
-in the swap_map goes to 0) the swap code calls the zswap invalidate function,
-via frontswap, to free the compressed entry.
+in the swap_map goes to 0) the swap code calls the zswap invalidate function
+to free the compressed entry.
 
 Zswap seeks to be simple in its policies. Sysfs attributes allow for one user
 controlled policy:
--- a/Documentation/mm/frontswap.rst
+++ /dev/null
@@ -1,264 +0,0 @@
-=========
-Frontswap
-=========
-
-Frontswap provides a "transcendent memory" interface for swap pages.
-In some environments, dramatic performance savings may be obtained because
-swapped pages are saved in RAM (or a RAM-like device) instead of a swap disk.
-
-.. _Transcendent memory in a nutshell: https://lwn.net/Articles/454795/
-
-Frontswap is so named because it can be thought of as the opposite of
-a "backing" store for a swap device. The storage is assumed to be
-a synchronous concurrency-safe page-oriented "pseudo-RAM device" conforming
-to the requirements of transcendent memory (such as Xen's "tmem", or
-in-kernel compressed memory, aka "zcache", or future RAM-like devices);
-this pseudo-RAM device is not directly accessible or addressable by the
-kernel and is of unknown and possibly time-varying size. The driver
-links itself to frontswap by calling frontswap_register_ops to set the
-frontswap_ops funcs appropriately and the functions it provides must
-conform to certain policies as follows:
-
-An "init" prepares the device to receive frontswap pages associated
-with the specified swap device number (aka "type"). A "store" will
-copy the page to transcendent memory and associate it with the type and
-offset associated with the page. A "load" will copy the page, if found,
-from transcendent memory into kernel memory, but will NOT remove the page
-from transcendent memory. An "invalidate_page" will remove the page
-from transcendent memory and an "invalidate_area" will remove ALL pages
-associated with the swap type (e.g., like swapoff) and notify the "device"
-to refuse further stores with that swap type.
-
-Once a page is successfully stored, a matching load on the page will normally
-succeed. So when the kernel finds itself in a situation where it needs
-to swap out a page, it first attempts to use frontswap. If the store returns
-success, the data has been successfully saved to transcendent memory and
-a disk write and, if the data is later read back, a disk read are avoided.
-If a store returns failure, transcendent memory has rejected the data, and the
-page can be written to swap as usual.
-
-Note that if a page is stored and the page already exists in transcendent memory
-(a "duplicate" store), either the store succeeds and the data is overwritten,
-or the store fails AND the page is invalidated. This ensures stale data may
-never be obtained from frontswap.
-
-If properly configured, monitoring of frontswap is done via debugfs in
-the `/sys/kernel/debug/frontswap` directory. The effectiveness of
-frontswap can be measured (across all swap devices) with:
-
-``failed_stores``
-	how many store attempts have failed
-
-``loads``
-	how many loads were attempted (all should succeed)
-
-``succ_stores``
-	how many store attempts have succeeded
-
-``invalidates``
-	how many invalidates were attempted
-
-A backend implementation may provide additional metrics.
-
-FAQ
-===
-
-* Where's the value?
-
-When a workload starts swapping, performance falls through the floor.
-Frontswap significantly increases performance in many such workloads by
-providing a clean, dynamic interface to read and write swap pages to
-"transcendent memory" that is otherwise not directly addressable to the kernel.
-This interface is ideal when data is transformed to a different form
-and size (such as with compression) or secretly moved (as might be
-useful for write-balancing for some RAM-like devices). Swap pages (and
-evicted page-cache pages) are a great use for this kind of slower-than-RAM-
-but-much-faster-than-disk "pseudo-RAM device".
-
-Frontswap with a fairly small impact on the kernel,
-provides a huge amount of flexibility for more dynamic, flexible RAM
-utilization in various system configurations:
-
-In the single kernel case, aka "zcache", pages are compressed and
-stored in local memory, thus increasing the total anonymous pages
-that can be safely kept in RAM. Zcache essentially trades off CPU
-cycles used in compression/decompression for better memory utilization.
-Benchmarks have shown little or no impact when memory pressure is
-low while providing a significant performance improvement (25%+)
-on some workloads under high memory pressure.
-
-"RAMster" builds on zcache by adding "peer-to-peer" transcendent memory
-support for clustered systems. Frontswap pages are locally compressed
-as in zcache, but then "remotified" to another system's RAM. This
-allows RAM to be dynamically load-balanced back-and-forth as needed,
-i.e. when system A is overcommitted, it can swap to system B, and
-vice versa. RAMster can also be configured as a memory server so
-many servers in a cluster can swap, dynamically as needed, to a single
-server configured with a large amount of RAM... without pre-configuring
-how much of the RAM is available for each of the clients!
-
-In the virtual case, the whole point of virtualization is to statistically
-multiplex physical resources across the varying demands of multiple
-virtual machines. This is really hard to do with RAM and efforts to do
-it well with no kernel changes have essentially failed (except in some
-well-publicized special-case workloads).
-Specifically, the Xen Transcendent Memory backend allows otherwise
-"fallow" hypervisor-owned RAM to not only be "time-shared" between multiple
-virtual machines, but the pages can be compressed and deduplicated to
-optimize RAM utilization. And when guest OS's are induced to surrender
-underutilized RAM (e.g. with "selfballooning"), sudden unexpected
-memory pressure may result in swapping; frontswap allows those pages
-to be swapped to and from hypervisor RAM (if overall host system memory
-conditions allow), thus mitigating the potentially awful performance impact
-of unplanned swapping.
-
-A KVM implementation is underway and has been RFC'ed to lkml. And,
-using frontswap, investigation is also underway on the use of NVM as
-a memory extension technology.
-
-* Sure there may be performance advantages in some situations, but
-  what's the space/time overhead of frontswap?
-
-If CONFIG_FRONTSWAP is disabled, every frontswap hook compiles into
-nothingness and the only overhead is a few extra bytes per swapon'ed
-swap device. If CONFIG_FRONTSWAP is enabled but no frontswap "backend"
-registers, there is one extra global variable compared to zero for
-every swap page read or written. If CONFIG_FRONTSWAP is enabled
-AND a frontswap backend registers AND the backend fails every "store"
-request (i.e. provides no memory despite claiming it might),
-CPU overhead is still negligible -- and since every frontswap fail
-precedes a swap page write-to-disk, the system is highly likely
-to be I/O bound and using a small fraction of a percent of a CPU
-will be irrelevant anyway.
-
-As for space, if CONFIG_FRONTSWAP is enabled AND a frontswap backend
-registers, one bit is allocated for every swap page for every swap
-device that is swapon'd. This is added to the EIGHT bits (which
-was sixteen until about 2.6.34) that the kernel already allocates
-for every swap page for every swap device that is swapon'd. (Hugh
-Dickins has observed that frontswap could probably steal one of
-the existing eight bits, but let's worry about that minor optimization
-later.) For very large swap disks (which are rare) on a standard
-4K pagesize, this is 1MB per 32GB swap.
-
-When swap pages are stored in transcendent memory instead of written
-out to disk, there is a side effect that this may create more memory
-pressure that can potentially outweigh the other advantages. A
-backend, such as zcache, must implement policies to carefully (but
-dynamically) manage memory limits to ensure this doesn't happen.
-
-* OK, how about a quick overview of what this frontswap patch does
-  in terms that a kernel hacker can grok?
-
-Let's assume that a frontswap "backend" has registered during
-kernel initialization; this registration indicates that this
-frontswap backend has access to some "memory" that is not directly
-accessible by the kernel. Exactly how much memory it provides is
-entirely dynamic and random.
-
-Whenever a swap-device is swapon'd frontswap_init() is called,
-passing the swap device number (aka "type") as a parameter.
-This notifies frontswap to expect attempts to "store" swap pages
-associated with that number.
-
-Whenever the swap subsystem is readying a page to write to a swap
-device (c.f swap_writepage()), frontswap_store is called. Frontswap
-consults with the frontswap backend and if the backend says it does NOT
-have room, frontswap_store returns -1 and the kernel swaps the page
-to the swap device as normal. Note that the response from the frontswap
-backend is unpredictable to the kernel; it may choose to never accept a
-page, it could accept every ninth page, or it might accept every
-page. But if the backend does accept a page, the data from the page
-has already been copied and associated with the type and offset,
-and the backend guarantees the persistence of the data. In this case,
-frontswap sets a bit in the "frontswap_map" for the swap device
-corresponding to the page offset on the swap device to which it would
-otherwise have written the data.
-
-When the swap subsystem needs to swap-in a page (swap_readpage()),
-it first calls frontswap_load() which checks the frontswap_map to
-see if the page was earlier accepted by the frontswap backend. If
-it was, the page of data is filled from the frontswap backend and
-the swap-in is complete. If not, the normal swap-in code is
-executed to obtain the page of data from the real swap device.
-
-So every time the frontswap backend accepts a page, a swap device read
-and (potentially) a swap device write are replaced by a "frontswap backend
-store" and (possibly) a "frontswap backend loads", which are presumably much
-faster.
-
-* Can't frontswap be configured as a "special" swap device that is
-  just higher priority than any real swap device (e.g. like zswap,
-  or maybe swap-over-nbd/NFS)?
-
-No. First, the existing swap subsystem doesn't allow for any kind of
-swap hierarchy. Perhaps it could be rewritten to accommodate a hierarchy,
-but this would require fairly drastic changes. Even if it were
-rewritten, the existing swap subsystem uses the block I/O layer which
-assumes a swap device is fixed size and any page in it is linearly
-addressable. Frontswap barely touches the existing swap subsystem,
-and works around the constraints of the block I/O subsystem to provide
-a great deal of flexibility and dynamicity.
-
-For example, the acceptance of any swap page by the frontswap backend is
-entirely unpredictable. This is critical to the definition of frontswap
-backends because it grants completely dynamic discretion to the
-backend. In zcache, one cannot know a priori how compressible a page is.
-"Poorly" compressible pages can be rejected, and "poorly" can itself be
-defined dynamically depending on current memory constraints.
-
-Further, frontswap is entirely synchronous whereas a real swap
-device is, by definition, asynchronous and uses block I/O. The
-block I/O layer is not only unnecessary, but may perform "optimizations"
-that are inappropriate for a RAM-oriented device including delaying
-the write of some pages for a significant amount of time. Synchrony is
-required to ensure the dynamicity of the backend and to avoid thorny race
-conditions that would unnecessarily and greatly complicate frontswap
-and/or the block I/O subsystem. That said, only the initial "store"
-and "load" operations need be synchronous. A separate asynchronous thread
-is free to manipulate the pages stored by frontswap. For example,
-the "remotification" thread in RAMster uses standard asynchronous
-kernel sockets to move compressed frontswap pages to a remote machine.
-Similarly, a KVM guest-side implementation could do in-guest compression
-and use "batched" hypercalls.
-
-In a virtualized environment, the dynamicity allows the hypervisor
-(or host OS) to do "intelligent overcommit". For example, it can
-choose to accept pages only until host-swapping might be imminent,
-then force guests to do their own swapping.
-
-There is a downside to the transcendent memory specifications for
-frontswap: Since any "store" might fail, there must always be a real
-slot on a real swap device to swap the page. Thus frontswap must be
-implemented as a "shadow" to every swapon'd device with the potential
-capability of holding every page that the swap device might have held
-and the possibility that it might hold no pages at all. This means
-that frontswap cannot contain more pages than the total of swapon'd
-swap devices. For example, if NO swap device is configured on some
-installation, frontswap is useless. Swapless portable devices
-can still use frontswap but a backend for such devices must configure
-some kind of "ghost" swap device and ensure that it is never used.
-
-* Why this weird definition about "duplicate stores"? If a page
-  has been previously successfully stored, can't it always be
-  successfully overwritten?
-
-Nearly always it can, but no, sometimes it cannot. Consider an example
-where data is compressed and the original 4K page has been compressed
-to 1K. Now an attempt is made to overwrite the page with data that
-is non-compressible and so would take the entire 4K. But the backend
-has no more space. In this case, the store must be rejected. Whenever
-frontswap rejects a store that would overwrite, it also must invalidate
-the old data and ensure that it is no longer accessible. Since the
-swap subsystem then writes the new data to the read swap device,
-this is the correct course of action to ensure coherency.
-
-* Why does the frontswap patch create the new include file swapfile.h?
-
-The frontswap code depends on some swap-subsystem-internal data
-structures that have, over the years, moved back and forth between
-static and global.
This seemed a reasonable compromise: Define -them as global but declare them in a new include file that isn't -included by the large number of source files that include swap.h. - -Dan Magenheimer, last updated April 9, 2012 --- a/Documentation/mm/index.rst~mm-kill-frontswap +++ a/Documentation/mm/index.rst @@ -44,7 +44,6 @@ above structured documentation, or delet balance damon/index free_page_reporting - frontswap hmm hwpoison hugetlbfs_reserv --- a/Documentation/translations/zh_CN/mm/frontswap.rst +++ /dev/null @@ -1,196 +0,0 @@ -:Original: Documentation/mm/frontswap.rst - -:ç¿»è¯?: - - å?¸å»¶è?¾ Yanteng Si <siyanteng@xxxxxxxxxxx> - -:æ ¡è¯?: - -========= -Frontswap -========= - -Frontswap为交æ?¢é¡µæ??ä¾?äº?ä¸?个 â??transcendent memoryâ?? ç??æ?¥å?£ã??å?¨ä¸?äº?ç?¯å¢?ä¸ï¼?ç?± -äº?交æ?¢é¡µè¢«ä¿?å?å?¨RAMï¼?æ??类似RAMç??设å¤?ï¼?ä¸ï¼?è??ä¸?æ?¯äº¤æ?¢ç£?ç??ï¼?å? æ¤å?¯ä»¥è?·å¾?巨大ç??æ?§è?½ -è??ç??ï¼?æ??é«?ï¼?ã?? - -.. _Transcendent memory in a nutshell: https://lwn.net/Articles/454795/ - -Frontswapä¹?æ??以è¿?ä¹?å?½å??ï¼?æ?¯å? 为å®?å?¯ä»¥è¢«è®¤ä¸ºæ?¯ä¸?swap设å¤?ç??â??backâ??å?å?¨ç?¸å??ã??å? -å?¨å?¨è¢«è®¤ä¸ºæ?¯ä¸?个å??æ¥å¹¶å??å®?å?¨ç??é?¢å??页é?¢ç??â??伪RAM设å¤?â??ï¼?符å??transcendent memory -ï¼?å¦?Xenç??â??tmemâ??ï¼?æ??å??æ ¸å??å??缩å??å?ï¼?å??称â??zcacheâ??ï¼?æ??æ?ªæ?¥ç??类似RAMç??设å¤?ï¼?ç??è¦? -æ±?ï¼?è¿?个伪RAM设å¤?ä¸?è?½è¢«å??æ ¸ç?´æ?¥è®¿é?®æ??寻å??ï¼?å?¶å¤§å°?æ?ªç?¥ä¸?å?¯è?½é??æ?¶é?´å??å??ã??驱å?¨ç¨?åº?é??è¿? -è°?ç?¨frontswap_register_opså°?è?ªå·±ä¸?frontswapé?¾æ?¥èµ·æ?¥ï¼?以é??å½?å?°è®¾ç½®frontswap_ops -ç??å??è?½ï¼?å®?æ??ä¾?ç??å??è?½å¿?须符å??æ??äº?ç?ç?¥ï¼?å¦?ä¸?æ??示: - -ä¸?个 â??initâ?? å°?设å¤?å??å¤?好æ?¥æ?¶ä¸?æ??å®?ç??交æ?¢è®¾å¤?ç¼?å?·ï¼?å??称â??ç±»å??â??ï¼?ç?¸å?³ç??frontswap -交æ?¢é¡µã??ä¸?个 â??storeâ?? å°?æ??该页å¤?å?¶å?°transcendent memoryï¼?并ä¸?该页ç??ç±»å??å??å??移 -é??ç?¸å?³è??ã??ä¸?个 â??loadâ?? å°?æ??该页ï¼?å¦?æ??æ?¾å?°ç??è¯?ï¼?ä»?transcendent memoryå¤?å?¶å?°å??æ ¸ -å??å?ï¼?ä½?ä¸?ä¼?ä»?transcendent memoryä¸å? é?¤è¯¥é¡µã??ä¸?个 â??invalidate_pageâ?? å°?ä»? -transcendent memoryä¸å? 
é?¤è¯¥é¡µï¼?ä¸?个 â??invalidate_areaâ?? å°?å? é?¤æ??æ??ä¸?交æ?¢ç±»å?? -ç?¸å?³ç??页ï¼?ä¾?å¦?ï¼?å??swapoffï¼?并é??ç?¥ â??deviceâ?? æ??ç»?è¿?ä¸?æ¥å?å?¨è¯¥äº¤æ?¢ç±»å??ã?? - -ä¸?æ?¦ä¸?个页é?¢è¢«æ??å??å?å?¨ï¼?å?¨è¯¥é¡µé?¢ä¸?ç??å?¹é??å? è½½é??常ä¼?æ??å??ã??å? æ¤ï¼?å½?å??æ ¸å??ç?°è?ªå·±å¤?äº?é?? -è¦?交æ?¢é¡µé?¢ç??æ??å?µæ?¶ï¼?å®?é¦?å??å°?è¯?使ç?¨frontswapã??å¦?æ??å?å?¨ç??ç»?æ??æ?¯æ??å??ç??ï¼?é?£ä¹?æ?°æ?®å°±å·² -ç»?æ??å??ç??ä¿?å?å?°äº?transcendent memoryä¸ï¼?并ä¸?é?¿å??äº?ç£?ç??å??å?¥ï¼?å¦?æ??å??æ?¥å??读å??æ?°æ?®ï¼? -ä¹?é?¿å??äº?ç£?ç??读å??ã??å¦?æ??å?å?¨è¿?å??失败ï¼?transcendent memoryå·²ç»?æ??ç»?äº?该æ?°æ?®ï¼?ä¸?该页 -å?¯ä»¥å??å¾?常ä¸?æ ·è¢«å??å?¥äº¤æ?¢ç©ºé?´ã?? - -请注æ??ï¼?å¦?æ??ä¸?个页é?¢è¢«å?å?¨ï¼?è??该页é?¢å·²ç»?å?å?¨äº?transcendent memoryä¸ï¼?ä¸?个 â??é??å¤?â?? -ç??å?å?¨ï¼?ï¼?è¦?ä¹?å?å?¨æ??å??ï¼?æ?°æ?®è¢«è¦?ç??ï¼?è¦?ä¹?å?å?¨å¤±è´¥ï¼?该页é?¢è¢«åº?æ¢ã??è¿?ç¡®ä¿?äº?æ?§ç??æ?°æ?®æ°¸è¿? -ä¸?ä¼?ä»?frontswapä¸è?·å¾?ã?? - -å¦?æ??é??ç½®æ£ç¡®ï¼?对frontswapç??ç??æ?§æ?¯é??è¿? `/sys/kernel/debug/frontswap` ç?®å½?ä¸?ç?? -debugfså®?æ??ç??ã??frontswapç??æ??æ??æ?§å?¯ä»¥é??è¿?以ä¸?æ?¹å¼?æµ?é??ï¼?å?¨æ??æ??交æ?¢è®¾å¤?ä¸ï¼?: - -``failed_stores`` - æ??å¤?å°?次å?å?¨ç??å°?è¯?æ?¯å¤±è´¥ç?? - -``loads`` - å°?è¯?äº?å¤?å°?次å? è½½ï¼?åº?该å?¨é?¨æ??å??ï¼? - -``succ_stores`` - æ??å¤?å°?次å?å?¨ç??å°?è¯?æ?¯æ??å??ç?? - -``invalidates`` - å°?è¯?äº?å¤?å°?次ä½?åº? - -å??å?°å®?ç?°å?¯ä»¥æ??ä¾?é¢?å¤?ç??æ??æ ?ã?? - -ç»?常é?®å?°ç??é?®é¢? -============== - -* ä»·å?¼å?¨å?ªé??? - -å½?ä¸?个工ä½?è´?è½½å¼?å§?交æ?¢æ?¶ï¼?æ?§è?½å°±ä¼?ä¸?é??ã??Frontswapé??è¿?æ??ä¾?ä¸?个干å??ç??ã??å?¨æ??ç??æ?¥å?£æ?¥ -读å??å??å??å?¥äº¤æ?¢é¡µå?° â??transcendent memoryâ??ï¼?ä»?è??大大å¢?å? äº?许å¤?è¿?æ ·ç??å·¥ä½?è´?è½½ç??æ?§ -è?½ï¼?å?¦å??å??æ ¸æ?¯æ? æ³?ç?´æ?¥å¯»å??ç??ã??å½?æ?°æ?®è¢«è½¬æ?¢ä¸ºä¸?å??ç??å½¢å¼?å??大å°?ï¼?æ¯?å¦?å??缩ï¼?æ??è??被ç§?å¯? -移å?¨ï¼?对äº?ä¸?äº?类似RAMç??设å¤?æ?¥è¯´ï¼?è¿?å?¯è?½å¯¹å??平衡å¾?æ??ç?¨ï¼?æ?¶ï¼?è¿?个æ?¥å?£æ?¯ç??æ?³ç??ã??交æ?¢ -页ï¼?å??被驱é??ç??页é?¢ç¼?å?页ï¼?æ?¯è¿?ç§?æ¯?RAMæ?¢ä½?æ¯?ç£?ç??å¿«å¾?å¤?ç??â??伪RAM设å¤?â??ç??ä¸?大ç?¨é??ã?? 
- -Frontswap对å??æ ¸ç??å½±å??ç?¸å½?å°?ï¼?为å??ç§?ç³»ç»?é??ç½®ä¸æ?´å?¨æ??ã??æ?´ç?µæ´»ç??RAMå?©ç?¨æ??ä¾?äº?巨大ç?? -ç?µæ´»æ?§ï¼? - -å?¨å??ä¸?å??æ ¸ç??æ??å?µä¸?ï¼?å??称â??zcacheâ??ï¼?页é?¢è¢«å??缩并å?å?¨å?¨æ?¬å?°å??å?ä¸ï¼?ä»?è??å¢?å? äº?å?¯ä»¥å®? -å?¨ä¿?å?å?¨RAMä¸ç??å?¿å??页é?¢æ?»æ?°ã??Zcacheæ?¬è´¨ä¸?æ?¯ç?¨å??缩/解å??缩ç??CPUå?¨æ??æ?¢å??æ?´å¥½ç??å??å?å?© -ç?¨ç??ã??Benchmarksæµ?è¯?æ?¾ç¤ºï¼?å½?å??å?å??å??è¾?ä½?æ?¶ï¼?å? ä¹?没æ??å½±å??ï¼?è??å?¨é«?å??å?å??å??ä¸?ç??ä¸?äº? -å·¥ä½?è´?è½½ä¸?ï¼?å??æ??æ??æ?¾ç??æ?§è?½æ?¹å??ï¼?25%以ä¸?ï¼?ã?? - -â??RAMsterâ?? å?¨zcacheç??å?ºç¡?ä¸?å¢?å? äº?对é??群系ç»?ç?? â??peer-to-peerâ?? transcendent memory -ç??æ?¯æ??ã??Frontswap页é?¢å??zcacheä¸?æ ·è¢«æ?¬å?°å??缩ï¼?ä½?é??å??被â??remotifiedâ?? å?°å?¦ä¸?个系 -ç»?ç??RAMã??è¿?使å¾?RAMå?¯ä»¥æ ¹æ?®é??è¦?å?¨æ??å?°æ?¥å??è´?载平衡ï¼?ä¹?å°±æ?¯è¯´ï¼?å½?ç³»ç»?Aè¶?è½½æ?¶ï¼?å®?å?¯ä»¥ -交æ?¢å?°ç³»ç»?Bï¼?å??ä¹?亦ç?¶ã??RAMsterä¹?å?¯ä»¥è¢«é??ç½®æ??ä¸?个å??å?æ??å?¡å?¨ï¼?å? æ¤é??群ä¸ç??许å¤?æ??å?¡å?¨ -å?¯ä»¥æ ¹æ?®é??è¦?å?¨æ??å?°äº¤æ?¢å?°é??ç½®æ??大é??å??å?ç??å??ä¸?æ??å?¡å?¨ä¸?......è??ä¸?é??è¦?é¢?å??é??ç½®æ¯?个客æ?· -æ??å¤?å°?å??å?å?¯ç?¨ - -å?¨è??æ??æ??å?µä¸?ï¼?è??æ??å??ç??å?¨é?¨æ??ä¹?å?¨äº?ç»?计å?°å°?ç?©ç??èµ?æº?å?¨å¤?个è??æ??æ?ºç??ä¸?å??é??æ±?ä¹?é?´è¿?è¡?å¤? -ç?¨ã??对äº?RAMæ?¥è¯´ï¼?è¿?ç??ç??å¾?é?¾å??å?°ï¼?è??ä¸?å?¨ä¸?æ?¹å??å??æ ¸ç??æ??å?µä¸?ï¼?è¦?å??好è¿?ä¸?ç?¹ç??å?ªå??å?ºæ?¬ä¸? -æ?¯å¤±è´¥ç??ï¼?é?¤äº?ä¸?äº?广为人ç?¥ç??ç?¹æ®?æ??å?µä¸?ç??å·¥ä½?è´?è½½ï¼?ã??å?·ä½?æ?¥è¯´ï¼?Xen Transcendent Memory -å??端å??许管ç??å?¨æ?¥æ??ç??RAM â??fallowâ??ï¼?ä¸?ä»?å?¯ä»¥å?¨å¤?个è??æ??æ?ºä¹?é?´è¿?è¡?â??time-sharedâ??ï¼? -è??ä¸?页é?¢å?¯ä»¥è¢«å??缩å??é??å¤?å?©ç?¨ï¼?以ä¼?å??RAMç??å?©ç?¨ç??ã??å½?客æ?·æ??ä½?ç³»ç»?被诱导交å?ºæ?ªå??å??å?©ç?¨ -ç??RAMæ?¶ï¼?å¦? â??selfballooningâ??ï¼?ï¼?çª?ç?¶å?ºç?°ç??æ??å¤?å??å?å??å??å?¯è?½ä¼?导è?´äº¤æ?¢ï¼?frontswap -å??许è¿?äº?页é?¢è¢«äº¤æ?¢å?°ç®¡ç??å?¨RAMä¸æ??ä»?管ç??å?¨RAMä¸äº¤æ?¢ï¼?å¦?æ??æ?´ä½?主æ?ºç³»ç»?å??å?æ?¡ä»¶å??许ï¼?ï¼? -ä»?è??å??轻计å??å¤?交æ?¢å?¯è?½å¸¦æ?¥ç??å?¯æ??ç??æ?§è?½å½±å??ã?? 
- -ä¸?个KVMç??å®?ç?°æ£å?¨è¿?è¡?ä¸ï¼?并ä¸?å·²ç»?被RFC'edå?°lkmlã??è??ä¸?ï¼?å?©ç?¨frontswapï¼?对NVMä½?为 -å??å?æ?©å±?æ??æ?¯ç??è°?æ?¥ä¹?å?¨è¿?è¡?ä¸ã?? - -* å½?ç?¶ï¼?å?¨æ??äº?æ??å?µä¸?å?¯è?½æ??æ?§è?½ä¸?ç??ä¼?å?¿ï¼?ä½?frontswapç??空é?´/æ?¶é?´å¼?é??æ?¯å¤?å°?ï¼? - -å¦?æ?? CONFIG_FRONTSWAP 被ç¦?ç?¨ï¼?æ¯?个 frontswap é?©å?é?½ä¼?ç¼?è¯?æ??空ï¼?å?¯ä¸?ç??å¼?é??æ?¯æ¯? -个 swapon'ed swap 设å¤?ç??å? 个é¢?å¤?å?è??ã??å¦?æ?? CONFIG_FRONTSWAP 被å?¯ç?¨ï¼?ä½?没æ?? -frontswapç?? â??backendâ?? å¯?å?å?¨ï¼?æ¯?读æ??å??ä¸?个交æ?¢é¡µå°±ä¼?æ??ä¸?个é¢?å¤?ç??å?¨å±?å??é??ï¼?è??ä¸? -æ?¯é?¶ã??å¦?æ?? CONFIG_FRONTSWAP 被å?¯ç?¨ï¼?并ä¸?æ??ä¸?个frontswapç??backendå¯?å?å?¨ï¼?并ä¸? -å??端æ¯?次 â??storeâ?? 请æ±?é?½å¤±è´¥ï¼?å?³å°½ç®¡å£°ç§°å?¯è?½ï¼?ä½?没æ??æ??ä¾?å??å?ï¼?ï¼?CPU ç??å¼?é??ä»?ç?¶å?¯ä»¥ -忽ç?¥ä¸?计 - å? 为æ¯?次frontswap失败é?½æ?¯å?¨äº¤æ?¢é¡µå??å?°ç£?ç??ä¹?å??ï¼?ç³»ç»?å¾?å?¯è?½æ?¯ I/O ç»?å®? -ç??ï¼?æ? 论å¦?ä½?使ç?¨ä¸?å°?é?¨å??ç?? CPU é?½æ?¯ä¸?ç?¸å?³ç??ã?? - -è?³äº?空é?´ï¼?å¦?æ??CONFIG_FRONTSWAP被å?¯ç?¨ï¼?并ä¸?æ??ä¸?个frontswapç??backend注å??ï¼?é?£ä¹? -æ¯?个交æ?¢è®¾å¤?ç??æ¯?个交æ?¢é¡µé?½ä¼?被å??é??ä¸?个æ¯?ç?¹ã??è¿?æ?¯å?¨å??æ ¸å·²ç»?为æ¯?个交æ?¢è®¾å¤?ç??æ¯?个交æ?¢ -页å??é??ç??8ä½?ï¼?å?¨2.6.34ä¹?å??æ?¯16ä½?ï¼?ä¸?å¢?å? ç??ã??(Hugh Dickinsè§?å¯?å?°ï¼?frontswapå?¯è?½ -ä¼?å?·å??ç?°æ??ç??8个æ¯?ç?¹ï¼?ä½?æ?¯æ??们以å??å??æ?¥æ??å¿?è¿?个å°?ç??ä¼?å??é?®é¢?)ã??对äº?æ ?å??ç??4K页é?¢å¤§å°?ç?? -é??常大ç??交æ?¢ç??ï¼?è¿?å¾?ç½?è§?ï¼?ï¼?è¿?æ?¯æ¯?32GB交æ?¢ç??1MBå¼?é??ã?? - -å½?交æ?¢é¡µå?å?¨å?¨transcendent memoryä¸è??ä¸?æ?¯å??å?°ç£?ç??ä¸?æ?¶ï¼?æ??ä¸?个å?¯ä½?ç?¨ï¼?å?³è¿?å?¯è?½ä¼? -产ç??æ?´å¤?ç??å??å?å??å??ï¼?æ??å?¯è?½è¶?è¿?å?¶ä»?ç??ä¼?ç?¹ã??ä¸?个backendï¼?æ¯?å¦?zcacheï¼?å¿?é¡»å®?ç?°ç?ç?¥ -æ?¥ä»?ç»?ï¼?ä½?å?¨æ??å?°ï¼?管ç??å??å?é??å?¶ï¼?以确ä¿?è¿?ç§?æ??å?µä¸?ä¼?å??ç??ã?? - -* 好å?§ï¼?é?£å°±ç?¨å??æ ¸éª?客è?½ç??解ç??æ?¯è¯æ?¥å¿«é??æ¦?è¿°ä¸?ä¸?è¿?个frontswapè¡¥ä¸?ç??ä½?ç?¨å¦?ä½?ï¼? - -æ??们å??设å?¨å??æ ¸å??å§?å??è¿?ç¨?ä¸ï¼?ä¸?个frontswap ç?? â??backendâ?? å·²ç»?注å??äº?ï¼?è¿?个注å??表 -æ??è¿?个frontswap ç?? â??backendâ?? å?¯ä»¥è®¿é?®ä¸?äº?ä¸?被å??æ ¸ç?´æ?¥è®¿é?®ç??â??å??å?â??ã??å®?å?°åº?æ?? 
-ä¾?äº?å¤?å°?å??å?æ?¯å®?å?¨å?¨æ??å??é??æ?ºç??ã?? - -æ¯?å½?ä¸?个交æ?¢è®¾å¤?被交æ?¢æ?¶ï¼?å°±ä¼?è°?ç?¨frontswap_init()ï¼?æ??交æ?¢è®¾å¤?ç??ç¼?å?·ï¼?å??称â??ç±» -å??â??ï¼?ä½?为ä¸?个å??æ?°ä¼ ç»?å®?ã??è¿?å°±é??ç?¥äº?frontswapï¼?以æ??å¾? â??storeâ?? ä¸?该å?·ç ?ç?¸å?³ç??交 -æ?¢é¡µç??å°?è¯?ã?? - -æ¯?å½?交æ?¢å?ç³»ç»?å??å¤?å°?ä¸?个页é?¢å??å?¥äº¤æ?¢è®¾å¤?æ?¶ï¼?å??è§?swap_writepage()ï¼?ï¼?å°±ä¼?è°?ç?¨ -frontswap_storeã??Frontswapä¸?frontswap backendå??å??ï¼?å¦?æ??backend说å®?没æ??空 -é?´ï¼?frontswap_storeè¿?å??-1ï¼?å??æ ¸å°±ä¼?ç?§å¸¸æ??页æ?¢å?°äº¤æ?¢è®¾å¤?ä¸?ã??注æ??ï¼?æ?¥è?ªfrontswap -backendç??å??åº?对å??æ ¸æ?¥è¯´æ?¯ä¸?å?¯é¢?æµ?ç??ï¼?å®?å?¯è?½é??æ?©ä»?ä¸?æ?¥å??ä¸?个页é?¢ï¼?å?¯è?½æ?¥å??æ¯?ä¹?个 -页é?¢ï¼?ä¹?å?¯è?½æ?¥å??æ¯?ä¸?个页é?¢ã??ä½?æ?¯å¦?æ??backendç¡®å®?æ?¥å??äº?ä¸?个页é?¢ï¼?é?£ä¹?è¿?个页é?¢ç??æ?° -æ?®å·²ç»?被å¤?å?¶å¹¶ä¸?ç±»å??å??å??移é??ç?¸å?³è??äº?ï¼?è??ä¸?backendä¿?è¯?äº?æ?°æ?®ç??æ??ä¹?æ?§ã??å?¨è¿?ç§?æ??å?µ -ä¸?ï¼?frontswapå?¨äº¤æ?¢è®¾å¤?ç??â??frontswap_mapâ?? ä¸è®¾ç½®äº?ä¸?个ä½?ï¼?对åº?äº?交æ?¢è®¾å¤?ä¸?ç?? -页é?¢å??移é??ï¼?å?¦å??å®?å°±ä¼?å°?æ?°æ?®å??å?¥è¯¥è®¾å¤?ã?? - -å½?交æ?¢å?ç³»ç»?é??è¦?交æ?¢ä¸?个页é?¢æ?¶ï¼?swap_readpage()ï¼?ï¼?å®?é¦?å??è°?ç?¨frontswap_load()ï¼? -æ£?æ?¥frontswap_mapï¼?ç??è¿?个页é?¢æ?¯å?¦æ?©å??被frontswap backendæ?¥å??ã??å¦?æ??æ?¯ï¼?该页 -ç??æ?°æ?®å°±ä¼?ä»?frontswapå??端填å??ï¼?æ?¢å?¥å°±å®?æ??äº?ã??å¦?æ??ä¸?æ?¯ï¼?æ£å¸¸ç??交æ?¢ä»£ç ?å°?被æ?§è¡?ï¼? -以便ä»?ç??æ£ç??交æ?¢è®¾å¤?ä¸?è?·å¾?è¿?ä¸?页ç??æ?°æ?®ã?? - -æ??以æ¯?次frontswap backendæ?¥å??ä¸?个页é?¢æ?¶ï¼?交æ?¢è®¾å¤?ç??读å??å??ï¼?å?¯è?½ï¼?交æ?¢è®¾å¤?ç??å?? -å?¥é?½è¢« â??frontswap backend storeâ?? å??ï¼?å?¯è?½ï¼?â??frontswap backend loadsâ?? -æ??å??代ï¼?è¿?å?¯è?½ä¼?å¿«å¾?å¤?ã?? - -* frontswapä¸?è?½è¢«é??置为ä¸?个 â??ç?¹æ®?ç??â?? 交æ?¢è®¾å¤?ï¼?å®?ç??ä¼?å??级è¦?é«?äº?ä»»ä½?ç??æ£ç??交æ?¢ - 设å¤?ï¼?ä¾?å¦?å??zswapï¼?æ??è??å?¯è?½æ?¯swap-over-nbd/NFSï¼?ï¼? - -é¦?å??ï¼?ç?°æ??ç??交æ?¢å?ç³»ç»?ä¸?å??许æ??ä»»ä½?ç§?ç±»ç??交æ?¢å±?次ç»?æ??ã??ä¹?许å®?å?¯ä»¥è¢«é??å??以é??åº?å±?次 -ç»?æ??ï¼?ä½?è¿?å°?é??è¦?ç?¸å½?大ç??æ?¹å??ã??å?³ä½¿å®?被é??å??ï¼?ç?°æ??ç??交æ?¢å?ç³»ç»?ä¹?使ç?¨äº?å??I/Oå±?ï¼?å®? 
-å??å®?交æ?¢è®¾å¤?æ?¯å?ºå®?大å°?ç??ï¼?å?¶ä¸ç??ä»»ä½?页é?¢é?½æ?¯å?¯çº¿æ?§å¯»å??ç??ã??Frontswapå? ä¹?没æ??触 -å??ç?°æ??ç??交æ?¢å?ç³»ç»?ï¼?è??æ?¯å?´ç»?ç??å??I/Oå?ç³»ç»?ç??é??å?¶ï¼?æ??ä¾?äº?大é??ç??ç?µæ´»æ?§å??å?¨æ??æ?§ã?? - -ä¾?å¦?ï¼?frontswap backend对任ä½?交æ?¢é¡µç??æ?¥å??æ?¯å®?å?¨ä¸?å?¯é¢?æµ?ç??ã??è¿?对frontswap backend -ç??å®?ä¹?è?³å?³é??è¦?ï¼?å? 为å®?èµ?äº?äº?backendå®?å?¨å?¨æ??ç??å?³å®?æ??ã??å?¨zcacheä¸ï¼?人们æ? æ³?é¢? -å??ç?¥é??ä¸?个页é?¢ç??å?¯å??缩æ?§å¦?ä½?ã??å?¯å??缩æ?§ â??å·®â?? ç??页é?¢ä¼?被æ??ç»?ï¼?è?? â??å·®â?? æ?¬èº«ä¹?å?¯ -ä»¥æ ¹æ?®å½?å??ç??å??å?é??å?¶å?¨æ??å?°å®?ä¹?ã?? - -æ¤å¤?ï¼?frontswapæ?¯å®?å?¨å??æ¥ç??ï¼?è??ç??æ£ç??交æ?¢è®¾å¤?ï¼?æ ¹æ?®å®?ä¹?ï¼?æ?¯å¼?æ¥ç??ï¼?并ä¸?使ç?¨ -å??I/Oã??å??I/Oå±?ä¸?ä»?æ?¯ä¸?å¿?è¦?ç??ï¼?è??ä¸?å?¯è?½è¿?è¡? â??ä¼?å??â??ï¼?è¿?对é?¢å??RAMç??设å¤?æ?¥è¯´æ?¯ -ä¸?å??é??ç??ï¼?å??æ?¬å°?ä¸?äº?页é?¢ç??å??å?¥å»¶è¿?ç?¸å½?é?¿ç??æ?¶é?´ã??å??æ¥æ?¯å¿?é¡»ç??ï¼?以确ä¿?å??端ç??å?¨ -æ??æ?§ï¼?并é?¿å??æ£?æ??ç??ç«?äº?æ?¡ä»¶ï¼?è¿?å°?ä¸?å¿?è¦?å?°å¤§å¤§å¢?å? frontswapå??/æ??å??I/Oå?ç³»ç»?ç?? -å¤?æ??æ?§ã??ä¹?å°±æ?¯è¯´ï¼?å?ªæ??æ??å??ç?? â??storeâ?? å?? â??loadâ?? æ??ä½?æ?¯é??è¦?å??æ¥ç??ã??ä¸?个ç?¬ç«? -ç??å¼?æ¥çº¿ç¨?å?¯ä»¥è?ªç?±å?°æ??ä½?ç?±frontswapå?å?¨ç??页é?¢ã??ä¾?å¦?ï¼?RAMsterä¸ç?? â??remotificationâ?? -线ç¨?使ç?¨æ ?å??ç??å¼?æ¥å??æ ¸å¥?æ?¥å?ï¼?å°?å??缩ç??frontswap页é?¢ç§»å?¨å?°è¿?ç¨?æ?ºå?¨ã??å??æ ·ï¼? -KVMç??客æ?·æ?¹å®?ç?°å?¯ä»¥è¿?è¡?客æ?·å??å??缩ï¼?并使ç?¨ â??batchedâ?? hypercallsã?? - -å?¨è??æ??å??ç?¯å¢?ä¸ï¼?å?¨æ??æ?§å??许管ç??ç¨?åº?ï¼?æ??主æ?ºæ??ä½?ç³»ç»?ï¼?å??â??intelligent overcommitâ??ã?? -ä¾?å¦?ï¼?å®?å?¯ä»¥é??æ?©å?ªæ?¥å??页é?¢ï¼?ç?´å?°ä¸»æ?ºäº¤æ?¢å?¯è?½å?³å°?å??ç??ï¼?ç?¶å??强迫客æ?·æ?ºå??ä»?们 -è?ªå·±ç??交æ?¢ã?? - -transcendent memoryè§?æ ¼ç??frontswapæ??ä¸?个å??å¤?ã??å? 为任ä½? â??storeâ?? é?½å?¯ -è?½å¤±è´¥ï¼?æ??以å¿?é¡»å?¨ä¸?个ç??æ£ç??交æ?¢è®¾å¤?ä¸?æ??ä¸?个ç??æ£ç??æ??槽æ?¥äº¤æ?¢é¡µé?¢ã??å? æ¤ï¼? -frontswapå¿?é¡»ä½?为æ¯?个交æ?¢è®¾å¤?ç?? â??å½±å?â?? æ?¥å®?ç?°ï¼?å®?æ??å?¯è?½å®¹çº³äº¤æ?¢è®¾å¤?å?¯è?½ -容纳ç??æ¯?ä¸?个页é?¢ï¼?ä¹?æ??å?¯è?½æ ¹æ?¬ä¸?容纳任ä½?页é?¢ã??è¿?æ??å?³ç??frontswapä¸?è?½å??å?«æ¯? 
-swap设å¤?æ?»æ?°æ?´å¤?ç??页é?¢ã??ä¾?å¦?ï¼?å¦?æ??å?¨æ??äº?å®?è£?ä¸?没æ??é??置交æ?¢è®¾å¤?ï¼?frontswap -就没æ??ç?¨ã??æ? 交æ?¢è®¾å¤?ç??便æ?ºå¼?设å¤?ä»?ç?¶å?¯ä»¥ä½¿ç?¨frontswapï¼?ä½?æ?¯è¿?ç§?设å¤?ç?? -backendå¿?é¡»é??ç½®æ??ç§? â??ghostâ?? 交æ?¢è®¾å¤?ï¼?并确ä¿?å®?æ°¸è¿?ä¸?ä¼?被使ç?¨ã?? - - -* 为ä»?ä¹?ä¼?æ??è¿?ç§?å?³äº? â??é??å¤?å?å?¨â?? ç??å¥?æ?ªå®?ä¹?ï¼?å¦?æ??ä¸?个页é?¢ä»¥å??被æ??å??å?°å?å?¨è¿?ï¼? - é?¾é??å®?ä¸?è?½æ?»æ?¯è¢«æ??å??å?°è¦?ç??å??ï¼? - -å? ä¹?æ?»æ?¯å?¯ä»¥ç??ï¼?ä¸?ï¼?æ??æ?¶ä¸?è?½ã??è??è??ä¸?个ä¾?å?ï¼?æ?°æ?®è¢«å??缩äº?ï¼?å??æ?¥ç??4K页é?¢è¢«å?? -缩å?°äº?1Kã??ç?°å?¨ï¼?æ??人è¯?å?¾ç?¨ä¸?å?¯å??缩ç??æ?°æ?®è¦?ç??该页ï¼?å? æ¤ä¼?å? ç?¨æ?´ä¸ª4Kã??ä½?æ?¯ -backend没æ??æ?´å¤?ç??空é?´äº?ã??å?¨è¿?ç§?æ??å?µä¸?ï¼?è¿?个å?å?¨å¿?须被æ??ç»?ã??æ¯?å½?frontswap -æ??ç»?ä¸?个ä¼?è¦?ç??ç??å?å?¨æ?¶ï¼?å®?ä¹?å¿?须使æ?§ç??æ?°æ?®ä½?åº?ï¼?并确ä¿?å®?ä¸?å??被访é?®ã??å? 为交 -æ?¢å?ç³»ç»?ä¼?æ??æ?°ç??æ?°æ?®å??å?°è¯»äº¤æ?¢è®¾å¤?ä¸?ï¼?è¿?æ?¯ç¡®ä¿?ä¸?è?´æ?§ç??æ£ç¡®å??æ³?ã?? - -* 为ä»?ä¹?frontswapè¡¥ä¸?ä¼?å??建æ?°ç??头æ??件swapfile.hï¼? - -frontswap代ç ?ä¾?èµ?äº?ä¸?äº?swapå?ç³»ç»?å??é?¨ç??æ?°æ?®ç»?æ??ï¼?è¿?äº?æ?°æ?®ç»?æ??å¤?å¹´æ?¥ä¸?ç?´ -å?¨é??æ??å??å?¨å±?ä¹?é?´æ?¥å??移å?¨ã??è¿?ä¼¼ä¹?æ?¯ä¸?个å??ç??ç??妥å??ï¼?å°?å®?们å®?ä¹?为å?¨å±?ï¼?ä½?å?¨ä¸? -个æ?°ç??å??å?«æ??件ä¸å£°æ??å®?们ï¼?该æ??件ä¸?被å??å?«swap.hç??大é??æº?æ??件æ??å??å?«ã?? 
-
-Dan Magenheimer，最后更新于2012年4月9日
--- a/Documentation/translations/zh_CN/mm/index.rst~mm-kill-frontswap
+++ a/Documentation/translations/zh_CN/mm/index.rst
@@ -42,7 +42,6 @@ 结构化的文档中，如果它已经
    damon/index
    free_page_reporting
    ksm
-   frontswap
    hmm
    hwpoison
    hugetlbfs_reserv
--- a/fs/proc/meminfo.c~mm-kill-frontswap
+++ a/fs/proc/meminfo.c
@@ -17,6 +17,7 @@
 #ifdef CONFIG_CMA
 #include <linux/cma.h>
 #endif
+#include <linux/zswap.h>
 #include <asm/page.h>
 #include "internal.h"
 
--- a/include/linux/frontswap.h
+++ /dev/null
@@ -1,91 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _LINUX_FRONTSWAP_H
-#define _LINUX_FRONTSWAP_H
-
-#include <linux/swap.h>
-#include <linux/mm.h>
-#include <linux/bitops.h>
-#include <linux/jump_label.h>
-
-struct frontswap_ops {
-	void (*init)(unsigned); /* this swap type was just swapon'ed */
-	int (*store)(unsigned, pgoff_t, struct page *); /* store a page */
-	int (*load)(unsigned, pgoff_t, struct page *, bool *); /* load a page */
-	void (*invalidate_page)(unsigned, pgoff_t); /* page no longer needed */
-	void (*invalidate_area)(unsigned); /* swap type just swapoff'ed */
-};
-
-int frontswap_register_ops(const struct frontswap_ops *ops);
-
-extern void frontswap_init(unsigned type, unsigned long *map);
-extern int __frontswap_store(struct page *page);
-extern int __frontswap_load(struct page *page);
-extern void __frontswap_invalidate_page(unsigned, pgoff_t);
-extern void __frontswap_invalidate_area(unsigned);
-
-#ifdef CONFIG_FRONTSWAP
-extern struct static_key_false frontswap_enabled_key;
-
-static inline bool frontswap_enabled(void)
-{
-	return static_branch_unlikely(&frontswap_enabled_key);
-}
-
-static inline void frontswap_map_set(struct swap_info_struct *p,
-				     unsigned long *map)
-{
-	p->frontswap_map = map;
-}
-
-static inline unsigned long *frontswap_map_get(struct swap_info_struct *p)
-{
-	return p->frontswap_map;
-}
-#else
-/* all inline routines become no-ops and all externs are ignored */
-
-static inline bool frontswap_enabled(void)
-{
-	return false;
-}
-
-static inline void frontswap_map_set(struct swap_info_struct *p,
-				     unsigned long *map)
-{
-}
-
-static inline unsigned long *frontswap_map_get(struct swap_info_struct *p)
-{
-	return NULL;
-}
-#endif
-
-static inline int frontswap_store(struct page *page)
-{
-	if (frontswap_enabled())
-		return __frontswap_store(page);
-
-	return -1;
-}
-
-static inline int frontswap_load(struct page *page)
-{
-	if (frontswap_enabled())
-		return __frontswap_load(page);
-
-	return -1;
-}
-
-static inline void frontswap_invalidate_page(unsigned type, pgoff_t offset)
-{
-	if (frontswap_enabled())
-		__frontswap_invalidate_page(type, offset);
-}
-
-static inline void frontswap_invalidate_area(unsigned type)
-{
-	if (frontswap_enabled())
-		__frontswap_invalidate_area(type);
-}
-
-#endif /* _LINUX_FRONTSWAP_H */
--- a/include/linux/swapfile.h~mm-kill-frontswap
+++ a/include/linux/swapfile.h
@@ -2,11 +2,6 @@
 #ifndef _LINUX_SWAPFILE_H
 #define _LINUX_SWAPFILE_H
 
-/*
- * these were static in swapfile.c but frontswap.c needs them and we don't
- * want to expose them to the dozens of source files that include swap.h
- */
-extern struct swap_info_struct *swap_info[];
 extern unsigned long generic_max_swapfile_size(void);
 unsigned long arch_max_swapfile_size(void);
 
--- a/include/linux/swap.h~mm-kill-frontswap
+++ a/include/linux/swap.h
@@ -302,10 +302,6 @@ struct swap_info_struct {
 	struct file *swap_file;		/* seldom referenced */
 	unsigned int old_block_size;	/* seldom referenced */
 	struct completion comp;		/* seldom referenced */
-#ifdef CONFIG_FRONTSWAP
-	unsigned long *frontswap_map;	/* frontswap in-use, one bit per page */
-	atomic_t frontswap_pages;	/* frontswap pages in-use counter */
-#endif
 	spinlock_t lock;		/*
 					 * protect map scan related fields like
 					 * swap_map, lowest_bit, highest_bit,
@@ -630,11 +626,6 @@ static inline int mem_cgroup_swappiness(
 }
 #endif
 
-#ifdef CONFIG_ZSWAP
-extern u64 zswap_pool_total_size;
-extern atomic_t zswap_stored_pages;
-#endif
-
 #if defined(CONFIG_SWAP) && defined(CONFIG_MEMCG) && defined(CONFIG_BLK_CGROUP)
 void __folio_throttle_swaprate(struct folio *folio, gfp_t gfp);
 static inline void folio_throttle_swaprate(struct folio *folio, gfp_t gfp)
--- /dev/null
+++ a/include/linux/zswap.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_ZSWAP_H
+#define _LINUX_ZSWAP_H
+
+#include <linux/types.h>
+#include <linux/mm_types.h>
+
+extern u64 zswap_pool_total_size;
+extern atomic_t zswap_stored_pages;
+
+#ifdef CONFIG_ZSWAP
+
+bool zswap_store(struct page *page);
+bool zswap_load(struct page *page);
+void zswap_invalidate(int type, pgoff_t offset);
+void zswap_swapon(int type);
+void zswap_swapoff(int type);
+
+#else
+
+static inline bool zswap_store(struct page *page)
+{
+	return false;
+}
+
+static inline bool zswap_load(struct page *page)
+{
+	return false;
+}
+
+static inline void zswap_invalidate(int type, pgoff_t offset) {}
+static inline void zswap_swapon(int type) {}
+static inline void zswap_swapoff(int type) {}
+
+#endif
+
+#endif /* _LINUX_ZSWAP_H */
--- a/MAINTAINERS~mm-kill-frontswap
+++ a/MAINTAINERS
@@ -8394,13 +8394,6 @@ F:	Documentation/power/freezing-of-tasks
 F:	include/linux/freezer.h
 F:	kernel/freezer.c
 
-FRONTSWAP API
-M:	Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
-L:	linux-kernel@xxxxxxxxxxxxxxx
-S:	Maintained
-F:	include/linux/frontswap.h
-F:	mm/frontswap.c
-
 FS-CACHE: LOCAL CACHING FOR NETWORK FILESYSTEMS
 M:	David Howells <dhowells@xxxxxxxxxx>
 L:	linux-cachefs@xxxxxxxxxx (moderated for non-subscribers)
--- a/mm/frontswap.c
+++ /dev/null
@@ -1,283 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * Frontswap frontend
- *
- * This code provides the generic "frontend" layer to call a matching
- * "backend" driver implementation of frontswap. See
- * Documentation/mm/frontswap.rst for more information.
- *
- * Copyright (C) 2009-2012 Oracle Corp. All rights reserved.
- * Author: Dan Magenheimer
- */
-
-#include <linux/mman.h>
-#include <linux/swap.h>
-#include <linux/swapops.h>
-#include <linux/security.h>
-#include <linux/module.h>
-#include <linux/debugfs.h>
-#include <linux/frontswap.h>
-#include <linux/swapfile.h>
-
-DEFINE_STATIC_KEY_FALSE(frontswap_enabled_key);
-
-/*
- * frontswap_ops are added by frontswap_register_ops, and provide the
- * frontswap "backend" implementation functions. Multiple implementations
- * may be registered, but implementations can never deregister. This
- * is a simple singly-linked list of all registered implementations.
- */
-static const struct frontswap_ops *frontswap_ops __read_mostly;
-
-#ifdef CONFIG_DEBUG_FS
-/*
- * Counters available via /sys/kernel/debug/frontswap (if debugfs is
- * properly configured). These are for information only so are not protected
- * against increment races.
- */
-static u64 frontswap_loads;
-static u64 frontswap_succ_stores;
-static u64 frontswap_failed_stores;
-static u64 frontswap_invalidates;
-
-static inline void inc_frontswap_loads(void)
-{
-	data_race(frontswap_loads++);
-}
-static inline void inc_frontswap_succ_stores(void)
-{
-	data_race(frontswap_succ_stores++);
-}
-static inline void inc_frontswap_failed_stores(void)
-{
-	data_race(frontswap_failed_stores++);
-}
-static inline void inc_frontswap_invalidates(void)
-{
-	data_race(frontswap_invalidates++);
-}
-#else
-static inline void inc_frontswap_loads(void) { }
-static inline void inc_frontswap_succ_stores(void) { }
-static inline void inc_frontswap_failed_stores(void) { }
-static inline void inc_frontswap_invalidates(void) { }
-#endif
-
-/*
- * Due to the asynchronous nature of the backends loading potentially
- * _after_ the swap system has been activated, we have chokepoints
- * on all frontswap functions to not call the backend until the backend
- * has registered.
- *
- * This would not guards us against the user deciding to call swapoff right as
- * we are calling the backend to initialize (so swapon is in action).
- * Fortunately for us, the swapon_mutex has been taken by the callee so we are
- * OK. The other scenario where calls to frontswap_store (called via
- * swap_writepage) is racing with frontswap_invalidate_area (called via
- * swapoff) is again guarded by the swap subsystem.
- *
- * While no backend is registered all calls to frontswap_[store|load|
- * invalidate_area|invalidate_page] are ignored or fail.
- *
- * The time between the backend being registered and the swap file system
- * calling the backend (via the frontswap_* functions) is indeterminate as
- * frontswap_ops is not atomic_t (or a value guarded by a spinlock).
- * That is OK as we are comfortable missing some of these calls to the newly
- * registered backend.
- *
- * Obviously the opposite (unloading the backend) must be done after all
- * the frontswap_[store|load|invalidate_area|invalidate_page] start
- * ignoring or failing the requests. However, there is currently no way
- * to unload a backend once it is registered.
- */
-
-/*
- * Register operations for frontswap
- */
-int frontswap_register_ops(const struct frontswap_ops *ops)
-{
-	if (frontswap_ops)
-		return -EINVAL;
-
-	frontswap_ops = ops;
-	static_branch_inc(&frontswap_enabled_key);
-	return 0;
-}
-
-/*
- * Called when a swap device is swapon'd.
- */
-void frontswap_init(unsigned type, unsigned long *map)
-{
-	struct swap_info_struct *sis = swap_info[type];
-
-	VM_BUG_ON(sis == NULL);
-
-	/*
-	 * p->frontswap is a bitmap that we MUST have to figure out which page
-	 * has gone in frontswap. Without it there is no point of continuing.
-	 */
-	if (WARN_ON(!map))
-		return;
-	/*
-	 * Irregardless of whether the frontswap backend has been loaded
-	 * before this function or it will be later, we _MUST_ have the
-	 * p->frontswap set to something valid to work properly.
-	 */
-	frontswap_map_set(sis, map);
-
-	if (!frontswap_enabled())
-		return;
-	frontswap_ops->init(type);
-}
-
-static bool __frontswap_test(struct swap_info_struct *sis,
-				pgoff_t offset)
-{
-	if (sis->frontswap_map)
-		return test_bit(offset, sis->frontswap_map);
-	return false;
-}
-
-static inline void __frontswap_set(struct swap_info_struct *sis,
-				   pgoff_t offset)
-{
-	set_bit(offset, sis->frontswap_map);
-	atomic_inc(&sis->frontswap_pages);
-}
-
-static inline void __frontswap_clear(struct swap_info_struct *sis,
-				     pgoff_t offset)
-{
-	clear_bit(offset, sis->frontswap_map);
-	atomic_dec(&sis->frontswap_pages);
-}
-
-/*
- * "Store" data from a page to frontswap and associate it with the page's
- * swaptype and offset. Page must be locked and in the swap cache.
- * If frontswap already contains a page with matching swaptype and
- * offset, the frontswap implementation may either overwrite the data and
- * return success or invalidate the page from frontswap and return failure.
- */
-int __frontswap_store(struct page *page)
-{
-	int ret = -1;
-	swp_entry_t entry = { .val = page_private(page), };
-	int type = swp_type(entry);
-	struct swap_info_struct *sis = swap_info[type];
-	pgoff_t offset = swp_offset(entry);
-
-	VM_BUG_ON(!frontswap_ops);
-	VM_BUG_ON(!PageLocked(page));
-	VM_BUG_ON(sis == NULL);
-
-	/*
-	 * If a dup, we must remove the old page first; we can't leave the
-	 * old page no matter if the store of the new page succeeds or fails,
-	 * and we can't rely on the new page replacing the old page as we may
-	 * not store to the same implementation that contains the old page.
-	 */
-	if (__frontswap_test(sis, offset)) {
-		__frontswap_clear(sis, offset);
-		frontswap_ops->invalidate_page(type, offset);
-	}
-
-	ret = frontswap_ops->store(type, offset, page);
-	if (ret == 0) {
-		__frontswap_set(sis, offset);
-		inc_frontswap_succ_stores();
-	} else {
-		inc_frontswap_failed_stores();
-	}
-
-	return ret;
-}
-
-/*
- * "Get" data from frontswap associated with swaptype and offset that were
- * specified when the data was put to frontswap and use it to fill the
- * specified page with data. Page must be locked and in the swap cache.
- */
-int __frontswap_load(struct page *page)
-{
-	int ret = -1;
-	swp_entry_t entry = { .val = page_private(page), };
-	int type = swp_type(entry);
-	struct swap_info_struct *sis = swap_info[type];
-	pgoff_t offset = swp_offset(entry);
-	bool exclusive = false;
-
-	VM_BUG_ON(!frontswap_ops);
-	VM_BUG_ON(!PageLocked(page));
-	VM_BUG_ON(sis == NULL);
-
-	if (!__frontswap_test(sis, offset))
-		return -1;
-
-	/* Try loading from each implementation, until one succeeds. */
-	ret = frontswap_ops->load(type, offset, page, &exclusive);
-	if (ret == 0) {
-		inc_frontswap_loads();
-		if (exclusive) {
-			SetPageDirty(page);
-			__frontswap_clear(sis, offset);
-		}
-	}
-	return ret;
-}
-
-/*
- * Invalidate any data from frontswap associated with the specified swaptype
- * and offset so that a subsequent "get" will fail.
- */
-void __frontswap_invalidate_page(unsigned type, pgoff_t offset)
-{
-	struct swap_info_struct *sis = swap_info[type];
-
-	VM_BUG_ON(!frontswap_ops);
-	VM_BUG_ON(sis == NULL);
-
-	if (!__frontswap_test(sis, offset))
-		return;
-
-	frontswap_ops->invalidate_page(type, offset);
-	__frontswap_clear(sis, offset);
-	inc_frontswap_invalidates();
-}
-
-/*
- * Invalidate all data from frontswap associated with all offsets for the
- * specified swaptype.
- */
-void __frontswap_invalidate_area(unsigned type)
-{
-	struct swap_info_struct *sis = swap_info[type];
-
-	VM_BUG_ON(!frontswap_ops);
-	VM_BUG_ON(sis == NULL);
-
-	if (sis->frontswap_map == NULL)
-		return;
-
-	frontswap_ops->invalidate_area(type);
-	atomic_set(&sis->frontswap_pages, 0);
-	bitmap_zero(sis->frontswap_map, sis->max);
-}
-
-static int __init init_frontswap(void)
-{
-#ifdef CONFIG_DEBUG_FS
-	struct dentry *root = debugfs_create_dir("frontswap", NULL);
-	if (root == NULL)
-		return -ENXIO;
-	debugfs_create_u64("loads", 0444, root, &frontswap_loads);
-	debugfs_create_u64("succ_stores", 0444, root, &frontswap_succ_stores);
-	debugfs_create_u64("failed_stores", 0444, root,
-			   &frontswap_failed_stores);
-	debugfs_create_u64("invalidates", 0444, root, &frontswap_invalidates);
-#endif
-	return 0;
-}
-
-module_init(init_frontswap);
--- a/mm/Kconfig~mm-kill-frontswap
+++ a/mm/Kconfig
@@ -25,7 +25,6 @@ menuconfig SWAP
 config ZSWAP
 	bool "Compressed cache for swap pages"
 	depends on SWAP
-	select FRONTSWAP
 	select CRYPTO
 	select ZPOOL
 	help
@@ -870,9 +869,6 @@ config USE_PERCPU_NUMA_NODE_ID
 config HAVE_SETUP_PER_CPU_AREA
 	bool
 
-config FRONTSWAP
-	bool
-
 config CMA
 	bool "Contiguous Memory Allocator"
 	depends on MMU
--- a/mm/Makefile~mm-kill-frontswap
+++ a/mm/Makefile
@@ -72,7 +72,6 @@ ifdef CONFIG_MMU
 endif
 
 obj-$(CONFIG_SWAP)	+= page_io.o swap_state.o swapfile.o swap_slots.o
-obj-$(CONFIG_FRONTSWAP)	+= frontswap.o
 obj-$(CONFIG_ZSWAP)	+= zswap.o
 obj-$(CONFIG_HAS_DMA)	+= dmapool.o
 obj-$(CONFIG_HUGETLBFS)	+= hugetlb.o
--- a/mm/page_io.c~mm-kill-frontswap
+++ a/mm/page_io.c
@@ -19,12 +19,12 @@
 #include <linux/bio.h>
 #include <linux/swapops.h>
 #include <linux/writeback.h>
-#include <linux/frontswap.h>
 #include <linux/blkdev.h>
 #include <linux/psi.h>
 #include <linux/uio.h>
 #include <linux/sched/task.h>
 #include <linux/delayacct.h>
+#include <linux/zswap.h>
 #include "swap.h"
 
 static void __end_swap_bio_write(struct bio *bio)
@@ -198,7 +198,7 @@ int swap_writepage(struct page *page, st
 		folio_unlock(folio);
 		return ret;
 	}
-	if (frontswap_store(&folio->page) == 0) {
+	if (zswap_store(&folio->page)) {
 		folio_start_writeback(folio);
 		folio_unlock(folio);
 		folio_end_writeback(folio);
@@ -515,7 +515,7 @@ void swap_readpage(struct page *page, bo
 	}
 
 	delayacct_swapin_start();
-	if (frontswap_load(page) == 0) {
+	if (zswap_load(page)) {
 		SetPageUptodate(page);
 		unlock_page(page);
 	} else if (data_race(sis->flags & SWP_FS_OPS)) {
--- a/mm/swapfile.c~mm-kill-frontswap
+++ a/mm/swapfile.c
@@ -35,13 +35,13 @@
 #include <linux/memcontrol.h>
 #include <linux/poll.h>
 #include <linux/oom.h>
-#include <linux/frontswap.h>
 #include <linux/swapfile.h>
 #include <linux/export.h>
 #include <linux/swap_slots.h>
 #include <linux/sort.h>
 #include <linux/completion.h>
 #include <linux/suspend.h>
+#include <linux/zswap.h>
 
 #include <asm/tlbflush.h>
 #include <linux/swapops.h>
@@ -95,7 +95,7 @@ static PLIST_HEAD(swap_active_head);
 static struct plist_head *swap_avail_heads;
 static DEFINE_SPINLOCK(swap_avail_lock);
 
-struct swap_info_struct *swap_info[MAX_SWAPFILES];
+static struct swap_info_struct *swap_info[MAX_SWAPFILES];
 
 static DEFINE_MUTEX(swapon_mutex);
 
@@ -744,7 +744,7 @@ static void swap_range_free(struct swap_
 		swap_slot_free_notify = NULL;
 	while (offset <= end) {
 		arch_swap_invalidate_page(si->type, offset);
-		frontswap_invalidate_page(si->type, offset);
+		zswap_invalidate(si->type, offset);
 		if (swap_slot_free_notify)
 			swap_slot_free_notify(si->bdev, offset);
 		offset++;
@@ -2343,11 +2343,10 @@ static void _enable_swap_info(struct swa
 
 static void enable_swap_info(struct swap_info_struct *p, int prio,
 				unsigned char *swap_map,
-				struct swap_cluster_info *cluster_info,
-				unsigned long *frontswap_map)
+				struct swap_cluster_info *cluster_info)
 {
-	if (IS_ENABLED(CONFIG_FRONTSWAP))
-		frontswap_init(p->type, frontswap_map);
+	zswap_swapon(p->type);
+
 	spin_lock(&swap_lock);
 	spin_lock(&p->lock);
 	setup_swap_info(p, prio, swap_map, cluster_info);
@@ -2390,7 +2389,6 @@ SYSCALL_DEFINE1(swapoff, const char __us
 	struct swap_info_struct *p = NULL;
 	unsigned char *swap_map;
 	struct swap_cluster_info *cluster_info;
-	unsigned long *frontswap_map;
 	struct file *swap_file, *victim;
 	struct address_space *mapping;
 	struct inode *inode;
@@ -2515,12 +2513,10 @@ SYSCALL_DEFINE1(swapoff, const char __us
 	p->swap_map = NULL;
 	cluster_info = p->cluster_info;
 	p->cluster_info = NULL;
-	frontswap_map = frontswap_map_get(p);
 	spin_unlock(&p->lock);
 	spin_unlock(&swap_lock);
 	arch_swap_invalidate_area(p->type);
-	frontswap_invalidate_area(p->type);
-	frontswap_map_set(p, NULL);
+	zswap_swapoff(p->type);
 	mutex_unlock(&swapon_mutex);
 	free_percpu(p->percpu_cluster);
 	p->percpu_cluster = NULL;
@@ -2528,7 +2524,6 @@ SYSCALL_DEFINE1(swapoff, const char __us
 	p->cluster_next_cpu = NULL;
 	vfree(swap_map);
 	kvfree(cluster_info);
-	kvfree(frontswap_map);
 	/* Destroy swap account information */
 	swap_cgroup_swapoff(p->type);
 	exit_swap_address_space(p->type);
@@ -2995,7 +2990,6 @@ SYSCALL_DEFINE2(swapon, const char __use
 	unsigned long maxpages;
 	unsigned char *swap_map = NULL;
 	struct swap_cluster_info *cluster_info = NULL;
-	unsigned long *frontswap_map = NULL;
 	struct page *page = NULL;
 	struct inode *inode = NULL;
 	bool inced_nr_rotate_swap = false;
@@ -3135,11 +3129,6 @@ SYSCALL_DEFINE2(swapon, const char __use
 		error = nr_extents;
 		goto bad_swap_unlock_inode;
 	}
-	/* frontswap enabled? set up bit-per-page map for frontswap */
-	if (IS_ENABLED(CONFIG_FRONTSWAP))
-		frontswap_map = kvcalloc(BITS_TO_LONGS(maxpages),
-					 sizeof(long),
-					 GFP_KERNEL);
 
 	if ((swap_flags & SWAP_FLAG_DISCARD) &&
 	    p->bdev && bdev_max_discard_sectors(p->bdev)) {
@@ -3192,16 +3181,15 @@ SYSCALL_DEFINE2(swapon, const char __use
 	if (swap_flags & SWAP_FLAG_PREFER)
 		prio =
 		  (swap_flags & SWAP_FLAG_PRIO_MASK) >> SWAP_FLAG_PRIO_SHIFT;
-	enable_swap_info(p, prio, swap_map, cluster_info, frontswap_map);
+	enable_swap_info(p, prio, swap_map, cluster_info);
 
-	pr_info("Adding %uk swap on %s. Priority:%d extents:%d across:%lluk %s%s%s%s%s\n",
+	pr_info("Adding %uk swap on %s. Priority:%d extents:%d across:%lluk %s%s%s%s\n",
 		p->pages<<(PAGE_SHIFT-10), name->name, p->prio,
 		nr_extents, (unsigned long long)span<<(PAGE_SHIFT-10),
 		(p->flags & SWP_SOLIDSTATE) ? "SS" : "",
 		(p->flags & SWP_DISCARDABLE) ? "D" : "",
 		(p->flags & SWP_AREA_DISCARD) ? "s" : "",
-		(p->flags & SWP_PAGE_DISCARD) ? "c" : "",
-		(frontswap_map) ? "FS" : "");
+		(p->flags & SWP_PAGE_DISCARD) ? "c" : "");
 
 	mutex_unlock(&swapon_mutex);
 	atomic_inc(&proc_poll_event);
@@ -3231,7 +3219,6 @@ bad_swap:
 	spin_unlock(&swap_lock);
 	vfree(swap_map);
 	kvfree(cluster_info);
-	kvfree(frontswap_map);
 	if (inced_nr_rotate_swap)
 		atomic_dec(&nr_rotate_swap);
 	if (swap_file)
--- a/mm/zswap.c~mm-kill-frontswap
+++ a/mm/zswap.c
@@ -2,7 +2,7 @@
 /*
  * zswap.c - zswap driver file
  *
- * zswap is a backend for frontswap that takes pages that are in the process
+ * zswap is a cache that takes pages that are in the process
  * of being swapped out and attempts to compress and store them in a
  * RAM-based memory pool. This can result in a significant I/O reduction on
  * the swap device and, in the case where decompressing from RAM is faster
@@ -20,7 +20,6 @@
 #include <linux/spinlock.h>
 #include <linux/types.h>
 #include <linux/atomic.h>
-#include <linux/frontswap.h>
 #include <linux/rbtree.h>
 #include <linux/swap.h>
 #include <linux/crypto.h>
@@ -28,7 +27,7 @@
 #include <linux/mempool.h>
 #include <linux/zpool.h>
 #include <crypto/acompress.h>
-
+#include <linux/zswap.h>
 #include <linux/mm_types.h>
 #include <linux/page-flags.h>
 #include <linux/swapops.h>
@@ -1084,7 +1083,7 @@ static int zswap_get_swap_cache_page(swp
  *
  * This can be thought of as a "resumed writeback" of the page
  * to the swap device. We are basically resuming the same swap
- * writeback path that was intercepted with the frontswap_store()
+ * writeback path that was intercepted with the zswap_store()
  * in the first place. After the page has been decompressed into
  * the swap cache, the compressed version stored by zswap can be
  * freed.
@@ -1224,13 +1223,11 @@ static void zswap_fill_page(void *ptr, u
 	memset_l(page, value, PAGE_SIZE / sizeof(unsigned long));
 }
 
-/*********************************
-* frontswap hooks
-**********************************/
-/* attempts to compress and store an single page */
-static int zswap_frontswap_store(unsigned type, pgoff_t offset,
-				struct page *page)
+bool zswap_store(struct page *page)
 {
+	swp_entry_t swp = { .val = page_private(page), };
+	int type = swp_type(swp);
+	pgoff_t offset = swp_offset(swp);
 	struct zswap_tree *tree = zswap_trees[type];
 	struct zswap_entry *entry, *dupentry;
 	struct scatterlist input, output;
@@ -1238,23 +1235,21 @@ static int zswap_frontswap_store(unsigne
 	struct obj_cgroup *objcg = NULL;
 	struct zswap_pool *pool;
 	struct zpool *zpool;
-	int ret;
 	unsigned int dlen = PAGE_SIZE;
 	unsigned long handle, value;
 	char *buf;
 	u8 *src, *dst;
 	gfp_t gfp;
+	int ret;
 
-	/* THP isn't supported */
-	if (PageTransHuge(page)) {
-		ret = -EINVAL;
-		goto reject;
-	}
+	VM_WARN_ON_ONCE(!PageLocked(page));
+	VM_WARN_ON_ONCE(!PageSwapCache(page));
 
-	if (!zswap_enabled || !tree) {
-		ret = -ENODEV;
-		goto reject;
-	}
+	if (PageTransHuge(page))
+		return false;
+
+	if (!zswap_enabled || !tree)
+		return false;
 
 	/*
 	 * XXX: zswap reclaim does not work with cgroups yet. Without a
@@ -1262,10 +1257,8 @@ static int zswap_frontswap_store(unsigne
 	 * local cgroup limits.
 	 */
 	objcg = get_obj_cgroup_from_page(page);
-	if (objcg && !obj_cgroup_may_zswap(objcg)) {
-		ret = -ENOMEM;
+	if (objcg && !obj_cgroup_may_zswap(objcg))
 		goto reject;
-	}
 
 	/* reclaim space if needed */
 	if (zswap_is_full()) {
@@ -1275,10 +1268,9 @@ static int zswap_frontswap_store(unsigne
 	}
 
 	if (zswap_pool_reached_full) {
-		if (!zswap_can_accept()) {
-			ret = -ENOMEM;
+		if (!zswap_can_accept())
 			goto shrink;
-		} else
+		else
 			zswap_pool_reached_full = false;
 	}
 
@@ -1286,7 +1278,6 @@ static int zswap_frontswap_store(unsigne
 	entry = zswap_entry_cache_alloc(GFP_KERNEL);
 	if (!entry) {
 		zswap_reject_kmemcache_fail++;
-		ret = -ENOMEM;
 		goto reject;
 	}
 
@@ -1303,17 +1294,13 @@ static int zswap_frontswap_store(unsigne
 		kunmap_atomic(src);
 	}
 
-	if (!zswap_non_same_filled_pages_enabled) {
-		ret = -EINVAL;
+	if (!zswap_non_same_filled_pages_enabled)
 		goto freepage;
-	}
 
 	/* if entry is successfully added, it keeps the reference */
 	entry->pool = zswap_pool_current_get();
-	if (!entry->pool) {
-		ret = -EINVAL;
+	if (!entry->pool)
 		goto freepage;
-	}
 
 	/* compress */
 	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
@@ -1333,19 +1320,17 @@ static int zswap_frontswap_store(unsigne
 	 * synchronous in fact.
 	 * Theoretically, acomp supports users send multiple acomp requests in one
 	 * acomp instance, then get those requests done simultaneously. but in this
-	 * case, frontswap actually does store and load page by page, there is no
+	 * case, zswap actually does store and load page by page, there is no
 	 * existing method to send the second page before the first page is done
-	 * in one thread doing frontswap.
+	 * in one thread doing zswap.
 	 * but in different threads running on different cpu, we have different
 	 * acomp instance, so multiple threads can do (de)compression in parallel.
 	 */
 	ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx->req), &acomp_ctx->wait);
 	dlen = acomp_ctx->req->dlen;
 
-	if (ret) {
-		ret = -EINVAL;
+	if (ret)
 		goto put_dstmem;
-	}
 
 	/* store */
 	zpool = zswap_find_zpool(entry);
@@ -1381,15 +1366,12 @@ insert_entry:
 
 	/* map */
 	spin_lock(&tree->lock);
-	do {
-		ret = zswap_rb_insert(&tree->rbroot, entry, &dupentry);
-		if (ret == -EEXIST) {
-			zswap_duplicate_entry++;
-			/* remove from rbtree */
-			zswap_rb_erase(&tree->rbroot, dupentry);
-			zswap_entry_put(tree, dupentry);
-		}
-	} while (ret == -EEXIST);
+	while (zswap_rb_insert(&tree->rbroot, entry, &dupentry) == -EEXIST) {
+		zswap_duplicate_entry++;
+		/* remove from rbtree */
+		zswap_rb_erase(&tree->rbroot, dupentry);
+		zswap_entry_put(tree, dupentry);
+	}
 	if (entry->length) {
 		spin_lock(&entry->pool->lru_lock);
 		list_add(&entry->lru, &entry->pool->lru);
@@ -1402,7 +1384,7 @@ insert_entry:
 	zswap_update_total_size();
 	count_vm_event(ZSWPOUT);
 
-	return 0;
+	return true;
 
 put_dstmem:
 	mutex_unlock(acomp_ctx->mutex);
@@ -1412,23 +1394,20 @@ freepage:
 reject:
 	if (objcg)
 		obj_cgroup_put(objcg);
-	return ret;
+	return false;
 
 shrink:
 	pool = zswap_pool_last_get();
 	if (pool)
 		queue_work(shrink_wq, &pool->shrink_work);
-	ret = -ENOMEM;
 	goto reject;
 }
 
-/*
- * returns 0 if the page was successfully decompressed
- * return -1 on entry not found or error
-*/
-static int zswap_frontswap_load(unsigned type, pgoff_t offset,
-				struct page *page, bool *exclusive)
+bool zswap_load(struct page *page)
 {
+	swp_entry_t swp = { .val = page_private(page), };
+	int type = swp_type(swp);
+	pgoff_t offset = swp_offset(swp);
 	struct zswap_tree *tree = zswap_trees[type];
 	struct zswap_entry *entry;
 	struct scatterlist input, output;
@@ -1436,7 +1415,10 @@ static int zswap_frontswap_load(unsigned
 	u8 *src, *dst, *tmp;
 	struct zpool *zpool;
 	unsigned int dlen;
-	int ret;
+	bool ret;
+
+	VM_WARN_ON_ONCE(!PageLocked(page));
+	VM_WARN_ON_ONCE(!PageSwapCache(page));
 
 	/* find */
 	spin_lock(&tree->lock);
@@ -1444,7 +1426,7 @@ static int zswap_frontswap_load(unsigned
 	if (!entry) {
 		/* entry was written back */
 		spin_unlock(&tree->lock);
-		return -1;
+		return false;
 	}
 	spin_unlock(&tree->lock);
 
@@ -1452,7 +1434,7 @@ static int zswap_frontswap_load(unsigned
 		dst = kmap_atomic(page);
 		zswap_fill_page(dst, entry->value);
 		kunmap_atomic(dst);
-		ret = 0;
+		ret = true;
 		goto stats;
 	}
 
@@ -1460,7 +1442,7 @@ static int zswap_frontswap_load(unsigned
 	if (!zpool_can_sleep_mapped(zpool)) {
 		tmp = kmalloc(entry->length, GFP_KERNEL);
 		if (!tmp) {
-			ret = -ENOMEM;
+			ret = false;
 			goto freeentry;
 		}
 	}
@@ -1481,7 +1463,8 @@ static int zswap_frontswap_load(unsigned
 	sg_init_table(&output, 1);
 	sg_set_page(&output, page, PAGE_SIZE, 0);
 	acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
-	ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait);
+	if (crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait))
+		WARN_ON(1);
 	mutex_unlock(acomp_ctx->mutex);
 
 	if (zpool_can_sleep_mapped(zpool))
@@ -1489,16 +1472,16 @@ static int zswap_frontswap_load(unsigned
 	else
 		kfree(tmp);
 
-	BUG_ON(ret);
+	ret = true;
 stats:
 	count_vm_event(ZSWPIN);
 	if (entry->objcg)
 		count_objcg_event(entry->objcg, ZSWPIN);
 freeentry:
 	spin_lock(&tree->lock);
-	if (!ret && zswap_exclusive_loads_enabled) {
+	if (ret && zswap_exclusive_loads_enabled) {
 		zswap_invalidate_entry(tree, entry);
-		*exclusive = true;
+		SetPageDirty(page);
 	} else if (entry->length) {
 		spin_lock(&entry->pool->lru_lock);
 		list_move(&entry->lru, &entry->pool->lru);
@@ -1510,8 +1493,7 @@ freeentry:
 	return ret;
 }
 
-/* frees an entry in zswap */
-static void zswap_frontswap_invalidate_page(unsigned type, pgoff_t offset)
+void zswap_invalidate(int type, pgoff_t offset)
 {
 	struct zswap_tree *tree = zswap_trees[type];
 	struct zswap_entry *entry;
@@ -1528,8 +1510,22 @@ static void zswap_frontswap_invalidate_p
 	spin_unlock(&tree->lock);
 }
 
-/* frees all zswap entries for the given swap type */
-static void zswap_frontswap_invalidate_area(unsigned type)
+void zswap_swapon(int type)
+{
+	struct zswap_tree *tree;
+
+	tree = kzalloc(sizeof(*tree), GFP_KERNEL);
+	if (!tree) {
+		pr_err("alloc failed, zswap disabled for swap type %d\n", type);
+		return;
+	}
+
+	tree->rbroot = RB_ROOT;
+	spin_lock_init(&tree->lock);
+	zswap_trees[type] = tree;
+}
+
+void zswap_swapoff(int type)
 {
 	struct zswap_tree *tree = zswap_trees[type];
 	struct zswap_entry *entry, *n;
@@ -1547,29 +1543,6 @@ static void zswap_frontswap_invalidate_a
 	zswap_trees[type] = NULL;
 }
 
-static void zswap_frontswap_init(unsigned type)
-{
-	struct zswap_tree *tree;
-
-	tree = kzalloc(sizeof(*tree), GFP_KERNEL);
-	if (!tree) {
-		pr_err("alloc failed, zswap disabled for swap type %d\n", type);
-		return;
-	}
-
-	tree->rbroot = RB_ROOT;
-	spin_lock_init(&tree->lock);
-	zswap_trees[type] = tree;
-}
-
-static const struct frontswap_ops zswap_frontswap_ops = {
-	.store = zswap_frontswap_store,
-	.load = zswap_frontswap_load,
-	.invalidate_page = zswap_frontswap_invalidate_page,
-	.invalidate_area = zswap_frontswap_invalidate_area,
-	.init = zswap_frontswap_init
-};
-
 /*********************************
 * debugfs functions
 **********************************/
@@ -1658,16 +1631,11 @@ static int zswap_setup(void)
 	if (!shrink_wq)
 		goto fallback_fail;
 
-	ret = frontswap_register_ops(&zswap_frontswap_ops);
-	if (ret)
-		goto destroy_wq;
 	if (zswap_debugfs_init())
 		pr_warn("debugfs initialization failed\n");
 	zswap_init_state = ZSWAP_INIT_SUCCEED;
 	return 0;
 
-destroy_wq:
-	destroy_workqueue(shrink_wq);
 fallback_fail:
 	if (pool)
 		zswap_pool_destroy(pool);
_

Patches currently in -mm which might be from hannes@xxxxxxxxxxx are

mm-kill-frontswap.patch
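The main interface change in this patch is the calling convention: frontswap_store() and frontswap_load() returned 0 on success, while zswap_store() and zswap_load() return true when zswap handled the page, so swap_writepage() and swap_readpage() now test the result directly. The following is a hypothetical user-space model of that contract, not kernel code: the model_* names, the fixed slot array, and the uncompressed copies are all assumptions made for illustration. It also mirrors the exclusive-load path, where a successful load drops the cached entry and marks the page dirty (the patch moves that SetPageDirty() call from frontswap into zswap_load()).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical user-space model of the new zswap hooks: one swap type,
 * a fixed number of slots, and uncompressed copies. It illustrates the
 * bool calling convention only, not zswap's real data structures.
 */
#define MODEL_NSLOTS 4
#define MODEL_PGSZ 8

struct model_page {
	uint64_t offset;        /* stands in for swp_offset() of page_private() */
	char data[MODEL_PGSZ];
	bool dirty;             /* stands in for PageDirty() */
};

static bool model_tree_ready;             /* models zswap_trees[type] != NULL */
static bool model_stored[MODEL_NSLOTS];
static char model_cache[MODEL_NSLOTS][MODEL_PGSZ];
static bool model_exclusive_loads = true; /* models zswap_exclusive_loads_enabled */
static int model_disk_writes;             /* fallbacks to the backing device */

/* Like zswap_swapon()/zswap_swapoff(): per-type state comes and goes. */
static void model_swapon(void)
{
	memset(model_stored, 0, sizeof(model_stored));
	model_tree_ready = true;
}

static void model_swapoff(void)
{
	memset(model_stored, 0, sizeof(model_stored));
	model_tree_ready = false;
}

/* true: the cache took the page; false: the caller must use the swap device. */
static bool model_store(const struct model_page *p)
{
	assert(p != NULL);
	if (!model_tree_ready || p->offset >= MODEL_NSLOTS)
		return false;
	memcpy(model_cache[p->offset], p->data, MODEL_PGSZ);
	model_stored[p->offset] = true;
	return true;
}

/* true: the page was filled from the cache; false: read the swap device. */
static bool model_load(struct model_page *p)
{
	assert(p != NULL);
	if (!model_tree_ready || p->offset >= MODEL_NSLOTS ||
	    !model_stored[p->offset])
		return false;
	memcpy(p->data, model_cache[p->offset], MODEL_PGSZ);
	if (model_exclusive_loads) {
		/* Cached copy dropped, so the page is now the only copy: mark it dirty. */
		model_stored[p->offset] = false;
		p->dirty = true;
	}
	return true;
}

/* Like zswap_invalidate(): the swap slot was freed. */
static void model_invalidate(uint64_t offset)
{
	if (offset < MODEL_NSLOTS)
		model_stored[offset] = false;
}

/* Same shape as the patched swap_writepage(): no "== 0" comparison. */
static void model_writepage(const struct model_page *p)
{
	if (model_store(p))
		return;
	model_disk_writes++;
}
```

For example, after model_swapon(), writing a page at offset 1 is absorbed by the cache without a device write, a page whose offset lies outside the model's slots falls back to the device, and with exclusive loads enabled a second model_load() of the same offset returns false because the first load already dropped the entry.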