Hi Minchan (and all) --

I promised you that after the window closed, I would write up my concerns about zsmalloc. My preference would be to use zsmalloc, but there are definitely tradeoffs, and my objective is to make zcache and RAMster ready for enterprise customers, so I would use a different or captive allocator if these zsmalloc issues can't be overcome.

Thanks,
Dan

===

Zsmalloc is designed to maximize the density of items that vary in size across the range 0 < size < PAGE_SIZE, especially when the mean item size significantly exceeds PAGE_SIZE/2. It is primarily useful when a large quantity of such items must be stored with little or no wasted space; if the quantity is small and/or some wasted space is acceptable, existing kernel allocators (e.g. slab) may be sufficient. In the case of zcache (and zram and ramster), where a large fraction of RAM is used to store zpages (lzo1x-compressed pages), zsmalloc seems to be a good match. It is unclear whether zsmalloc will ever have another user, unless that user is also storing large quantities of compressed pages.

Zcache is currently one of the primary users of zsmalloc, but zcache only uses zsmalloc for anonymous/swap ("frontswap") pages, not for file ("cleancache") pages. For file pages, zcache uses the captive "zbud" allocator; this is because zcache requires a shrinker for cleancache pages, by which entire pageframes can be easily reclaimed. Zsmalloc currently has no shrinker capability and, because its storage patterns in and across physical pageframes are quite complex (to maximize density), an intelligent reclaim implementation may be difficult to design race-free. Implementing reclaim opaquely (i.e. while maintaining a clean layering) may be impossible.

A good analogy might be linked lists: zsmalloc is like a singly-linked list (space-efficient but less flexible) and zbud is like a doubly-linked list (less space-efficient but more flexible). One has to choose the best data structure for the functionality required.

Some believe that the next step in zcache's evolution will require shrinking of both frontswap and cleancache pages. Andrea has also stated that he thinks frontswap shrinking will be a must for any future KVM-tmem implementation. But preliminary investigation indicates that pageframe reclaim of frontswap pages may be even more difficult with zsmalloc. Until this issue is resolved (either by an adequately working implementation of reclaim with zsmalloc, or by a demonstration that zcache reclaim is unnecessary), the future use of zsmalloc by zcache is cloudy. I'm currently rewriting zbud as a foundation to investigate some reclaim policy ideas that I think will be useful both for KVM and for making zcache "enterprise ready." When that is done, we will see whether zsmalloc can achieve the same flexibility.

A few related comments about these allocators and their users:

Zsmalloc relies on some clever underlying virtual-to-physical mapping manipulations to ensure that its users can store and retrieve items. These manipulations are necessary on HIGHMEM processors, but their cost on non-HIGHMEM processors is unclear. (Manipulating TLB entries is not inexpensive.) For zcache, the overhead may be irrelevant as long as it is a small fraction of the cost of compression/decompression, but it is worth measuring the worst case to verify (see the first sketch below).

Zbud can implement efficient reclaim because no more than two items ever reside in the same pageframe and items never cross a pageframe boundary (see the second sketch below).
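To make that measurement concrete, here is a minimal sketch of the map/copy/unmap pattern whose cost I mean. To be clear about what is assumed: zs_map_object()/zs_unmap_object() are the real staging-tree entry points, but their exact signatures have varied, and the timing scaffold and names below are illustrative on my part, not zcache code:

/*
 * Hedged sketch: measure zsmalloc's map/unmap cost relative to the
 * copy (or decompression) it brackets.  Assumes a map/unmap pair
 * taking (pool, handle); everything else is illustrative.
 */
#include <linux/kernel.h>
#include <linux/ktime.h>
#include <linux/string.h>
#include "zsmalloc.h"   /* lives under drivers/staging/zsmalloc */

static void time_one_access(struct zs_pool *pool, void *handle,
                            void *buf, size_t len)
{
        ktime_t start, mapped, copied, done;
        void *p;

        start = ktime_get();
        p = zs_map_object(pool, handle);   /* may kmap and/or rewrite PTEs */
        mapped = ktime_get();
        memcpy(buf, p, len);               /* stand-in for decompression */
        copied = ktime_get();
        zs_unmap_object(pool, handle);     /* teardown may flush TLB entries */
        done = ktime_get();

        pr_info("map %lld ns, copy %lld ns, unmap %lld ns\n",
                ktime_to_ns(ktime_sub(mapped, start)),
                ktime_to_ns(ktime_sub(copied, mapped)),
                ktime_to_ns(ktime_sub(done, copied)));
}

If the map and unmap legs are a small fraction of the copy/decompress leg, the overhead is a non-issue for zcache; on non-HIGHMEM machines in particular, that is the number worth seeing.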
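And here is a hypothetical sketch of the zbud invariant just mentioned; the struct and names are mine for illustration, not the actual zbud definitions:

/*
 * Hypothetical sketch (illustrative names only): each pageframe holds
 * at most two compressed items, one packed up from the bottom of the
 * data area and one packed down from the top, so no item ever crosses
 * a pageframe boundary.
 */
#include <linux/list.h>
#include <linux/mm.h>

struct zbud_hdr_sketch {
        struct list_head lru;   /* whole pageframes sit on an LRU list */
        unsigned int size0;     /* bytes used by the "bottom" item, 0 if free */
        unsigned int size1;     /* bytes used by the "top" item, 0 if free */
        /* the rest of the pageframe is the data area */
};

/*
 * A new item fits only if one of the two slots is free and enough room
 * remains.  Reclaim just unlinks one pageframe from the LRU and evicts
 * at most two items -- no item relocation, no cross-page bookkeeping.
 */
static bool zbud_fits_sketch(struct zbud_hdr_sketch *zh, unsigned int size)
{
        unsigned int used = sizeof(*zh) + zh->size0 + zh->size1;

        return (zh->size0 == 0 || zh->size1 == 0) && size <= PAGE_SIZE - used;
}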
While zbud storage is certainly less dense than zsmalloc's, the density is probably sufficient if item sizes are bell-curve distributed with a mean of PAGE_SIZE/2 (or slightly less). This is true for many workloads, but a dataset where the vast majority of items exceed PAGE_SIZE/2 renders zbud useless. Note, however, that zcache (due to its foundation on transcendent memory) already implements an admission policy that rejects pages when such an extreme dataset is encountered. In other words, zbud would handle these workloads simply by rejecting the pages, resulting in performance approximately no worse than if zcache were not present (see the first sketch at the end of this note).

RAMster maintains data structures that point to both local and remote zpages. Remote zpages are identified by a handle-like bit sequence, while local zpages are identified by a true pointer. (Note that ramster currently will not run on a HIGHMEM machine.) RAMster currently differentiates between the two via a hack: it examines the LSB. If the LSB is set, the value is a handle referring to a remote page. This works with xvmalloc and zbud, but not with zsmalloc's opaque handle. A simple solution would be for zsmalloc to reserve the LSB of its opaque handle as must-be-zero (see the second sketch at the end of this note).

Zram is actually a good match for the current zsmalloc because zram's storage grows to a pre-set maximum size and can never shrink again; reclaim is not possible without a massive redesign (and that redesign is essentially zcache). But as a result of this grow-but-never-shrink design, zram may have significant performance implications on most workloads and system configurations, and it remains to be seen whether its niche usage will warrant promotion from the staging tree.
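First, a hedged sketch of the admission policy mentioned above. lzo1x_1_compress() and the LZO macros are the real kernel interfaces from <linux/lzo.h>; the threshold and the function/variable names are illustrative, not the actual zcache code:

/*
 * Hedged sketch: compress the page and refuse to store it if the
 * result is too large to pack well.  dst must hold
 * lzo1x_worst_compress(PAGE_SIZE) bytes and wmem must be
 * LZO1X_1_MEM_COMPRESS bytes, per the kernel LZO documentation.
 */
#include <linux/lzo.h>
#include <linux/mm.h>
#include <linux/errno.h>

static int admit_page_sketch(const unsigned char *page_va,
                             unsigned char *dst, void *wmem)
{
        size_t clen;

        if (lzo1x_1_compress(page_va, PAGE_SIZE, dst, &clen, wmem) != LZO_E_OK)
                return -EINVAL;

        /*
         * Admission check: a zpage bigger than (roughly) PAGE_SIZE/2
         * cannot pair with a typical buddy in zbud, so reject it.  The
         * page then takes the normal swap/pagecache path, as if zcache
         * were not present.
         */
        if (clen > PAGE_SIZE / 2)
                return -ENOSPC;

        /* ... otherwise store the clen-byte zpage via the allocator ... */
        return 0;
}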
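Second, the LSB hack in miniature; the names are hypothetical but the technique is exactly as described above. It works because pointers handed back by kernel allocators are at least word-aligned, so a local pointer always has a zero LSB:

/*
 * Hypothetical sketch of RAMster's LSB tagging (names are mine).  A
 * local zpage reference is a real, aligned pointer (LSB always 0); a
 * remote reference is an encoded handle with the LSB forced to 1.
 */
#include <linux/types.h>

#define REMOTE_TAG      0x1UL

static inline unsigned long make_remote_ref(unsigned long remote_handle)
{
        /* remote_handle must itself leave the LSB clear */
        return remote_handle | REMOTE_TAG;
}

static inline bool ref_is_remote(unsigned long ref)
{
        return ref & REMOTE_TAG;
}

static inline void *ref_to_local_ptr(unsigned long ref)
{
        /* only valid when !ref_is_remote(ref) */
        return (void *)ref;
}

This works with xvmalloc and zbud because both hand back aligned pointers; zsmalloc's handle is opaque and nothing today promises its LSB is zero, which is exactly why RAMster would need zsmalloc to reserve that bit as must-be-zero.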