On Mon, Sep 29, 2014 at 11:41:45AM -0400, Dan Streetman wrote:
> On Fri, Sep 26, 2014 at 2:53 AM, Joonsoo Kim <iamjoonsoo.kim@xxxxxxx> wrote:
> > WARNING: This is just an RFC patchset. Patch 2/2 is only for testing.
> > If you know of a useful place to use this allocator, please let me know.
> >
> > This is a brand-new allocator, called the anti-fragmentation memory
> > allocator (aka afmalloc), designed to handle arbitrarily sized object
> > allocation efficiently. zram and zswap use arbitrarily sized objects to
> > store compressed data, so they can use this allocator. If there are any
> > other use cases, they can use it, too.
> >
> > This work is motivated by observed fragmentation in zsmalloc, which is
> > intended for storing arbitrarily sized objects with low fragmentation.
> > Although it works well on allocation-intensive workloads, memory can
> > become highly fragmented after many frees. In some cases, memory left
> > unused due to fragmentation amounts to 20% ~ 50% of the memory really in
> > use. The other problem is that other subsystems cannot use this unused
> > memory. The fragmented memory is zsmalloc specific, so most other
> > subsystems cannot use it until the zspage is freed to the page allocator.
> >
> > I guess that there is a similar fragmentation problem in zbud, but I
> > didn't investigate it deeply.
> >
> > This new allocator uses the SLAB allocator to solve the above problems.
> > When a request comes in, it returns a handle that is a pointer to metadata
> > which points to many small chunks. These small chunks are power of 2
> > sized and together make up the whole requested memory. We can easily
> > acquire these chunks from the SLAB allocator. The following is a
> > conceptual representation of the metadata used in this allocator, to
> > help in understanding it.
> >
> > Handle A for 400 bytes
> > {
> >       Pointer for 256 bytes chunk
> >       Pointer for 128 bytes chunk
> >       Pointer for 16 bytes chunk
> >
> >       (256 + 128 + 16 = 400)
> > }
> >
> > As you can see, the 400 bytes of memory are not contiguous in afmalloc,
> > so allocator specific store/load functions are needed. These require some
> > computation overhead, and I guess that this is the only drawback this
> > allocator has.
>
> This also requires additional memory copying, for each map/unmap, no?

Indeed.

> >
> > For optimization, it uses another approach for power of 2 sized requests.
> > Instead of returning a handle for metadata, it adds a tag to the pointer
> > from the SLAB allocator and directly returns this value as the handle.
> > With this tag, afmalloc can recognize whether a handle is for metadata or
> > not and process it properly. This optimization can save some memory.
> >
> > Although afmalloc uses some memory for metadata, overall memory
> > utilization is really good due to zero internal fragmentation from using
> > power of 2 sized objects. Although zsmalloc has many size classes, there
> > is considerable internal fragmentation in zsmalloc.
> >
> > In a workload that needs many frees, memory could be fragmented like
> > zsmalloc, but there is a big difference. The unused portions of memory
> > are SLAB specific memory, so other subsystems can use them. Therefore,
> > fragmented memory would not be a big problem in this allocator.
> >
> > An extra benefit of this allocator design is NUMA awareness. This
> > allocator allocates real memory from the SLAB allocator. SLAB considers
> > the client's NUMA affinity, so the allocated memory is NUMA-friendly.
> > Currently, zsmalloc and zbud, which are the backends of zram and zswap,
> > respectively, are not NUMA aware, so a remote node's memory could be
> > returned to the requestor. I think that it could be solved easily if NUMA
> > awareness turns out to be a real problem. But it may increase
> > fragmentation depending on the number of nodes.
> > Anyway, there is no NUMA awareness issue in this allocator.
> >
> > Although I'd like to replace zsmalloc with this allocator, that is not
> > possible, because zsmalloc supports HIGHMEM. In the 32-bit world, SLAB
> > memory would be very limited, so supporting HIGHMEM is a really good
> > advantage of zsmalloc. Because there is no HIGHMEM on 32-bit low memory
> > devices or in the 64-bit world, this allocator may be a good option for
> > those systems. I didn't deeply consider whether this allocator can
> > replace zbud or not.
>
> While it looks like there may be some situations that benefit from
> this, this won't work for all cases (as you mention), so maybe zpool
> can allow zram to choose between zsmalloc and afmalloc.

Yes. :)

> >
> > Below is the result of my simple test.
> > (The zsmalloc used in the experiments is patched with my previous patch:
> > zsmalloc: merge size_class to reduce fragmentation)
> >
> > TEST ENV: EXT4 on zram, mounted with discard option
> > WORKLOAD: untar kernel source, remove dirs in descending order of size.
> > (drivers arch fs sound include)
> >
> > Each line represents orig_data_size, compr_data_size, mem_used_total,
> > fragmentation overhead (mem_used - compr_data_size) and overhead ratio
> > (overhead to compr_data_size), respectively, after the untar and remove
> > operations are executed. In the afmalloc case, overhead is calculated
> > from the before/after 'SUnreclaim' values in /proc/meminfo.
> > There are two more columns for afmalloc: one is real_overhead, which
> > represents metadata usage plus the overhead of internal fragmentation,
> > and the other is the ratio of real_overhead to compr_data_size. Unlike
> > zsmalloc, only metadata and internally fragmented memory cannot be used
> > by other subsystems. So, comparing real_overhead in afmalloc with
> > overhead in zsmalloc seems to be the proper comparison.
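As a quick check of the arithmetic described above, here is a small userspace
sketch in Python (not part of the patchset). It assumes that overhead is
simply used_size minus compr_size and that the ratio is taken against
compr_size. The function name overhead_ratio is made up for illustration. The
inputs are the compr_size and used_size values from the first row of the
untar-merge.out and untar-afmalloc.out tables.

```python
def overhead_ratio(compr_mb, used_mb):
    """Return (overhead in MB, overhead as a percentage of compr_mb).

    Hypothetical helper for illustration only; it mirrors the description
    'overhead = mem_used - compr_data_size' and 'ratio = overhead to
    compr_data_size'.
    """
    overhead = used_mb - compr_mb
    return overhead, 100.0 * overhead / compr_mb


# First row of untar-merge.out (zsmalloc): compr_size, used_size
zs_overhead, zs_ratio = overhead_ratio(199.18, 209.81)

# First row of untar-afmalloc.out: compr_size, used_size
af_real, af_real_ratio = overhead_ratio(199.18, 206.37)

print("zsmalloc overhead: %.2fMB (%.2f%%)" % (zs_overhead, zs_ratio))
print("afmalloc real:     %.2fMB (%.2f%%)" % (af_real, af_real_ratio))
```

For the afmalloc row, used_size - compr_size comes out to the value in the
'real' column (7.19MB), not the SUnreclaim-based 'overhead' column. For the
zsmalloc row the result may differ from the table in the last digit,
presumably because the table was computed from unrounded values.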
> >
> > * untar-merge.out
> >
> > orig_size  compr_size  used_size  overhead  overhead_ratio
> > 526.23MB   199.18MB    209.81MB   10.64MB    5.34%
> > 288.68MB    97.45MB    104.08MB    6.63MB    6.80%
> > 177.68MB    61.14MB     66.93MB    5.79MB    9.47%
> > 146.83MB    47.34MB     52.79MB    5.45MB   11.51%
> > 124.52MB    38.87MB     44.30MB    5.43MB   13.96%
> > 104.29MB    31.70MB     36.83MB    5.13MB   16.19%
> >
> > * untar-afmalloc.out
> >
> > orig_size  compr_size  used_size  overhead  overhead_ratio  real    real-ratio
> > 526.27MB   199.18MB    206.37MB   8.00MB     4.02%          7.19MB  3.61%
> > 288.71MB    97.45MB    101.25MB   5.86MB     6.01%          3.80MB  3.90%
> > 177.71MB    61.14MB     63.44MB   4.39MB     7.19%          2.30MB  3.76%
> > 146.86MB    47.34MB     49.20MB   3.97MB     8.39%          1.86MB  3.93%
> > 124.55MB    38.88MB     40.41MB   3.71MB     9.54%          1.53MB  3.95%
> > 104.32MB    31.70MB     32.96MB   3.43MB    10.81%          1.26MB  3.96%
> >
> > As you can see from the above result, real_overhead_ratio in afmalloc is
> > just 3% ~ 4% while overhead_ratio in zsmalloc varies 5% ~ 17%.
> >
> > And, the 4% ~ 11% overhead_ratio in afmalloc is also slightly better
> > than the overhead_ratio in zsmalloc, which is 5% ~ 17%.
>
> I think the key will be scaling up this test more. What does it look
> like when using 20G or more?

In fact, the main use of zram, that is, zram-swap, doesn't normally use 20G
of memory. But I also want to know how well it scales. I will do this kind
of testing if possible.

>
> It certainly looks better when using (relatively) small amounts of data, though.

Yes.

Thanks.
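For readers following the thread, the handle layout from the cover letter
(400 = 256 + 128 + 16) can be sketched in userspace Python as below. This is
only an illustration under assumptions: it assumes the split simply follows
the set bits of the request size, and the names split_pow2, store and load
are hypothetical, not functions from the patchset. The actual patch may round
sizes up to a minimum chunk size or cap the number of chunks. It also shows
why every store and load has to copy chunk by chunk, which is the extra
copying discussed above.

```python
def split_pow2(size):
    """Split size into power-of-2 chunk sizes, largest first.

    Assumption: one chunk per set bit of size (hypothetical scheme).
    """
    chunks = []
    bit = 1
    while bit <= size:
        if size & bit:
            chunks.append(bit)
        bit <<= 1
    return chunks[::-1]


def store(data):
    """Scatter data across separately allocated power-of-2 chunks.

    Each bytearray stands in for one object obtained from the slab
    allocator; the returned list plays the role of the handle's metadata.
    """
    handle, off = [], 0
    for n in split_pow2(len(data)):
        handle.append(bytearray(data[off:off + n]))  # copy into the chunk
        off += n
    return handle


def load(handle):
    """Gather the chunks back into one contiguous buffer (another copy)."""
    return b"".join(handle)


payload = bytes(range(200)) * 2          # 400 bytes of sample data
handle = store(payload)
print([len(c) for c in handle])          # chunk sizes that make up 400 bytes
print(load(handle) == payload)           # round trip through store/load
```

A power of 2 sized request yields a single chunk in this sketch, which is the
case where the cover letter says the tagged SLAB pointer itself is returned
as the handle instead of a pointer to metadata.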