On 02/23/2013 06:37 PM, Ric Mason wrote:
> On 02/23/2013 05:02 AM, Seth Jennings wrote:
>> On 02/21/2013 08:56 PM, Ric Mason wrote:
>>> On 02/21/2013 11:50 PM, Seth Jennings wrote:
>>>> On 02/21/2013 02:49 AM, Ric Mason wrote:
>>>>> On 02/19/2013 03:16 AM, Seth Jennings wrote:
>>>>>> On 02/16/2013 12:21 AM, Ric Mason wrote:
>>>>>>> On 02/14/2013 02:38 AM, Seth Jennings wrote:
>>>>>>>> This patch adds a documentation file for zsmalloc at
>>>>>>>> Documentation/vm/zsmalloc.txt
>>>>>>>>
>>>>>>>> Signed-off-by: Seth Jennings <sjenning@xxxxxxxxxxxxxxxxxx>
>>>>>>>> ---
>>>>>>>> Documentation/vm/zsmalloc.txt | 68 +++++++++++++++++++++++++++++++++++++++++
>>>>>>>> 1 file changed, 68 insertions(+)
>>>>>>>> create mode 100644 Documentation/vm/zsmalloc.txt
>>>>>>>>
>>>>>>>> diff --git a/Documentation/vm/zsmalloc.txt b/Documentation/vm/zsmalloc.txt
>>>>>>>> new file mode 100644
>>>>>>>> index 0000000..85aa617
>>>>>>>> --- /dev/null
>>>>>>>> +++ b/Documentation/vm/zsmalloc.txt
>>>>>>>> @@ -0,0 +1,68 @@
>>>>>>>> +zsmalloc Memory Allocator
>>>>>>>> +
>>>>>>>> +Overview
>>>>>>>> +
>>>>>>>> +zmalloc a new slab-based memory allocator,
>>>>>>>> +zsmalloc, for storing compressed pages. It is designed for
>>>>>>>> +low fragmentation and high allocation success rate on
>>>>>>>> +large object, but <= PAGE_SIZE allocations.
>>>>>>>> +
>>>>>>>> +zsmalloc differs from the kernel slab allocator in two primary
>>>>>>>> +ways to achieve these design goals.
>>>>>>>> +
>>>>>>>> +zsmalloc never requires high order page allocations to back
>>>>>>>> +slabs, or "size classes" in zsmalloc terms. Instead it allows
>>>>>>>> +multiple single-order pages to be stitched together into a
>>>>>>>> +"zspage" which backs the slab. This allows for higher allocation
>>>>>>>> +success rate under memory pressure.
>>>>>>>> +
>>>>>>>> +Also, zsmalloc allows objects to span page boundaries within the
>>>>>>>> +zspage.
>>>>>>>> +This allows for lower fragmentation than could be had
>>>>>>>> +with the kernel slab allocator for objects between PAGE_SIZE/2
>>>>>>>> +and PAGE_SIZE. With the kernel slab allocator, if a page compresses
>>>>>>>> +to 60% of it original size, the memory savings gained through
>>>>>>>> +compression is lost in fragmentation because another object of
>>>>>>>> +the same size can't be stored in the leftover space.
>>>>>>>> +
>>>>>>>> +This ability to span pages results in zsmalloc allocations not being
>>>>>>>> +directly addressable by the user. The user is given an
>>>>>>>> +non-dereferencable handle in response to an allocation request.
>>>>>>>> +That handle must be mapped, using zs_map_object(), which returns
>>>>>>>> +a pointer to the mapped region that can be used. The mapping is
>>>>>>>> +necessary since the object data may reside in two different
>>>>>>>> +noncontigious pages.
>>>>>>> Do you mean the reason of to use a zsmalloc object must map after
>>>>>>> malloc is object data maybe reside in two different nocontiguous
>>>>>>> pages?
>>>>>> Yes, that is one reason for the mapping. The other reason (more of an
>>>>>> added bonus) is below.
>>>>>>
>>>>>>>> +
>>>>>>>> +For 32-bit systems, zsmalloc has the added benefit of being
>>>>>>>> +able to back slabs with HIGHMEM pages, something not possible
>>>>>>> What's the meaning of "back slabs with HIGHMEM pages"?
>>>>>> By HIGHMEM, I'm referring to the HIGHMEM memory zone on 32-bit systems
>>>>>> with larger that 1GB (actually a little less) of RAM. The upper 3GB
>>>>>> of the 4GB address space, depending on kernel build options, is not
>>>>>> directly addressable by the kernel, but can be mapped into the kernel
>>>>>> address space with functions like kmap() or kmap_atomic().
>>>>>>
>>>>>> These pages can't be used by slab/slub because they are not
>>>>>> continuously mapped into the kernel address space.
>>>>>> However, since zsmalloc requires a mapping anyway to handle objects
>>>>>> that span non-contiguous page boundaries, we do the kernel mapping
>>>>>> as part of the process.
>>>>>>
>>>>>> So zspages, the conceptual slab in zsmalloc backed by single-order
>>>>>> pages can include pages from the HIGHMEM zone as well.
>>>>> Thanks for your clarify,
>>>>> http://lwn.net/Articles/537422/, your article about zswap in lwn.
>>>>> "Additionally, the kernel slab allocator does not allow objects that
>>>>> are less than a page in size to span a page boundary. This means that
>>>>> if an object is PAGE_SIZE/2 + 1 bytes in size, it effectively use an
>>>>> entire page, resulting in ~50% waste. Hense there are *no kmalloc()
>>>>> cache size* between PAGE_SIZE/2 and PAGE_SIZE."
>>>>> Are your sure? It seems that kmalloc cache support big size, your can
>>>>> check in include/linux/kmalloc_sizes.h
>>>> Yes, kmalloc can allocate large objects > PAGE_SIZE, but there are no
>>>> cache sizes _between_ PAGE_SIZE/2 and PAGE_SIZE. For example, on a
>>>> system with 4k pages, there are no caches between kmalloc-2048 and
>>>> kmalloc-4096.
>>> kmalloc object > PAGE_SIZE/2 or > PAGE_SIZE should also allocate from
>>> slab cache, correct? Then how can alloc object w/o slab cache which
>>> contains this object size objects?
>> I have to admit, I didn't understand the question.
>
> object is allocated from slab cache, correct? There two kinds of slab
> cache, one is for general purpose, eg. kmalloc slab cache, the other
> is for special purpose, eg. mm_struct, task_struct. kmalloc object > PAGE_SIZE/2
> or > PAGE_SIZE should also allocated from slab cache,
> correct? then why you said that there are no caches between
> kmalloc-2048 and kmalloc-4096?

Ok, now I get it. Yes, I guess I should have qualified here that there
are no _kmalloc_ caches between PAGE_SIZE/2 and PAGE_SIZE. Yes, one can
create caches of a particular size.
However, that doesn't work well for zswap because the compressed pages
vary widely in size and, imo, it doesn't make sense to create a bunch of
caches that are very granular in size. Plus, having granular caches
doesn't solve the fragmentation issue caused by the storage of large
objects.

Thanks,
Seth
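
To put rough numbers on the fragmentation argument in this thread, here
is a small userspace sketch in C. It is not kernel code and is not taken
from the patch: the function names are hypothetical, PAGE_SIZE is fixed
at 4k as in the kmalloc-2048/kmalloc-4096 example, and the cap of four
pages per zspage is an assumption made for illustration. It also ignores
any rounding of object sizes up to a size class.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096u          /* assumption: 4k pages, as in the thread */
#define MAX_PAGES_PER_ZSPAGE 4u  /* assumption: illustrative cap only */

/*
 * Bytes lost when an object larger than PAGE_SIZE/2 gets a whole page to
 * itself, which is what a PAGE_SIZE kmalloc cache amounts to.
 */
size_t waste_one_object_per_page(size_t obj_size)
{
	assert(obj_size > 0 && obj_size <= PAGE_SIZE);
	return PAGE_SIZE - obj_size;
}

/*
 * Bytes left unused in a zspage of npages pages when equally sized
 * objects are packed back to back and may span page boundaries.
 */
size_t waste_in_zspage(size_t obj_size, size_t npages)
{
	size_t total = npages * PAGE_SIZE;

	assert(obj_size > 0 && obj_size <= PAGE_SIZE && npages > 0);
	return total % obj_size;
}

/*
 * Pick the page count in 1..MAX_PAGES_PER_ZSPAGE that leaves the smallest
 * fraction of the zspage unused. Ties keep the smaller page count.
 */
size_t best_pages_per_zspage(size_t obj_size)
{
	size_t best = 1;
	size_t n;

	for (n = 2; n <= MAX_PAGES_PER_ZSPAGE; n++) {
		/* compare waste(n)/n against waste(best)/best in integers */
		if (waste_in_zspage(obj_size, n) * best <
		    waste_in_zspage(obj_size, best) * n)
			best = n;
	}
	return best;
}
```

For a page that compresses to 60% of 4096 bytes (2458 bytes),
waste_one_object_per_page(2458) gives 1638 bytes lost per object, while
waste_in_zspage(2458, 2) gives 818 bytes left over after packing three
objects into two pages.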