RE: zsmalloc concerns

Dan Magenheimer <dan.magenheimer@xxxxxxxxxx> · Wed, 6 Jun 2012 10:34:40 -0700 (PDT)

> From: Minchan Kim [mailto:minchan@xxxxxxxxxx]

Hi Minchan --

Reordering the reply a bit...

> > On 06/05/2012 12:25 PM, Dan Magenheimer wrote:
> > Zsmalloc relies on some clever underlying virtual-to-physical
> > mapping manipulations to ensure that its users can store and
> > retrieve items.  These manipulations are necessary on HIGHMEM
> 
> HIGHMEM processors?
> I think we need it if the system doesn't support HIGHMEM.
> Maybe I am missing your point.

I didn't say it very clearly.  What I meant is that, on
processors that require HIGHMEM, it is always necessary
to do a kmap/kunmap around accessing the contents of a
pageframe referred to by a struct page.  On machines
with no HIGHMEM, the kernel is completely mapped so
kmap/kunmap to kernel space are very simple and fast.

However, whenever a compressed item crosses a page
boundary in zsmalloc, zsmalloc creates a special "pair"
mapping of the two pages, and kmap/kunmaps the pair for
every access.  This is why special TLB tricks must
be used by zsmalloc.  I think this can be expensive
so I consider this a disadvantage of zsmalloc, even
though it is very clever and very useful for storing
a large number of items with size larger than PAGE_SIZE/2.
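
To make the page-crossing case concrete, here is a rough
userspace sketch (not kernel code, and not what zsmalloc does).
It reads an item that straddles two non-contiguous pages by
copying it in two pieces, which is the plain alternative to a
"pair" mapping.  SKETCH_PAGE_SIZE, item_spans_pages() and
read_item() are names I made up for illustration, and the sketch
assumes offset < SKETCH_PAGE_SIZE and len <= SKETCH_PAGE_SIZE.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed page size for the sketch; real kernels use PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/* True if an item starting at 'offset' runs past the end of its page. */
static bool item_spans_pages(size_t offset, size_t len)
{
	return offset + len > SKETCH_PAGE_SIZE;
}

/*
 * Copy an item into 'dst'.  'first' is the page holding the start of
 * the item; 'second' is the page holding the tail, if there is one.
 * The two pages need not be adjacent in memory.
 */
static void read_item(const unsigned char *first,
		      const unsigned char *second,
		      size_t offset, size_t len, unsigned char *dst)
{
	size_t head;

	if (!item_spans_pages(offset, len)) {
		memcpy(dst, first + offset, len);
		return;
	}

	/* Bytes that still fit in the first page, then the remainder. */
	head = SKETCH_PAGE_SIZE - offset;
	memcpy(dst, first + offset, head);
	memcpy(dst + head, second, len - head);
}
```

The two-copy approach avoids any special mapping but costs an
extra copy on every access to a straddling item, so neither
choice is free.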

> What's the requirement for shrinking zsmalloc?
> For example,
> 
> int shrink_zsmalloc_memory(int nr_pages)
> {
> 	zsmalloc_evict_pages(nr_pages);
> }
> 
> Could you tell us your detailed requirement?
> Let's see it's possible or not at current zsmalloc.

The objective of the shrinker is to reclaim full
pageframes.  Due to the way zsmalloc works, when
it stores N items in M pages, worst case it
may take N-M zsmalloc "item evictions" before even
a single pageframe is reclaimed.
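
The worst case above can be checked with a small userspace
simulation.  This is only a sketch under simplifying assumptions:
items never span pages, every page starts with at least one item,
and the function name is invented for illustration.  It evicts in
the least helpful order (always from the fullest page) and counts
evictions until some page finally becomes empty.

```c
#include <stddef.h>

/*
 * items_per_page[i] is the number of items stored in page i.
 * Requires npages >= 1 and every entry >= 1.
 * Returns the number of evictions performed at the moment the first
 * page becomes empty.  The array is modified in place.
 */
static size_t evictions_until_first_free_page(size_t *items_per_page,
					      size_t npages)
{
	size_t evictions = 0;

	for (;;) {
		size_t fullest = 0;
		size_t i;

		/* Worst-case order: always evict from the fullest page. */
		for (i = 1; i < npages; i++)
			if (items_per_page[i] > items_per_page[fullest])
				fullest = i;

		items_per_page[fullest]--;
		evictions++;

		if (items_per_page[fullest] == 0)
			return evictions;
	}
}
```

With N items in M pages this returns N-M+1: the first N-M
evictions free nothing, and only the next one empties a page.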

Next, remember that there may be several "pointers"
(stored as zsmalloc object handles) referencing that page
and there may also be a pointer to an item which
overlaps from an adjacent page.
In zcache, the pointers are stored in the tmem metadata.
This metadata must be purged from tmem before the
pageframe can be reclaimed.  And this must be done
carefully, maybe atomically, because there are various
locks that must be held and released in the correct
order to avoid races and deadlock.  (Holding one
big lock disallowing tmem from operating during reclaim
is an ugly alternative.)

Next, ideally you'd like to be able to reclaim pageframes
in roughly LRU order.  What does LRU mean when many
items stored in the pageframe (and possibly adjacent
pageframes) are added/deleted completely independently?

Last, when that metadata is purged from tmem, for ephemeral
pages the actual stored data can be discarded.  BUT when
the pages are persistent, the data cannot be discarded.
I have preliminary code that decompresses and pushes this
data back into the swapcache.  This too must be atomic.
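
As a rough illustration of the ephemeral/persistent split (this
is not my preliminary code, and all names here are hypothetical),
reclaiming one pageframe means classifying every item on it:
ephemeral items can simply be dropped, while each persistent item
needs a decompress-and-return-to-swapcache step before the frame
is free.  The sketch only counts the work; it leaves out locking
and the atomicity requirement entirely.

```c
#include <stddef.h>

enum item_kind { ITEM_EPHEMERAL, ITEM_PERSISTENT };

struct reclaim_plan {
	size_t discard;     /* ephemeral items: data can be dropped */
	size_t write_back;  /* persistent items: must go back to swapcache */
};

/* Classify the 'n' items stored in one pageframe. */
static struct reclaim_plan plan_frame_reclaim(const enum item_kind *items,
					      size_t n)
{
	struct reclaim_plan plan = { 0, 0 };
	size_t i;

	for (i = 0; i < n; i++) {
		if (items[i] == ITEM_PERSISTENT)
			plan.write_back++;
		else
			plan.discard++;
	}
	return plan;
}
```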

> > RAMster maintains data structures to both point to zpages
> > that are local and remote.  Remote pages are identified
> > by a handle-like bit sequence while local pages are identified
> > by a true pointer.  (Note that ramster currently will not
> > run on a HIGHMEM machine.)  RAMster currently differentiates
> > between the two via a hack: examining the LSB.  If the
> > LSB is set, it is a handle referring to a remote page.
> > This works with xvmalloc and zbud but not with zsmalloc's
> > opaque handle.  A simple solution would require zsmalloc
> > to reserve the LSB of the opaque handle as must-be-zero.
> 
> As you know, it's not difficult but break opaque handle's concept.
> I want to avoid that and let you put some identifier into somewhere in zcache.

That would be OK with me if it can be done without a large
increase in memory use.  We have so far avoided adding
additional data to each tmem "pampd".  Adding another
unsigned long worth of data is possible but would require
some big internal API changes.

There are many data structures in the kernel that take
advantage of unused low bits in a pointer, like what
ramster is doing.

And the opaqueness of the handle could still be preserved
if there are one or more reserved bits and one adds accessor
functions to zsmalloc, such as zsmalloc_set_reserved_bits(&handle)
and zsmalloc_read_reserved_bits(handle).
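
Something like the following userspace sketch is what I have in
mind.  It is not zsmalloc code: the handle typedef, the mask, the
second argument to the setter, and zs_handle_payload() are all
assumptions made for illustration.  The caller only ever touches
the reserved low bit through the accessors, so the rest of the
handle stays opaque.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opaque handle; callers never interpret its value. */
typedef uintptr_t zs_handle_sketch;

/* Assumption: the allocator guarantees the low bit of a fresh handle is 0. */
#define ZS_RESERVED_BITS 1u
#define ZS_RESERVED_MASK (((uintptr_t)1 << ZS_RESERVED_BITS) - 1)

/* Store caller-owned bits in the reserved field of the handle. */
static void zsmalloc_set_reserved_bits(zs_handle_sketch *handle,
				       unsigned int bits)
{
	assert((bits & ~ZS_RESERVED_MASK) == 0);
	*handle = (*handle & ~ZS_RESERVED_MASK) | bits;
}

/* Read back the caller-owned bits. */
static unsigned int zsmalloc_read_reserved_bits(zs_handle_sketch handle)
{
	return (unsigned int)(handle & ZS_RESERVED_MASK);
}

/* What the allocator itself would use: the handle minus reserved bits. */
static uintptr_t zs_handle_payload(zs_handle_sketch handle)
{
	return handle & ~ZS_RESERVED_MASK;
}
```

Ramster could then keep its existing LSB test for remote handles
by going through the read accessor instead of peeking at the raw
value.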

But this is a nit until we are sure that zsmalloc will meet
the reclaim requirements.

> At least, many embedded device have used zram since compcache was introduced.
> But not sure, zcache can replace it.
> If zcache can replace it, you will be right.
> 
> Comparing zcache and zram implementation, it's one of my TODO list.
> So I am happy to see them.
> But I can't do it shortly due to other urgent works.

Zcache has differences, the largest being that zcache currently
works only when the system has a configured swap block device.
Current zcache has issues too, but (as Andrea has observed)
they can be reduced by allowing zcache to be backed, when
necessary, by the swapdisk when memory pressure is high.

> In summary, I WANT TO KNOW your detailed requirement for shrinking zsmalloc.

My core requirement is that an implementation exists that can
handle pageframe reclaim efficiently and race-free.  AND for
persistent pages, ensure it is possible to return the data
to the swapcache when the containing pageframe is reclaimed.

I am not saying that zsmalloc *cannot* meet this requirement.
I just think it is already very difficult with a simple
non-opaque allocator such as zbud.  That's why I am trying
to get it all working with zbud first.

Hope that helps!
Dan
