On 06/07/2012 02:34 AM, Dan Magenheimer wrote:
>> From: Minchan Kim [mailto:minchan@xxxxxxxxxx]
>
> Hi Minchan --
>
> Reordering the reply a bit...
>
>>> On 06/05/2012 12:25 PM, Dan Magenheimer wrote:
>>> Zsmalloc relies on some clever underlying virtual-to-physical
>>> mapping manipulations to ensure that its users can store and
>>> retrieve items. These manipulations are necessary on HIGHMEM
>>
>> HIGHMEM processors?
>> I think we need it if the system doesn't support HIGHMEM.
>> Maybe I am missing your point.
>
> I didn't say it very clearly. What I meant is that, on
> processors that require HIGHMEM, it is always necessary
> to do a kmap/kunmap around accessing the contents of a
> pageframe referred to by a struct page. On machines
> with no HIGHMEM, the kernel is completely mapped so
> kmap/kunmap to kernel space are very simple and fast.
>
> However, whenever a compressed item crosses a page
> boundary in zsmalloc, zsmalloc creates a special "pair"
> mapping of the two pages, and kmap/kunmaps the pair for
> every access. This is why special TLB tricks must
> be used by zsmalloc. I think this can be expensive
> so I consider this a disadvantage of zsmalloc, even
> though it is very clever and very useful for storing
> a large number of items with size larger than PAGE_SIZE/2.

Fair.

>
>> What's the requirement for shrinking zsmalloc?
>> For example,
>>
>> int shrink_zsmalloc_memory(int nr_pages)
>> {
>>         zsmalloc_evict_pages(nr_pages);
>> }
>>
>> Could you tell us your detailed requirement?
>> Let's see it's possible or not at current zsmalloc.
>
> The objective of the shrinker is to reclaim full
> pageframes. Due to the way zsmalloc works, when
> it stores N items in M pages, worst case it
> may take N-M zsmalloc "item evictions" before even
> a single pageframe is reclaimed.

Right.
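
For illustration, the worst case quoted above (N items in M pages, N-M evictions that free nothing) can be simulated in a few lines of C. This is only a sketch under simplifying assumptions that are not from the thread: every page holds at least one item, no item spans two pages, and items are evicted in the least favourable order. The function name evictions_until_first_free is hypothetical and is not part of zsmalloc.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not a zsmalloc API.
 *
 * objs_per_page[i] is the number of items stored in page i. The function
 * evicts items in an adversarial order (never emptying a page while some
 * other page still holds more than one item) and returns the number of
 * evictions needed before the first pageframe becomes completely free.
 * The array is modified in place.
 */
size_t evictions_until_first_free(size_t *objs_per_page, size_t n_pages)
{
    size_t evictions = 0;

    /* Assumption: every page starts with at least one item. */
    assert(n_pages > 0);
    for (size_t i = 0; i < n_pages; i++)
        assert(objs_per_page[i] >= 1);

    for (;;) {
        size_t victim = n_pages;    /* sentinel: no page has a spare item */

        for (size_t i = 0; i < n_pages; i++) {
            if (objs_per_page[i] > 1) {
                victim = i;
                break;
            }
        }

        /*
         * Every page is down to exactly one item, so the next eviction
         * is the first one that actually frees a pageframe.
         */
        if (victim == n_pages)
            return evictions + 1;

        objs_per_page[victim]--;
        evictions++;
    }
}
```

For pages holding 3, 2 and 4 items (N = 9, M = 3) the function returns 7: six evictions (N-M) that reclaim nothing, plus the one that finally empties a page. Items that overlap two pages, as zsmalloc allows, would only add further constraints to this picture.
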

>
> Next, remember that there may be several "pointers"
> (stored as zsmalloc object handles) referencing that page
> and there may also be a pointer to an item which
> overlaps from an adjacent page.
> In zcache, the pointers are stored in the tmem metadata.
> This metadata must be purged from tmem before the
> pageframe can be reclaimed. And this must be done
> carefully, maybe atomically, because there are various
> locks that must be held and released in the correct
> order to avoid races and deadlock. (Holding one
> big lock disallowing tmem from operating during reclaim
> is an ugly alternative.)
>
> Next, ideally you'd like to be able to reclaim pageframes
> in roughly LRU order. What does LRU mean when many
> items stored in the pageframe (and possibly adjacent
> pageframes) are added/deleted completely independently?
>
> Last, when that metadata is purged from tmem, for ephemeral
> pages the actual stored data can be discarded. BUT when
> the pages are persistent, the data cannot be discarded.
> I have preliminary code that decompresses and pushes this
> data back into the swapcache. This too must be atomic.

I agree zsmalloc isn't good for you.
Then, you can use your allocator "zbud". What's the problem?
Do you want to replace zsmalloc with zbud in zram, too?

>
>>> RAMster maintains data structures to both point to zpages
>>> that are local and remote. Remote pages are identified
>>> by a handle-like bit sequence while local pages are identified
>>> by a true pointer. (Note that ramster currently will not
>>> run on a HIGHMEM machine.) RAMster currently differentiates
>>> between the two via a hack: examining the LSB. If the
>>> LSB is set, it is a handle referring to a remote page.
>>> This works with xvmalloc and zbud but not with zsmalloc's
>>> opaque handle. A simple solution would require zsmalloc
>>> to reserve the LSB of the opaque handle as must-be-zero.
>>
>> As you know, it's not difficult but break opaque handle's concept.
>> I want to avoid that and let you put some identifier into somewhere in zcache.
>
> That would be OK with me if it can be done without a large
> increase in memory use. We have so far avoided adding
> additional data to each tmem "pampd". Adding another
> unsigned long worth of data is possible but would require
> some big internal API changes.
>
> There are many data structures in the kernel that take
> advantage of unused low bits in a pointer, like what
> ramster is doing.

But this case is different. It's a generic library, and moreover this
is a HANDLE, not a pointer. I don't want to add such a special feature
to a generic library's handle.

>
> And the opaqueness of the handle could still be preserved
> if there are one or more reserved bits and one adds functions
> to zsmalloc_set_reserved_bits(&handle) and
> zsmalloc_read_reserved_bits(handle).
>
> But this is a nit until we are sure that zsmalloc will meet
> the reclaim requirements.
>
>> At least, many embedded devices have used zram since compcache was introduced.
>> But not sure, zcache can replace it.
>> If zcache can replace it, you will be right.
>>
>> Comparing zcache and zram implementation, it's one of my TODO list.
>> So I am happy to see them.
>> But I can't do it shortly due to other urgent works.
>
> Zcache has differences, the largest being that zcache currently
> works only when the system has a configured swap block device.
> Current zcache has issues too, but (as Andrea has observed)
> they can be reduced by allowing zcache to be backed, when
> necessary, by the swapdisk when memory pressure is high.
>
>> In summary, I WANT TO KNOW your detailed requirement for shrinking zsmalloc.
>
> My core requirement is that an implementation exists that can
> handle pageframe reclaim efficiently and race-free. AND for
> persistent pages, ensure it is possible to return the data
> to the swapcache when the containing pageframe is reclaimed.
>
> I am not saying that zsmalloc *cannot* meet this requirement.
> I just think it is already very difficult with a simple
> non-opaque allocator such as zbud. That's why I am trying
> to get it all working with zbud first.

Agreed. Go ahead with zbud.
Again, I can't understand your concern. :)
Sorry if I am missing your point.

>
> Hope that helps!
> Dan
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@xxxxxxxxx. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>
>

-- 
Kind regards,
Minchan Kim
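
For readers following the LSB discussion earlier in this message, here is a minimal C sketch of the kind of low-bit tagging described for RAMster: a single word holds either a true local pointer (LSB clear) or a remote handle (LSB set). All names here (zref_t, zref_from_local and so on) are hypothetical and this is not RAMster's actual code. In particular, encoding the remote id by shifting it left one bit is an assumption; the thread says only that a set LSB marks a handle referring to a remote page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type: one word that is either a local pointer or a remote handle. */
typedef uintptr_t zref_t;

#define ZREF_REMOTE_BIT ((uintptr_t)1)

/* Wrap a local pointer. Valid only if the allocator never returns odd addresses. */
zref_t zref_from_local(void *p)
{
    uintptr_t v = (uintptr_t)p;

    assert((v & ZREF_REMOTE_BIT) == 0);
    return v;
}

/* Wrap a remote id (assumed encoding: shift up one bit, set the LSB as the tag). */
zref_t zref_from_remote(uintptr_t remote_id)
{
    assert(remote_id <= (UINTPTR_MAX >> 1));
    return (remote_id << 1) | ZREF_REMOTE_BIT;
}

/* The check described in the thread: a set LSB means "remote". */
bool zref_is_remote(zref_t r)
{
    return (r & ZREF_REMOTE_BIT) != 0;
}

/* Recover the local pointer; the caller must have checked the tag. */
void *zref_to_local(zref_t r)
{
    assert(!zref_is_remote(r));
    return (void *)r;
}

/* Recover the remote id by undoing the assumed shift. */
uintptr_t zref_to_remote(zref_t r)
{
    assert(zref_is_remote(r));
    return r >> 1;
}
```

The assertion in zref_from_local is the crux of the disagreement: the scheme is safe only when the producer of the value guarantees a zero LSB. Aligned pointers give that guarantee; an opaque handle makes no promise about any of its bits unless the library reserves one.
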