> From: Avi Kivity [mailto:avi@xxxxxxxxxx]
> Sent: Wednesday, November 02, 2011 9:45 AM
> To: James Bottomley
> Cc: Andrea Arcangeli; Dan Magenheimer; Pekka Enberg; Cyclonus J; Sasha Levin; Christoph Hellwig; David
> Rientjes; Linus Torvalds; linux-mm@xxxxxxxxx; LKML; Andrew Morton; Konrad Wilk; Jeremy Fitzhardinge;
> Seth Jennings; ngupta@xxxxxxxxxx; Chris Mason; JBeulich@xxxxxxxxxx; Dave Hansen; Jonathan Corbet
> Subject: Re: [GIT PULL] mm: frontswap (for 3.2 window)
>
> On 11/01/2011 12:16 PM, James Bottomley wrote:
> > Actually, I think there's an unexpressed fifth requirement:
> >
> > 5. The optimised use case should be for non-paging situations.
> >
> > The problem here is that almost every data centre person tries very hard
> > to make sure their systems never tip into the swap zone. A lot of
> > hosting datacentres use tons of cgroup controllers for this and
> > deliberately never configure swap, which makes transcendent memory
> > useless to them under the current API. I'm not sure this is fixable,
> > but it's the reason why a large swathe of users would never be
> > interested in the patches, because they by design never operate in the
> > region transcendent memory is currently looking to address.
> >
> > This isn't an inherent design flaw, but it does ask the question "is
> > your design scope too narrow?"
>
> If you look at cleancache, then it addresses this concern - it extends
> pagecache through host memory. When dropping a page from the tail of
> the LRU it first goes into tmem, and when reading in a page from disk
> you first try to read it from tmem. However, in many workloads
> cleancache is actually detrimental. If you have a lot of cache misses,
> then every one of them causes a pointless vmexit; considering that
> servers today can chew hundreds of megabytes per second, this adds up.
> On the other side, if you have a use-once workload, then every page that
> falls off the tail of the LRU causes a vmexit and a pointless page copy.

I agree with everything you've said except "_many_ workloads". I would
characterize it as "some" workloads, and increasingly few machines,
because core counts are growing faster than the ability to attach RAM
to them (according to published research).

I did code a horrible hack to fix this, but haven't gotten back to
RFC'ing it to see if there were better, less horrible, ideas. It
essentially only puts into tmem those pages that are being reclaimed
but previously had the PageActive bit set... a smaller but
higher-hit-ratio source of pages, I think.

Anyway, I've been very open about this (see
https://lkml.org/lkml/2011/8/29/225), but it affects cleancache.
Frontswap deals ONLY with pages that would otherwise have been swapped
in from, or out to, a physical swap device.

Dan
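
[For context, a minimal sketch of the kind of filter described above,
not the actual RFC'd hack: cleancache_put_page() is the real hook the
kernel calls when a page is removed from the page cache, but
PageWasActive()/the wrapper name below are hypothetical helpers, assumed
to be set somewhere in the reclaim path when a page leaves the active
LRU list.]

	/*
	 * Sketch only: offer a reclaimed page to tmem via cleancache
	 * only if it previously sat on the active LRU list.
	 *
	 * PageWasActive() is a hypothetical test (e.g. a spare page
	 * flag set when reclaim deactivates a page); it is not a
	 * mainline interface.
	 */
	#include <linux/mm.h>
	#include <linux/page-flags.h>
	#include <linux/cleancache.h>

	static inline void maybe_put_to_tmem(struct page *page)
	{
		/*
		 * Unconditionally putting every page costs a vmexit and
		 * a page copy even for use-once data that will never be
		 * read back.  Pages that earned a stay on the active
		 * list are more likely to be refaulted, so only those
		 * are handed to tmem here.
		 */
		if (PageWasActive(page))	/* hypothetical flag */
			cleancache_put_page(page);
	}

The trade-off this illustrates is the one discussed above: a smaller
set of puts, but one with a much better chance of producing a
cleancache hit later, so fewer vmexits are wasted on pages that would
never be read back anyway.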