Here, finally, is the long-promised rewrite of zcache (and ramster).
I know that we are concentrating on moving zcache out of staging, and
not ramster. However, the amount of code that ramster duplicated from
zcache is astonishing, so when I did the rewrite I thought: why not
kill two birds with one stone, since both are in the staging directory.

Of notable interest to the broader mm community: I am proposing, when
review is complete, to place zcache in a new subdirectory of mm called
"tmem" (short for transcendent memory). Zcache is truly memory
management, not a hardware driver, and it interfaces with mm/swap/vfs
through mm/cleancache.c and mm/frontswap.c (which possibly should move
to the new tmem directory in the future as well).

This is a major rewrite of zcache, not a sequence of small patches.
Those who are interested in understanding, reviewing, and commenting
in detail on the design and the functioning of the code can find it at:

git://oss.oracle.com/git/djm/tmem.git #zcache-120731

For those who prefer to review and comment line by line, it is not yet
clear how best to post the ~10K lines of code to ensure reviewer
productivity. Konrad suggested an IRC discussion on Monday so we can
figure out the proper option.

(If you are not familiar with the tmem terminology, you can review
it here: http://lwn.net/Articles/454795/ )

Some of the highlights of this git branch:

1. Merge of zcache and ramster. Zcache and ramster had a great deal of
   duplicate code, which is now merged. In essence, zcache *is* ramster
   with no remote machine available, but !CONFIG_RAMSTER will avoid
   compiling lots of ramster-specific code.
2. Allocator. Previously, persistent pools used zsmalloc and ephemeral
   pools used zbud. Now a completely rewritten zbud is used for both.
   Notably, this zbud maintains all persistent (frontswap) and ephemeral
   (cleancache) pageframes in separate queues in LRU order.
3. Interaction with page allocator.
   Zbud does no page allocation/freeing; it is done entirely in zcache,
   where it can be tracked more effectively.
4. Better pre-allocation. Previously, on put, if a new pageframe could
   not be pre-allocated, the put would fail, even if the allocator had
   plenty of partial pages where the data could be stored; this is now
   fixed.
5. Ouroboros ("eating its own tail") allocation. If no pageframe can be
   allocated AND no partial pages are available, the least-recently-used
   ephemeral pageframe is reclaimed immediately (including flushing tmem
   pointers to it) and re-used. This ensures that most-recently-used
   cleancache pages are more likely to be retained than LRU pages and
   also that, as in the core mm subsystem, anonymous pages have a higher
   priority than clean page cache pages.
6. Zcache and zbud now use debugfs instead of sysfs. Ramster uses
   debugfs where possible and sysfs where necessary. (Some ramster
   configuration is done from userspace, so some sysfs is necessary.)
7. Modularization. As some have observed, the monolithic zcache-main.c
   code included zbud code, which has now been separated into its own
   code module. Much ramster-specific code in the old ramster
   zcache-main.c has also been moved into ramster.c so that it does not
   get compiled with !CONFIG_RAMSTER.
8. Rebased to 3.5.

Konrad has been suggesting that we prepare to "lift" the allocator
(highlight 2) out as a separate patch so that it could be used in
zcache1 as part of its promotion out of staging, if we think that
zcache1 needs that. The problem is that the code has been tested only
together with all the other code. It is unclear whether by itself,
without the rest of the harness, it would work properly, and whether
the time spent finding bugs in the lifted code would be greater than
that of just dropping in zcache2 as zcache1 and concentrating on
promoting that.

The nice-to-have features that I had in the back of my mind (for after
zcache and ramster have left staging) were:

A. Ouroboros writeback.
   Since persistent (frontswap) pages may now also be reclaimed in LRU
   order, the foundation is in place to properly write these pages back
   into the swap cache and then to the swap disk. This is still under
   development and requires some other mm changes which are prototyped
   but not yet included with this patch.
B. WasActive patch. Requires some mm/frontswap changes previously
   posted (but still has a known problem or two).
C. Module capability. See the patch posted by Erlangen University; it
   needs to be brought up to kernel standards.

If anybody is interested in helping out with these, let me know!

P.S. I've just started tracking down a memory leak, so I don't
recommend benchmarking this zcache-120731 version yet.

Signed-off-by: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>

diffstat vs 3.5:
 drivers/staging/ramster/Kconfig       |    2
 drivers/staging/ramster/Makefile      |    2
 drivers/staging/zcache/Kconfig        |    2
 drivers/staging/zcache/Makefile       |    2
 mm/Kconfig                            |    2
 mm/Makefile                           |    4
 mm/tmem/Kconfig                       |   33
 mm/tmem/Makefile                      |    5
 mm/tmem/tmem.c                        |  894 +++++++++++++
 mm/tmem/tmem.h                        |  259 +++
 mm/tmem/zbud.c                        | 1060 +++++++++++++++
 mm/tmem/zbud.h                        |   33
 mm/tmem/zcache-main.c                 | 1686 +++++++++++++++++++++++++
 mm/tmem/zcache.h                      |   53
 mm/tmem/ramster.h                     |   59
 mm/tmem/ramster/heartbeat.c           |  462 ++++++
 mm/tmem/ramster/heartbeat.h           |   87 +
 mm/tmem/ramster/masklog.c             |  155 ++
 mm/tmem/ramster/masklog.h             |  220 +++
 mm/tmem/ramster/nodemanager.c         |  995 +++++++++++++++
 mm/tmem/ramster/nodemanager.h         |   88 +
 mm/tmem/ramster/r2net.c               |  414 ++++++
 mm/tmem/ramster/ramster.c             |  985 ++++++++++++++
 mm/tmem/ramster/ramster.h             |  161 ++
 mm/tmem/ramster/ramster_nodemanager.h |   39
 mm/tmem/ramster/tcp.c                 | 2253 ++++++++++++++++++++++++++++++++++
 mm/tmem/ramster/tcp.h                 |  159 ++
 mm/tmem/ramster/tcp_internal.h        |  248 +++
 28 files changed, 10358 insertions(+), 4 deletions(-)
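
For readers who want a concrete picture of the fallback order in
highlight 5 (Ouroboros allocation), here is a toy userspace model in
Python. It is only a sketch under stated assumptions, not the kernel
code: the class, method, and field names are all hypothetical, each
stored page is treated as occupying one slot (real zbud packs
compressed pages into pageframes), the tmem pointer flush is reduced to
dropping a dictionary entry, and trying a fresh pageframe before a
partial page is my guess at the ordering, which the text above does not
spell out.

```python
from collections import OrderedDict


class ToyZcache:
    """Toy model of the put-path fallback; names are hypothetical."""

    def __init__(self, free_pageframes, partial_slots=0):
        self.free_pageframes = free_pageframes  # frames the page allocator can still supply
        self.partial_slots = partial_slots      # spare room in partially filled frames
        self.ephemeral_lru = OrderedDict()      # cleancache-like pages, oldest first
        self.persistent = {}                    # frontswap-like pages, never evicted here

    def put(self, key, data, ephemeral=True):
        """Store one page (each key is assumed to be put only once).

        Returns a string naming the path that satisfied the put.
        """
        if self.free_pageframes > 0:
            self.free_pageframes -= 1
            how = "new pageframe"
        elif self.partial_slots > 0:
            # Highlight 4: a failed pageframe allocation no longer fails the put.
            self.partial_slots -= 1
            how = "partial page"
        elif self.ephemeral_lru:
            # Highlight 5: evict the least-recently-used ephemeral page
            # and reuse its space immediately.
            victim, _ = self.ephemeral_lru.popitem(last=False)
            how = f"reclaimed {victim}"
        else:
            return "failed"

        if ephemeral:
            self.ephemeral_lru[key] = data
        else:
            self.persistent[key] = data
        return how


cache = ToyZcache(free_pageframes=1, partial_slots=1)
results = [cache.put(k, b"x") for k in ("a", "b", "c")]
swap = cache.put("anon", b"y", ephemeral=False)
```

In this model a persistent put can evict an ephemeral page but never
the other way around, which is one simple way to express the priority
of anonymous pages over clean page cache pages mentioned in highlight 5.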