On 09/15/2011 06:17 PM, Dave Hansen wrote:
> On Thu, 2011-09-15 at 14:24 -0500, Seth Jennings wrote:
>> How would you suggest that I measure xcfmalloc performance on a "very
>> large set of workloads"? I guess another form of that question is: How
>> did xvmalloc do this?
>
> Well, it didn't have a competitor, so this probably wasn't done. :)
>

A lot of testing was done for xvmalloc (and its predecessor, tlsf) before
it was integrated into zram:

http://code.google.com/p/compcache/wiki/AllocatorsComparison
http://code.google.com/p/compcache/wiki/xvMalloc
http://code.google.com/p/compcache/wiki/xvMallocPerformance

I think we can use the same set of testing tools. See:
http://code.google.com/p/compcache/source/browse/#hg%2Fsub-projects%2Ftesting

These tools issue a mix of allocs and frees, each with a probability that
can be adjusted in the code. There is also a tool called "swap replay"
which collects swap-out traces and simulates the same behavior in
userspace, allowing allocator testing with "real world" traces. See:
http://code.google.com/p/compcache/wiki/SwapReplay

> I'd like to see a microbenchmarky sort of thing. Do a million (or 100
> million, whatever) allocations, and time it for both allocators doing
> the same thing. You just need to do the *same* allocations for both.
>
> It'd be interesting to see the shape of a graph if you did:
>
> for (i = 0; i < BIG_NUMBER; i++)
> 	for (j = MIN_ALLOC; j < MAX_ALLOC; j += BLOCK_SIZE)
> 		alloc(j);
> free();
>
> ... basically for both allocators. Let's see how the graphs look. You
> could do it a lot of different ways: alloc all, then free all, or alloc
> one free one, etc... Maybe it will surprise us. Maybe the page
> allocator overhead will dominate _everything_, and we won't even see
> the x*malloc() functions show up.
>
> The other thing that's important is to think of cases like I described
> that would cause either allocator to do extra splits/joins or be slow
> in other ways. I expect xcfmalloc() to be slowest when it is
> allocating and has to break down a reserve page. Let's say it does a
> bunch of ~3kb allocations and has no pages on the freelists; it will:
>
> 1. scan each of the 64 freelist heads (512 bytes of cache)
> 2. split a 4k page
> 3. reinsert the 1k remainder
>
> Next time, it will:
>
> 1. scan, and find the 1k bit
> 2. continue scanning, eventually touching each freelist...
> 3. split a 4k page
> 4. reinsert the 2k remainder
>
> It'll end up doing a scan/split/reinsert in 3/4 of the cases, I think.
> The case of the freelists being quite empty will also be quite common
> during times the pool is expanding. I think xvmalloc() will have some
> of the same problems, but let's see if it does in practice.
>

Thanks,
Nitin
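
For reference, here is a minimal userspace sketch of the alloc-all/free-all
pattern described above, using plain malloc()/free() as stand-ins for the
allocator under test (the real harness would call the xvmalloc/xcfmalloc
entry points instead, and the constants are arbitrary placeholders):

/*
 * Sketch of the microbenchmark loop: do the same sequence of
 * allocations for each allocator and time it.  malloc()/free() are
 * stand-ins; plug in the allocator under test via the testing harness.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BIG_NUMBER	1000
#define MIN_ALLOC	32
#define MAX_ALLOC	3072
#define BLOCK_SIZE	32
#define NR_ALLOCS	(BIG_NUMBER * ((MAX_ALLOC - MIN_ALLOC) / BLOCK_SIZE))

int main(void)
{
	static void *objs[NR_ALLOCS];
	struct timespec start, end;
	long i, j, n = 0;
	double secs;

	clock_gettime(CLOCK_MONOTONIC, &start);

	/* alloc all ... */
	for (i = 0; i < BIG_NUMBER; i++)
		for (j = MIN_ALLOC; j < MAX_ALLOC; j += BLOCK_SIZE)
			objs[n++] = malloc(j);

	/* ... then free all (an alloc-one/free-one variant is a trivial change) */
	while (n)
		free(objs[--n]);

	clock_gettime(CLOCK_MONOTONIC, &end);
	secs = (end.tv_sec - start.tv_sec) +
	       (end.tv_nsec - start.tv_nsec) / 1e9;
	printf("%d allocations in %.3f seconds\n", NR_ALLOCS, secs);
	return 0;
}

Running the identical loop against each allocator keeps the allocation
sequence the same for both, which is the key requirement for comparing
the graphs.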