On Sat, Mar 24, 2012 at 08:12:53PM +0000, Chris Wilson wrote: > Originally the code tried to allocate a large enough array to perform > the copy using vmalloc, performance wasn't great and throughput was > improved by processing each individual relocation entry separately. > This too is not as efficient as one would desire. A compromise would be > to allocate a single page, or to allocate a few entries on the stack, > and process the copy in batches. The latter gives simpler code and more > consistent performance due to a lack of heuristic. > > x11perf -copywinwin10: n450/pnv i3-330m i5-2520m (cpu) > before: 249000 785000 1280000 (80%) > page: 264000 896000 1280000 (65%) > on-stack: 264000 902000 1280000 (67%) > > v2: Use 512-bytes of stack for batching rather than allocate a page. > v3: Tidy the code slightly with more descriptive variable names > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter at ffwll.ch> ... and also queued up for -next, thanks for the patch. -Daniel -- Daniel Vetter Mail: daniel at ffwll.ch Mobile: +41 (0)79 365 57 48