On Thu, Apr 11, 2013 at 08:06:10PM -0400, Mikulas Patocka wrote:
> All that I can tell you is that adding an empty atomic operation
> "cmpxchg(&bio->bi_css->refcnt, bio->bi_css->refcnt, bio->bi_css->refcnt);"
> to bio_clone_context and bio_disassociate_task increases the time to run a
> benchmark from 23 to 40 seconds.

Right, linear target on ramdisk, very realistic, and you know what, hell
with dm, let's just hand code everything into submit_bio(). I'm sure
it will speed up your test case significantly.

If this actually matters, improve it in a *sane* way. Make the refcnts
per-cpu and avoid atomic ops. In fact, we already have a proposed
implementation of percpu refcnt which is being used by the aio
restructure patches and is likely to be included in some form. It's not
quite ready yet, so please work on something useful like that instead
of continuing this nonsense.

-- 
tejun

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel