I'm now fairly certain that the problem is caused by my not performing the requests synchronously. I wrote mock code in which, on receipt, I insert all r/w requests into a queue. A secondary thread polls the queue at short intervals (50 us), dequeues all pending requests, and writes the data to RAM. This is the most basic async setup possible, and it still fails my file system benchmark. I should note that when I simply perform the write to RAM and return (without the extra queuing phase), the benchmark succeeds.

I see that bs_aio.c does something very similar to what I do, but by polling on a pipe. I'm not familiar with this method, but it doesn't sound inherently different from what I do with threads and mutexes. So I wonder why the behavior changes so drastically between the two implementations.

I'd appreciate your help.

Thank you
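
For reference, the queuing setup described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual mock code and not tgt's API: every name here (struct req, submit, worker, is_done, request_stop, poll_sleep) is hypothetical, and "RAM" is just a static buffer. Requests are appended to a mutex-protected list on receipt, and a second thread wakes every 50 us, takes the whole pending list, and performs the copies in FIFO order.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define RAM_SIZE (1u << 20)

/* Hypothetical request descriptor; not a tgt structure. */
struct req {
    bool is_write;       /* true: buf -> RAM, false: RAM -> buf */
    size_t offset, len;
    void *buf;
    bool done;           /* set by the worker after the copy has finished */
    struct req *next;
};

static unsigned char ram[RAM_SIZE];
static struct req *head, *tail;
static bool stop;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Sleep for the polling interval (50 us). */
void poll_sleep(void)
{
    struct timespec ts = { .tv_sec = 0, .tv_nsec = 50 * 1000 };
    nanosleep(&ts, NULL);
}

/* Called on receipt of a request: enqueue it and return immediately. */
void submit(struct req *r)
{
    assert(r != NULL && r->offset + r->len <= RAM_SIZE);
    r->done = false;
    r->next = NULL;
    pthread_mutex_lock(&lock);
    if (tail)
        tail->next = r;
    else
        head = r;
    tail = r;
    pthread_mutex_unlock(&lock);
}

/* True once the worker has completed the request. */
bool is_done(struct req *r)
{
    pthread_mutex_lock(&lock);
    bool d = r->done;
    pthread_mutex_unlock(&lock);
    return d;
}

/* Ask the worker to exit after its next pass over the queue. */
void request_stop(void)
{
    pthread_mutex_lock(&lock);
    stop = true;
    pthread_mutex_unlock(&lock);
}

/* Secondary thread: poll the queue, drain everything pending, copy data. */
void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        struct req *batch = head;   /* take the whole pending list */
        head = tail = NULL;
        bool quit = stop;
        pthread_mutex_unlock(&lock);

        while (batch) {
            /* Read next first: the submitter may reuse the request
             * as soon as it sees done == true. */
            struct req *next = batch->next;
            if (batch->is_write)
                memcpy(ram + batch->offset, batch->buf, batch->len);
            else
                memcpy(batch->buf, ram + batch->offset, batch->len);
            pthread_mutex_lock(&lock);
            batch->done = true;     /* completion is visible only after the copy */
            pthread_mutex_unlock(&lock);
            batch = next;
        }
        if (quit)
            return NULL;
        poll_sleep();
    }
}
```

In this sketch the caller must not treat a request as complete, or reuse its buffer, until is_done() returns true. Whether the real code reports completion at the equivalent point is something I have not verified.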