Hi,

I was looking into IO starvation problems where streaming sync writes (in my case from kjournald, but DIO writes would look the same) starve reads. Reads are submitted in small chunks, and we do not issue the next read until the current request completes (the reader is reading lots of small files), while the writer always has plenty of big requests to submit. The two processes end up fighting over IO requests: the writer gets nr_batching 512 KB requests while the reader gets just one 4 KB request or so. The effect is magnified here by the fact that the drive has a relatively big queue depth, so completing the read request usually takes longer than BLK_BATCH_TIME.

The net result is that it takes close to two minutes to read files that can be read in under a second without the writer load. Without the drive's big queue depth the results are not ideal, but they are bearable - the reading takes about 20 seconds. For comparison, when the writer and reader are not competing for IO requests (as happens when the writes are submitted async), the reading completes in about 2 seconds.

A simple reproducer is:

  echo 3 >/proc/sys/vm/drop_caches
  dd if=/dev/zero of=/tmp/f bs=1M count=10000 &
  sleep 30
  time cat /etc/* >/dev/null 2>&1
  killall dd
  rm /tmp/f

The question is how we can fix this. Two quick hacks that come to my mind are removing the timeout from the batching logic (is it that important?) or further separating the request allocation logic so that reads have their own request pool. A more systematic fix would be to change the request allocation logic to always allow at least a fixed number of requests per IOC.

What do people think about this?

								Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html