Jan Kara <jack@xxxxxxx> writes:

>   Hi,
>
> I was looking into IO starvation problems where streaming sync writes (in
> my case from kjournald but DIO would look the same) starve reads. This is
> because reads happen in small chunks and until a request completes we don't
> start reading further (reader reads lots of small files) while writers have
> plenty of big requests to submit. Both processes end up fighting for IO
> requests and writer writes nr_batching 512 KB requests while reader reads
> just one 4 KB request or so. Here the effect is magnified by the fact that
> the drive has relatively big queue depth so it usually takes longer than
> BLK_BATCH_TIME to complete the read request. The net result is it takes
> close to two minutes to read files that can be read under a second without
> writer load. Without the big drive's queue depth, results are not ideal but
> they are bearable - it takes about 20 seconds to do the reading. And for
> comparison, when writer and reader are not competing for IO requests (as it
> happens when writes are submitted as async), it takes about 2 seconds to
> complete reading.
>
> Simple reproducer is:
>
> echo 3 >/proc/sys/vm/drop_caches
> dd if=/dev/zero of=/tmp/f bs=1M count=10000 &
> sleep 30
> time cat /etc/* 2>&1 >/dev/null
> killall dd
> rm /tmp/f

This is a buffered writer.  How does it end up that you are doing all
synchronous write I/O?  Also, you forgot to mention what file system you
were using, and which I/O scheduler.

Is this happening in some real workload?  If so, can you share what that
workload is?  How about some blktrace data?

> The question is how can we fix this? Two quick hacks that come to my mind
> are remove timeout from the batching logic (is it that important?) or
> further separate request allocation logic so that reads have their own
> request pool. More systematic fix would be to change request allocation
> logic to always allow at least a fixed number of requests per IOC. What do
> people think about this?

There has been talk of removing the limit on the number of requests
allocated, but I haven't seen patches for it, and I certainly am not
convinced of its practicality.  Today, when using block cgroups you do get
a request list per cgroup, so that's kind of the same thing as one per ioc.
I can certainly see moving in that direction for the non-cgroup case.

First, though, I'd like to better understand your workload.

Cheers,
Jeff