I'm sorry, I forgot to mention hardware details. It isn't aacraid, it
is a megaraid-based Dell PERC H700 w/ 1GB NVRAM and 12x 450GB 15k SAS
drives in RAID-10, all in a Dell R510 server.

Thanks,
Martin

On 20.2.2013 21:48, Nicholas A. Bellinger wrote:
> Hi Martin,
>
> CC'ing linux-scsi here, as aacraid doesn't have an official maintainer
> atm.
>
> --nab
>
> On Wed, 2013-02-20 at 16:38 +0100, Martin Svec wrote:
>> Hello,
>>
>> I've noticed read I/O starvation problems with the LIO iSCSI target
>> when it runs on top of a writeback-enabled HW RAID controller (PERC
>> H700 with 1GB cache). Under intensive mixed read-write workloads in
>> virtualized environments, writes are able to consume over 95% of the
>> IOPS throughput and starve reads.
>>
>> After a number of tests, it seems to me that this is a general issue
>> of block layer I/O scheduling when running on top of a writeback
>> device. If there is a write-intensive task, all writes go to the
>> writeback cache with near-zero latency. This allows a writer to
>> quickly saturate the device with thousands of writes while using
>> only a minimal fraction of the queue depth. However, non-cached
>> reads depend on spinning-drive latencies, which are orders of
>> magnitude higher than writeback-cache latencies, so readers cannot
>> submit nearly as many requests per second as writers. Consequently,
>> I guess the controller gets a totally wrong view of the incoming
>> workload pattern, tries to satisfy the write flood first, and the
>> net result is unacceptable starvation of reads, with latencies up
>> to hundreds of milliseconds.
>>
>> A simple fio test against a 1TiB block device, where one thread
>> does 4k random sync writes with iodepth=32 and one thread does 4k
>> random reads with iodepth=32, shows that instead of the theoretical
>> 50:50 IOPS ratio, the block device runs at a 95:5 ratio in favor of
>> writes. In fact, the imbalance is so high that even a write iodepth
>> of 2 is enough to achieve the same numbers.
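>>
>> For reference, a fio invocation along these lines reproduces the
>> imbalance (a sketch rather than my exact job file -- /dev/sdX stands
>> in for the RAID-10 volume and the 60s runtime is arbitrary):
>>
>>   # one random reader and one random sync writer on the same device;
>>   # direct=1 bypasses the page cache so every request goes through
>>   # the block layer straight to the controller
>>   fio --filename=/dev/sdX --direct=1 --ioengine=libaio --bs=4k \
>>       --time_based --runtime=60 \
>>       --name=rand-read --rw=randread --iodepth=32 \
>>       --name=rand-write --rw=randwrite --iodepth=32 --sync=1
>>
>> Because each job reports its own IOPS, the read:write ratio can be
>> read directly off the two per-job result lines.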
>>
>> Real workloads that tend to exhibit this problem are: initial
>> zeroing of a virtual machine disk, virtual machine migration,
>> virtual machine cloning, intensive swapping of a single virtual
>> machine, etc.
>>
>> I tried setting WCE=1 on the target iblock device, played with
>> queue depths, and tested all three I/O schedulers and their
>> parameters as well as the controller's parameters, but with no
>> luck. To achieve reasonably good fairness, the only solution is to
>> set nr_requests to 1 or to disable the controller's writeback cache
>> entirely -- at the expense of degraded overall performance :-(
>>
>> Regarding nr_requests, there's an obvious relation between iodepths
>> and read starvation: if nr_requests >= the workload's iodepth, then
>> starvation surely occurs. Lowering nr_requests below this threshold
>> slowly starts improving fairness, and for every read+write iodepth
>> pair there exists a sufficiently low nr_requests value at which the
>> IOPS ratio finally balances according to the rd:wr iodepth ratio.
>> Unfortunately, this means there is no single nr_requests value
>> suitable for all workloads. For iodepths around 2 to 8, only
>> nr_requests=1 provides fair load balancing. (The exact knobs I've
>> been varying are listed in the P.S. below.)
>>
>> Is this a known problem? Has anybody found block layer parameters
>> that eliminate this problem for iscsi-target storage in mixed
>> random read-write environments like virtualization? Or should I
>> start writing my own I/O scheduler? ;-)
>>
>> Update: I've just found https://lkml.org/lkml/2012/12/10/550 (Read
>> starvation by sync writes), where Jan Kara describes identical
>> symptoms. But setting nr_requests=10000 doesn't help in my case.
>> CC'ing LKML too (I'm not an LKML subscriber).
>>
>> Thanks,
>>
>> Martin
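>>
>> P.S. For completeness, these are the block layer knobs I've been
>> varying between runs (a sketch; sdX again stands in for the RAID-10
>> volume, and deadline is just one of the three schedulers I tried):
>>
>>   # select the I/O scheduler for the device
>>   echo deadline > /sys/block/sdX/queue/scheduler
>>
>>   # shrink the request queue; only values below the workload's
>>   # iodepth start to restore read/write fairness, and for iodepths
>>   # around 2 to 8 only nr_requests=1 is fair
>>   echo 1 > /sys/block/sdX/queue/nr_requests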