On Thu, May 07, 2009 at 11:36:42AM -0400, Vivek Goyal wrote:
> Hmm.., my old config had "AS" as default scheduler that's why I was seeing
> the strange issue of RT task finishing after BE. My apologies for that. I
> somehow assumed that CFQ is default scheduler in my config.

ok.

> So I have re-run the test to see if we are still seeing the issue of
> losing priority and class within cgroup. And we still do..
>
> 2.6.30-rc4 with io-throttle patches
> ===================================
> Test1
> =====
> - Two readers, one BE prio 0 and other BE prio 7 in a cgroup limited with
>   8MB/s BW.
>
> 234179072 bytes (234 MB) copied, 55.8448 s, 4.2 MB/s
> prio 0 task finished
> 234179072 bytes (234 MB) copied, 55.8878 s, 4.2 MB/s
>
> Test2
> =====
> - Two readers, one RT prio 0 and other BE prio 7 in a cgroup limited with
>   8MB/s BW.
>
> 234179072 bytes (234 MB) copied, 55.8876 s, 4.2 MB/s
> 234179072 bytes (234 MB) copied, 55.8984 s, 4.2 MB/s
> RT task finished

ok, coherent with the current io-throttle implementation.

> Test3
> =====
> - Reader Starvation
> - I created a cgroup with BW limit of 64MB/s. First I just run the reader
>   alone and then I run reader along with 4 writers 4 times.
>
> Reader alone
> 234179072 bytes (234 MB) copied, 3.71796 s, 63.0 MB/s
>
> Reader with 4 writers
> ---------------------
> First run
> 234179072 bytes (234 MB) copied, 30.394 s, 7.7 MB/s
>
> Second run
> 234179072 bytes (234 MB) copied, 26.9607 s, 8.7 MB/s
>
> Third run
> 234179072 bytes (234 MB) copied, 37.3515 s, 6.3 MB/s
>
> Fourth run
> 234179072 bytes (234 MB) copied, 36.817 s, 6.4 MB/s
>
> Note that out of 64MB/s limit of this cgroup, reader does not get even
> 1/5 of the BW. In normal systems, readers are advantaged and reader gets
> its job done much faster even in presence of multiple writers.

And this is also coherent with the current implementation: throttling is
equally probable for reads and writes.
But this shouldn't happen if we saturate the physical disk BW (doing
proportional BW control or using a watermark close to 100 in io-throttle).
In this case the IO scheduler logic shouldn't be totally broken.

Doing a very quick test with io-throttle, using a 10MB/s BW limit and
blockio.watermark=90:

Launching reader
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 32.2798 s, 8.3 MB/s

In the same time the writers wrote ~190MB each, so the single reader got
about 1/3 of the total BW.

182M testzerofile4
198M testzerofile1
188M testzerofile3
189M testzerofile2

Things are probably better with many cgroups, many readers and writers and,
in general, with the disk BW more saturated. The proportional BW approach
wins in this case, because if you always use the whole disk BW, the logic of
the IO scheduler is still valid.

> Vanilla 2.6.30-rc4
> ==================
>
> Test3
> =====
> Reader alone
> 234179072 bytes (234 MB) copied, 2.52195 s, 92.9 MB/s
>
> Reader with 4 writers
> ---------------------
> First run
> 234179072 bytes (234 MB) copied, 4.39929 s, 53.2 MB/s
>
> Second run
> 234179072 bytes (234 MB) copied, 4.55929 s, 51.4 MB/s
>
> Third run
> 234179072 bytes (234 MB) copied, 4.79855 s, 48.8 MB/s
>
> Fourth run
> 234179072 bytes (234 MB) copied, 4.5069 s, 52.0 MB/s
>
> Notice, that without any writers we seem to be having BW of 92MB/s and
> more than 50% of that BW is still assigned to reader in presence of
> writers. Compare this with io-throttle cgroup of 64MB/s where reader
> struggles to get 10-15% of BW.
>
> So any 2nd level control will break the notion and assumptions of
> underlying IO scheduler. We should probably do control at IO scheduler
> level to make sure we don't run into such issues while getting
> hierarchical fair share for groups.
>
> Thanks
> Vivek
>

What are the results with your IO scheduler controller? If you already have
them, please share; otherwise I'll repeat this test on my system.
It seems a very interesting test to compare the advantages of the IO scheduler
solution with respect to the io-throttle approach.

Thanks,
-Andrea

> > So now we are left with the issue of losing the notion of priority and
> > class within cgroup. In fact on bigger systems we will probably run into
> > issues of kiothrottled scalability as single thread is trying to cater to
> > all the disks.
> >
> > If we do max bw control at IO scheduler level, then I think we should be able
> > to control max bw while maintaining the notion of priority and class within
> > cgroup. Also there are multiple pdflush threads and jens seems to be pushing
> > flusher threads per bdi which will help us achieve greater scalability and
> > don't have to replicate that infrastructure for kiothrottled also.
> >
> > Thanks
> > Vivek

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel
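
For reference, the reader shares discussed in this thread can be checked
with a few lines of Python. This is only a sketch: the throughput figures are
copied from the dd output quoted above, while the helper names
(reader_share, mean) and the choice of reference values are assumptions made
for illustration. The io-throttle runs are compared against the 64MB/s cgroup
limit and the vanilla runs against the 92.9 MB/s measured with the reader
alone.

```python
# Reader throughput in MB/s, copied from the dd output quoted in this thread.
IO_THROTTLE_LIMIT = 64.0       # cgroup BW limit used in Test3 (MB/s)
VANILLA_READER_ALONE = 92.9    # reader alone on vanilla 2.6.30-rc4 (MB/s)

io_throttle_runs = [7.7, 8.7, 6.3, 6.4]     # reader with 4 writers, io-throttle
vanilla_runs = [53.2, 51.4, 48.8, 52.0]     # reader with 4 writers, vanilla


def reader_share(runs, reference):
    """Return each run's throughput as a percentage of `reference`.

    Hypothetical helper, not part of io-throttle or any kernel tool.
    """
    return [100.0 * mbps / reference for mbps in runs]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


throttled = reader_share(io_throttle_runs, IO_THROTTLE_LIMIT)
vanilla = reader_share(vanilla_runs, VANILLA_READER_ALONE)

print("io-throttle:", ", ".join(f"{s:.1f}%" for s in throttled),
      f"(mean {mean(throttled):.1f}%)")
print("vanilla:    ", ", ".join(f"{s:.1f}%" for s in vanilla),
      f"(mean {mean(vanilla):.1f}%)")
```

With these inputs the io-throttle shares land between roughly 10% and 14% of
the cgroup limit, while the vanilla shares all stay above 50% of the
reader-alone throughput, which matches the figures Vivek quotes.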