Feeling stupid replying to my own email; hopefully nobody has started writing
a reply. Maybe someone else will find it useful.

The problem comes down to latency. Second time this month that the same
problem bites me :) If a single disk has a latency of, say, 100us for small
direct-I/O reads at an iodepth of 1, each request lands on just one disk, and
every disk has to wait for the others to complete their reads before it gets
its next one. In effect, working through the array one request at a time costs
the single-disk latency times the number of disks instead of the reads
overlapping. This is also the reason why full-stripe reads run at (almost)
full speed: the I/O for all the chunks gets submitted to the member disks
concurrently.

I wonder why I don't see any improvement with readahead; maybe direct I/O
doesn't use it?
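A rough way to check this (untested sketch; I'm assuming fio with the libaio
engine, and the device paths are just placeholders for one member drive and
the md array -- adjust to match the actual setup):

----
# 4k sequential reads, O_DIRECT, queue depth 1: compare the per-request
# completion latency (clat) and bandwidth of one member disk vs the array
fio --name=onedisk --filename=/dev/sdX --ioengine=libaio --direct=1 \
    --rw=read --bs=4k --iodepth=1 --runtime=30 --time_based

fio --name=array --filename=/dev/mdX --ioengine=libaio --direct=1 \
    --rw=read --bs=4k --iodepth=1 --runtime=30 --time_based
----

If latency really is the limit, the array's completion latency should come out
about the same as the single drive's, and in both cases the bandwidth is
roughly bs / latency (one 4k read every ~50us works out to about 80MB/s, which
is in the same ballpark as the 4k figures quoted below).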
Sorry if I wasted someone's time; maybe someone will find it useful :)

Dragan

On Mon, Oct 12, 2015 at 10:59 PM, Dragan Milivojević <galileo@xxxxxxxxxxx> wrote:
> Hi all
>
> I'm currently building a NAS/SAN server and I'm not quite sure why I'm
> getting these results. I would appreciate it if someone could give me a
> theoretical explanation of why this is happening. The raid arrays were built
> from 7 drives with a 128k chunk. I'm mostly concerned with read speed and
> small-block direct I/O.
>
> Full raid and test details are attached and also uploaded (for easy viewing)
> at: http://pastebin.com/VeGr0Ehc
>
> Summary of results from a few tests:
>
> == Hard disk, Sequential read, O_DIRECT
> ----
> bs: 4k,   bw: 80833KB/s
> bs: 32k,  bw: 152000KB/s
> bs: 128k, bw: 152809KB/s
> ----
>
> == Hard disk, Sequential write, O_DIRECT
> ----
> bs: 16k,  bw: 144392KB/s
> bs: 32k,  bw: 145352KB/s
> bs: 128k, bw: 145433KB/s
> ----
>
> == Raid 10 f2, chunk: 128k, Sequential read, O_DIRECT
> ----
> bs: 4k,   bw: 78136KB/s,  per disk bw: 11MB/s,  disk avgrq-sz: 8
> bs: 32k,  bw: 289847KB/s, per disk bw: 41MB/s,  disk avgrq-sz: 64
> bs: 128k, bw: 409674KB/s, per disk bw: 57MB/s,  disk avgrq-sz: 256
> bs: 256k, bw: 739344KB/s, per disk bw: 104MB/s, disk avgrq-sz: 256
> bs: 896k, bw: 981517KB/s, per disk bw: 133MB/s, disk avgrq-sz: 256
> ----
>
> == Raid 6, chunk: 128k, Sequential read, O_DIRECT
> ----
> bs: 4k,    bw: 46376KB/s,  per disk bw: 6MB/s,  disk avgrq-sz: 8
> bs: 32k,   bw: 155531KB/s, per disk bw: 22MB/s, disk avgrq-sz: 64
> bs: 128k,  bw: 182024KB/s, per disk bw: 26MB/s, disk avgrq-sz: 256
> bs: 256k,  bw: 248591KB/s, per disk bw: 34MB/s, disk avgrq-sz: 256
> bs: 384k,  bw: 299771KB/s, per disk bw: 40MB/s, disk avgrq-sz: 256
> bs: 512k,  bw: 315374KB/s, per disk bw: 44MB/s, disk avgrq-sz: 256
> bs: 640k,  bw: 296350KB/s, per disk bw: 44MB/s, disk avgrq-sz: 256
> bs: 1280k, bw: 543655KB/s, per disk bw: 75MB/s, disk avgrq-sz: 426
> bs: 2560k, bw: 618092KB/s, per disk bw: 83MB/s, disk avgrq-sz: 638
> ----
>
> == Raid 6, chunk: 128k, Sequential read, Buffered
> ----
> bs: 2560k, bw: 690546KB/s, per disk bw: 96MB/s, disk avgrq-sz: 512
> ----
>
> == Raid 6, chunk: 128k, Sequential write, O_DIRECT, stripe_cache_size: 32768
> ----
> bs: 640k,  bw: 382778KB/s, per disk bw: 75MB/s, disk avgrq-sz: 256
> bs: 1280k, bw: 405909KB/s, per disk bw: 82MB/s, disk avgrq-sz: 512
> ----
>
> == Raid 6, chunk: 128k, Sequential write, Buffered, stripe_cache_size: 32768
> ----
> bs: 1280k, bw: 730527KB/s, per disk bw: 135MB/s, disk avgrq-sz: 1024
> ----
>
> As can be seen from the single-disk tests, the hard drives are capable of
> full speed with a 32k block size. What baffles me (especially with raid 10)
> is why I get such low speed with small request sizes.
>
> Even with full-stripe reads performance is not at its maximum. This is most
> obvious with raid 6. If I get the theory correctly, the maximum read speed
> per disk in this configuration should be around 105MB/s (0.7*150).
>
> With buffered reads I'm close to this figure, so the theory matches the real
> performance there.
>
> When thinking about writes one can easily grasp the theory behind
> full-stripe writes and why writes smaller than a full stripe incur a
> performance hit.
>
> I don't really get why I'm getting a performance hit with reads. I get that
> for every 7 chunks on a single disk, 2 of those will be parity, so when
> reading the disk I'm expecting roughly a 30% hit?
>
> The raid 10 (f2) results confuse me even more. If I understand it correctly,
> the second copy of the data is written to the far half of each disk, so (in
> theory) there should be no such performance hit with raid 10?
>
> I'm also seeing similar performance figures with real-world usage (windoze
> clients with iSCSI and MPIO). The per-disk bw figures remind me more of
> random than sequential I/O.
>
> The only theory that I came up with revolves around read request merges and
> disk firmware. I'm seeing read request merges with fairly large block sizes,
> and even more with buffered I/O. I'm not that familiar with the kernel block
> layer internals, so I'm seeking an authoritative answer.
>
> Can someone point me to some docs that I can read, or offer an explanation?
>
> Thanks
> Dragan
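P.S. To make the 0.7 figure above explicit: it is just the data fraction of a
7-drive RAID6 stripe combined with the ~150MB/s a single drive manages in the
tests above (rough back-of-the-envelope numbers, not a measurement):

----
7-disk RAID6 stripe       = 5 data chunks + 2 parity chunks
data fraction per disk    = 5/7            ~= 0.71
expected per-disk ceiling = 0.71 * 150MB/s ~= 107MB/s
expected array ceiling    = 7 * 107MB/s    ~= 750MB/s for full-stripe reads
----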