> For a RAID set of 6+1 2TB drives each capable of 60-120MB/s that
> is still pretty terrible speed (even if the performance seems
> not too bad).

Yes, but do note the 3.2 kernel has issues with the queue settings: max sectors and max hw sectors are set to 127. I've seen this on some machines with late 2.6 kernels, and on 3.0 and 3.1 too iirc. It seems fixed in 3.5. However, I had issues compiling the iscsitarget-dkms modules against the 3.5 kernel (from the package manager) and haven't taken the time to build a newer version myself, so I haven't tested IET with 3.5.

Also, since it's reading and writing at the same time now, it's no longer (nearly, due to fs overhead) purely sequential.

>>> Iostat with IETD whilst writing shows say 110-120% read per
>>> write, however, in this case we were also actually reading.
>>> [ ... ] IETD is running in fileio mode (write-back), so it
>>> buffers too. [ ... ]

> That probably helps the MD get a bit of help with aligned
> writes, or perhaps at that point the array had been resynced,
> who knows...

The results I've submitted now have all been taken whilst the array was healthy.

>> Are you enabling emulate_write_cache=1 with your iblock
>> backends..? This can have a gigantic effect on initiator
>> performance for both MSFT + Linux SCSI clients.

> That sounds interesting, but also potentially rather dangerous,
> unless there is a very reliable implementation of IO barriers.
> Just like with enabling write caches on real disks...

>> [ ... ] check your [ ... ]/queue/max*sectors_kb for the MD
>> RAID to make sure the WRITEs are stripe aligned to get best
>> performance with software MD raid.

> That does not quite ensure that the writes are stripe aligned,
> but perhaps a larger stripe cache would help.

Does this help?

root@datavault:~# parted /dev/md4
GNU Parted 2.3
Using /dev/md4
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) u b
(parted) pr
Model: Linux Software RAID Array (md)
Disk /dev/md4: 12002393063424B
Sector size (logical/physical): 512B/4096B
Partition Table: gpt

Number  Start            End              Size             File system  Name           Flags
 1      1966080B         10115507159039B  10115505192960B               ReplayStorage
 2      10115507159040B  12002393046527B  1886885887488B                VDR-Storage

(parted) sel /dev/md4p1
Using /dev/md4p1
(parted) pr
Model: Unknown (unknown)
Disk /dev/md4p1: 10115505192960B
Sector size (logical/physical): 512B/4096B
Partition Table: gpt

Number  Start       End              Size             File system  Name                          Flags
 1      17408B      134235135B       134217728B                    Microsoft reserved partition  msftres
 2      135266304B  10115504668671B  10115369402368B  ntfs         Basic data partition

(parted) quit

1966080 / (1024*64*6) = 5 (exactly, not rounded)
135266304 / (1024*64*6) = 344 (exactly, not rounded)

If my calculations are correct, it should thus be not only chunk aligned but even stripe aligned. I did pay a lot of attention to this during setup. It's not my daily thing though, so I do hope I did it correctly. The 1024 converts from B to kiB, there are 64 kiB per chunk, and 6 data chunks in a 7 disk RAID-5 set (or well, originally an 8 disk RAID-6, but that shouldn't differ). NTFS is formatted with a 64 kiB block/cluster size. I've just verified this again, in 3 ways :).
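For anyone who wants to replay those checks, it roughly comes down to the following from the shell. This is just a quick sketch against my md4 array and its 64 kiB x 6 stripe geometry; the sd* paths are the member disks, and the 8192 stripe cache value is only an example figure, not something I've tuned or benchmarked:

# stripe size = 64 kiB chunk * 6 data disks
STRIPE=$((64 * 1024 * 6))                            # 393216 bytes
echo $((1966080 % STRIPE)) $((135266304 % STRIPE))   # 0 0 -> both partition starts are stripe aligned

# the queue limits that were stuck at 127 on the older kernels
cat /sys/block/md4/queue/max_sectors_kb /sys/block/md4/queue/max_hw_sectors_kb
cat /sys/block/sd*/queue/max_sectors_kb              # the individual member disks

# current MD stripe cache size (number of cached stripes); a larger value
# may help RAID-5/6 write performance
cat /sys/block/md4/md/stripe_cache_size
# echo 8192 > /sys/block/md4/md/stripe_cache_size    # example value only

A non-zero result from the modulo line would mean a partition start that isn't on a stripe boundary.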
>> Please use FILEIO with this reporting emulate_write_cache=1
>> (WCE=1) to the SCSI clients. Note that by default in the last
>> kernel releases we've changed FILEIO backends to only always
>> use O_SYNC to ensure data consistency during a hard power
>> failure, regardless of the emulate_write_cache=1 setting.

> Ahh interesting too. That's also the right choice unless there
> is IO barrier support at all levels.

This is too low level for me currently. I'll have to look it up. I also take from this that *emulating* write cache != write cache :). I've only consciously set the buffered mode, but as stated the targetcli utility, at least the version that comes with Ubuntu 12.04, doesn't show this is set. Then again, I'm not running in fileio mode either, and the functionality has been disabled in 3.5 if I understood correctly.

>> Also note that by default it's my understanding that IETD uses
>> buffered FILEIO for performance, so in your particular type of
>> setup you'd still see better performance with buffered FILEIO,
>> but would still have the potential risk of silent data
>> corruption with buffered FILEIO.

> Not silent data corruption, but data loss. Silent data
> corruption is usually meant for the case where an IO completes
> and reports success, but the data recorded is not the data
> submitted.

OK, then we mean the same thing. The loss might obviously cause corruption, but I've never seen it happen silently :).

>> [ ... ] understand the possible data integrity risks
>> associated with using buffered FILEIO during a hard power
>> failure, I'm fine with re-adding this back into
>> target_core_file for v3.7 code for people who really know what
>> they are doing.

> That "people who really know what they are doing" is generally a
> bit optimistic :-).

I like to be free to choose. I might not always choose the smart thing, but at least it's been my choice, not some spoon-fed thing :). Others like to be nurtured, though. If you're that concerned about the safety of users (or rather admins - I've never seen a regular user set up a RAID + iSCSI target), I'd take the middle ground: just throw a big fat red warning. Targetcli already uses fancy colors :). If people choose to ignore that, it's *most definitely* their responsibility (not that it's anyone else's otherwise; the license clearly states no warranty whatsoever). There are other ways to make things safe though, and sometimes speed is more important than integrity. There are probably still other reasons people might want to enable it.

> Do the various modes support IO barriers? That usually is what
> is critical, at least for the better informed people.

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html