On 30.12.2011 22:04, Michele Codutti wrote:
> Hi all, thanks for the tips. I'll reply to everyone in one aggregated message:
>
>> Just a thought, but do you have the "XP mode" jumper removed on all drives?
> Yes.
>
>> Instead of doing a monster sequential write to find my disk speed, I
>> generally find it more useful to add conv=fdatasync to a dd so that
>> the dirty buffers are utilized as they are in most real-world working
>> environments, but I don't get a result until the test is on-disk.
> Done, same results (40 MB/s).
>
>>>> My only suggestion would be to experiment with various partitioning,
>>>
>>> Poster already said they're not partitioned.
>>
>> Correct. Using partitioning allows you to adjust the alignment, so for
>> example if the MD superblock at the front moves the start of the
>> exported MD device out of alignment with the base disks, you could
>> compensate for it by starting your partition at the correct offset.
> Done. I've created one big partition using parted with "-a optimal".
> The partition layout is (fdisk-friendly output):
>
> Disk /dev/sdc: 2000.4 GB, 2000398934016 bytes
> 255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
> Units = sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0x00077f06
>
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sdc1           2048  3907028991  1953513472   fd  Linux raid autodetect
>
> I redid the test with the "conv=fdatasync" option as above: same results.
>
>> My only suggestion would be to experiment with various partitioning,
>> starting the first partition at 2048s or various points to see if you
>> can find a placement that aligns the partitions properly. I'm sure
>> there's an explanation, but I'm not in the mood to put on my thinking
>> hat to figure it out at the moment. It may also be worth using a
>> different superblock version, as 1.2 is 4k from the start of the
>> drives, which might be messing with alignment (although I would expect
>> it on all arrays); worth trying 0.9, which goes at the end of the
>> device.
> I've tried all the superblock versions: 0, 0.9, 1, 1.1 and 1.2. Same results.
>
>> No, those drives generally DON'T report 4k to the OS, even though they
>> are. If they were, there'd be fewer problems. They lie and say 512b
>> sectors for compatibility.
> Yes, they are dirty liars. It's the same for the EADS series, not only the EARS ones.
>
>> My recommendation would be to look into the stripe-cache settings and check
>> iostat -x 5 output. What is most likely happening is that when writing to
>> the raid5, it's reading some (to calculate parity most likely) and not just
>> writing. iostat will confirm if this is indeed the case.
> Could you explain how I could look into the stripe-cache settings?
> This is one of many similar outputs from iostat -x 5 during the initial rebuilding phase:
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.00    0.00   13.29    0.00    0.00   86.71
>
> Device:  rrqm/s   wrqm/s     r/s      w/s     rkB/s     wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
> sda     6585.60     0.00 4439.20     0.00  44099.20      0.00    19.87     6.14   1.38    1.38    0.00    0.09  39.28
> sdb     6280.40     0.00 4746.60     0.00  44108.00      0.00    18.59     5.20   1.10    1.10    0.00    0.07  35.04
> sdc        0.00  9895.40    0.00  1120.80      0.00  44152.80    78.79    12.03  10.73    0.00   10.73    0.82  92.32
>
> I also built a RAID6 (with one drive missing): same results.
>
>> There must be some misalignment somewhere :(
> Yes, it's the same behavior.
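
Regarding the stripe-cache question above: for raid5/raid6 it is a per-array
sysfs knob, so something along these lines should let you look at it and
experiment (assuming the array shows up as md0 - adjust to your device name):

    # show the current stripe cache size (default is 256 entries)
    cat /sys/block/md0/md/stripe_cache_size

    # try a larger cache, then re-run the dd conv=fdatasync test; the memory
    # cost is roughly stripe_cache_size * 4 KiB * number of member disks
    echo 8192 > /sys/block/md0/md/stripe_cache_size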
>
>> Do all drives really report as 4K to the OS - physical_block_size, logical_block_size
>> under /sys/block/sdX/queue/ ??
> No, they lie about the block size, as you can also see in the fdisk output above.
>
>> NB: how does it perform with partitions starting at sector 2048 (check
>> all disks with fdisk -lu /dev/sdX)?
> They perform the same.
>
> Any other suggestion?
>
> I almost forgot: I've also booted OpenSolaris and created a zfs pool (aligned to the 4k
> sector) from the same three drives, and they perform very well, individually and together.
> I know that I'm comparing apples and oranges, but ... there must be a solution!

WTF is the jumper for then? (on a 512B drive)

Does it change somehow:
/sys/block/sdX/queue/physical_block_size
/sys/block/sdX/queue/logical_block_size
/sys/block/sdX/alignment_offset

If osol can handle it (enforcing 4k), that's a good sign..
(you used ashift=12 for the pool, right?)

Z.
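
PS: to dump what the kernel sees for all three members in one go, something
like this should do (sda/sdb/sdc taken from your iostat output - adjust if
the names differ on your box):

    # print logical/physical block size and alignment offset per member disk
    for d in sda sdb sdc; do
        echo "== $d =="
        cat /sys/block/$d/queue/logical_block_size \
            /sys/block/$d/queue/physical_block_size \
            /sys/block/$d/alignment_offset
    done

On the EARS/EADS drives I'd expect both sizes to read 512 - exactly the lie
being discussed - and alignment_offset to stay 0.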