On Sat, Jun 23, 2012 at 02:50:49PM +0200, Ingo Jürgensmann wrote:
> muaddib:~# cat /proc/mdstat
> Personalities : [raid1] [raid6] [raid5] [raid4]
> md7 : active raid5 sdf4[3] sdd4[1] sde4[0]
>       7811261440 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
.....
> The RAID devices /dev/md0 to /dev/md4 are on my old 3x 1 TB
> Seagate disks. Anyway, to finally come to the problem, when I try
> to create a filesystem on the new RAID5 I get the following:
>
> muaddib:~# mkfs.xfs /dev/lv/usr
> log stripe unit (524288 bytes) is too large (maximum is 256KiB)
> log stripe unit adjusted to 32KiB
> meta-data=/dev/lv/usr          isize=256    agcount=16, agsize=327552 blks
>          =                     sectsz=512   attr=2, projid32bit=0
> data     =                     bsize=4096   blocks=5240832, imaxpct=25
>          =                     sunit=128    swidth=256 blks
> naming   =version 2            bsize=4096   ascii-ci=0
> log      =internal log         bsize=4096   blocks=2560, version=2
>          =                     sectsz=512   sunit=8 blks, lazy-count=1
> realtime =none                 extsz=4096   blocks=0, rtextents=0
>
> As you can see I follow the "mkfs.xfs knows best, don't fiddle
> around with options unless you know what you're doing!"-advice.
> But apparently mkfs.xfs wanted to create a log stripe unit of 512
> kiB, most likely because it's the same chunk size as the
> underlying RAID device.

Exactly. The best thing in general is to align all log writes to the
underlying stripe unit of the array. That way, as multiple frequent
log writes occur, they are guaranteed to form full stripe writes and
incur essentially no RMW overhead.

32k is chosen by default because that's the default log buffer size
and hence the typical size of log writes. If you increase the log
stripe unit, you also increase the minimum log buffer size that the
filesystem supports. The filesystem can support up to 256k log
buffers, and hence the limit on the maximum log stripe alignment.

> The problem seems to be related to RAID5, because when I try to
> make a filesystem on /dev/md6 (RAID1), there's no error message:

RAID1 devices don't have a stripe unit/stripe width, so no alignment
is needed or configured.

> So, the question is:
> - is this a bug somewhere in XFS, LVM or Linux's software RAID
>   implementation?

Not a bug at all.

> - will performance suffer from log stripe size adjusted to just 32
>   kiB? Some of my logical volumes will just store data, but one or
>   the other will have some workload acting as storage for BackupPC.

For data volumes, no. For BackupPC, it depends on whether the MD RAID
stripe cache can turn all the sequential log writes into a full
stripe write. In general, this is not a problem, and it is almost
never a problem for HW RAID with BBWC....

> - would it be worth the effort to raise log stripe to at least 256
>   kiB?

Depends on your workload. If it is fsync heavy, I'd advise against
it, as every log write will be padded out to 256k, even if you only
write 500 bytes worth of transaction data....

> - or would it be better to run with external log on the old 1 TB
>   RAID?

External logs provide much less benefit with delayed logging than
they used to. As it is, your external log needs to have the same
reliability characteristics as the main volume - lose the log,
corrupt the filesystem. Hence for RAID5 volumes you need a RAID1
log, and for RAID6 you either need RAID6 or a 3-way mirror to
provide the same reliability....
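For reference, here is a rough sketch of how those log settings can
be set explicitly if you want to experiment before the disks go into
production. The external log device, log size and mount point below
are made up for illustration, not taken from the setup quoted above:

# set the log stripe unit to 32k explicitly (what mkfs fell back to
# above) instead of letting it inherit the 512k chunk size
mkfs.xfs -l su=32k /dev/lv/usr

# or keep a 256k log stripe unit and raise the log buffer size to
# match it at mount time
mkfs.xfs -l su=256k /dev/lv/usr
mount -o logbsize=256k /dev/lv/usr /mnt/test

# external log on a hypothetical RAID1 device; it has to be given at
# both mkfs and mount time
mkfs.xfs -l logdev=/dev/md9,size=128m /dev/lv/usr
mount -o logdev=/dev/md9 /dev/lv/usr /mnt/test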
> End note: the 4 TB disks are not yet "in production", so I can run
> tests with both RAID setup as well as mkfs.xfs. Reshaping the RAID
> will take up to 10 hours, though...

IMO, RAID reshaping is just a bad idea - it changes the alignment
characteristics of the volume, hence everything that the filesystem
laid down in an aligned fashion is now unaligned, and you have to
tell the filesystem the new alignment before new files will be
correctly aligned (a rough mount example is sketched below). Also,
it's usually faster to back up, recreate and restore than to
reshape, and that puts a lot less load on your disks, too...

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
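To put some made-up numbers on the alignment point above: if the
3-disk RAID5 quoted earlier (512k chunk, 2 data disks) were reshaped
to 4 disks, the post-reshape geometry could be passed at mount time
roughly like this. sunit/swidth are given in 512-byte sectors, and
the mount point is hypothetical:

# 512k chunk = 1024 sectors; 3 data disks after the reshape gives a
# stripe width of 3 x 1024 = 3072 sectors
mount -o sunit=1024,swidth=3072 /dev/lv/usr /mnt/test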