On Sun, Dec 01, 2019 at 11:00:32AM +0200, Alex Lyakas wrote:
> Hi Dave,
>
> Thank you for your response.
>
> On Sat, Nov 30, 2019 at 10:28 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >
> > On Wed, Nov 27, 2019 at 06:19:29AM -0800, Christoph Hellwig wrote:
> > > Can we all take a little step back and think about the implications
> > > of the original patch from Alex? Because I think there is very
> > > little. An updated sunit/swidth is just a little performance
> > > optimization, and anyone who really cares about changing that after
> > > the fact can trivially add those to fstab.
> > >
> > > So I think something like his original patch plus a message during
> > > mount that the new values are not persisted should be perfectly
> > > fine.
> >
> > Well, the original purpose of the mount options was to persist a new
> > sunit/swidth to the superblock...
> >
> > Let's ignore the fact that it was a result of a CXFS client mount
> > bug trashing the existing sunit/swidth values, and instead focus on
> > the fact we've been telling people for years that you "only need to
> > set these once after a RAID reshape" and so we have a lot of users
> > out there expecting it to persist the new values...
> >
> > I don't think we can just redefine the documented and expected
> > behaviour of a mount option like this.
> >
> > With that in mind, the xfs(5) man page explicitly states this:
> >
> >     The sunit and swidth parameters specified must be compatible
> >     with the existing filesystem alignment characteristics. In
> >     general, that means the only valid changes to sunit are
> >     increasing it by a power-of-2 multiple. Valid swidth values
> >     are any integer multiple of a valid sunit value.
> >
> > Note the comment about changes to sunit? What is being done here -
> > halving the sunit from 64 to 32 blocks - is invalid, documented as
> > invalid, but the kernel does not enforce this. We should fix the
> > kernel code to enforce the alignment rules that the mount option
> > is documented to require.
> >
> > If we want to change the alignment characteristics after mkfs, then
> > use su=1,sw=1 as the initial values, then the first mount can use
> > the options to change it to whatever is present after mkfs has run.

> If I understand your response correctly:
> - some sunit/swidth changes during mount are legal and some aren't
> - the legal changes should be persisted in the superblock

Yup.

> What about the repair? Even if the user performs a legal change, it
> still breaks the repairability of the file system.

It is not a legal change if it moves the root inode to a new
location. IOWs, if the alignment mods would result in the root inode
changing location, then they should be rejected by the kernel.
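As a quick sanity check on that - the device path and the value
printed below are made up for illustration, not taken from your
report - you can eyeball the root inode location with xfs_db before
and after an alignment change:

# record the root inode number before changing alignment
$ xfs_db -r -c "sb 0" -c "p rootino" /dev/vda
rootino = 128

# after mounting with the new sunit/swidth and unmounting, run
# repair in no-modify mode; it recalculates where the root inode
# should live from the superblock geometry, so a clean run means
# the new alignment still maps to the same location
$ xfs_repair -n /dev/vda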
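And to flesh out the su=1,sw=1 suggestion quoted above, a minimal
sketch of that workflow - the device path and geometry values are
examples only, and I'm reading "su=1" as "one filesystem block",
i.e. sunit=8 sectors with a 4096 byte block size:

# mkfs with single-block "no alignment" geometry
$ mkfs.xfs -f -d sunit=8,swidth=8 /dev/vda

# the first mount sets the real geometry; increasing sunit by a
# power-of-2 multiple (8 -> 64 sectors) is a legal change, so it
# gets persisted to the superblock and subsequent mounts need no
# options at all
$ mount -o sunit=64,swidth=64 /dev/vda /mnt/xfs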
Anyway, we need more details about your test environment, because
the example you gave:

| # mkfs
| mkfs.xfs -f -K -p /etc/zadara/xfs.protofile -d sunit=64,swidth=64 -l sunit=32 /dev/vda
|
| # mount with a different sunit/swidth:
| mount -onoatime,sync,nouuid,sunit=32,swidth=32 /dev/vda /mnt/xfs
|
| # umount
| umount /mnt/xfs
|
| # xfs_repair
| xfs_repair -n /dev/vda
| # reports false corruption and eventually segfaults[1]

does not reproduce the reported failure on my workstation running a
v5.3.0 kernel and a v5.0.0 xfsprogs:

$ sudo mkfs.xfs -f -d file,name=1t.img,size=1t,sunit=64,swidth=64 -l sunit=32
meta-data=1t.img                 isize=512    agcount=32, agsize=8388608 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=268435456, imaxpct=5
         =                       sunit=8      swidth=8 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=131072, version=2
         =                       sectsz=512   sunit=4 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
$ sudo mount -o loop -o sunit=32,swidth=32 1t.img /mnt/1t
$ sudo xfs_info /mnt/1t
....
data     =                       bsize=4096   blocks=268435456, imaxpct=5
         =                       sunit=4      swidth=4 blks
....
$ sudo umount /mnt/1t
$ sudo xfs_repair -f 1t.img
Phase 1 - find and verify superblock...
        - reporting progress in intervals of 15 minutes
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - 08:42:14: scanning filesystem freespace - 32 of 32 allocation groups done
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
....
Phase 7 - verify and correct link counts...
        - 08:42:18: verify and correct link counts - 32 of 32 allocation groups done
done
$ echo $?
0
$ sudo mount -o loop 1t.img /mnt/1t
$ sudo xfs_info /mnt/1t
....
data     =                       bsize=4096   blocks=268435456, imaxpct=5
         =                       sunit=4      swidth=4 blks
....
$

So reducing the sunit doesn't necessarily break repair: in this
case, at least, it doesn't change the root inode location, either.

> For now, we made a local change to not persist sunit/swidth updates in
> the superblock. Because we must have a working repair, and our kernel
> (4.14 stable) allows any sunit/swidth changes.

From the above, it's not clear where the problem lies - it may be
that there's a bug in repair we've fixed since whatever version you
are using....

> We can definitely adhere to the recommended behavior of setting
> sunit/swidth=1 during mkfs, provided the repair still works after
> mounting with different sunit/swidth.

... hence I'd suggest that more investigation needs to be done
before you do anything permanent...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx