Hi Dave,

On Sun, Dec 1, 2019 at 11:57 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> On Sun, Dec 01, 2019 at 11:00:32AM +0200, Alex Lyakas wrote:
> > Hi Dave,
> >
> > Thank you for your response.
> >
> > On Sat, Nov 30, 2019 at 10:28 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > >
> > > On Wed, Nov 27, 2019 at 06:19:29AM -0800, Christoph Hellwig wrote:
> > > > Can we all take a little step back and think about the implications
> > > > of the original patch from Alex? Because I think there is very little.
> > > > An updated sunit/swidth is just a little performance optimization,
> > > > and anyone who really cares about changing that after the fact can
> > > > trivially add those to fstab.
> > > >
> > > > So I think something like his original patch plus a message during
> > > > mount that the new values are not persisted should be perfectly fine.
> > >
> > > Well, the original purpose of the mount options was to persist a new
> > > sunit/swidth to the superblock...
> > >
> > > Let's ignore the fact that it was a result of a CXFS client mount
> > > bug trashing the existing sunit/swidth values, and instead focus on
> > > the fact we've been telling people for years that you "only need to
> > > set these once after a RAID reshape", and so we have a lot of users
> > > out there expecting it to persist the new values...
> > >
> > > I don't think we can just redefine the documented and expected
> > > behaviour of a mount option like this.
> > >
> > > With that in mind, the xfs(5) man page explicitly states this:
> > >
> > >     The sunit and swidth parameters specified must be compatible
> > >     with the existing filesystem alignment characteristics. In
> > >     general, that means the only valid changes to sunit are
> > >     increasing it by a power-of-2 multiple. Valid swidth values
> > >     are any integer multiple of a valid sunit value.
> > >
> > > Note the comment about changes to sunit? What is being done here -
> > > halving the sunit from 64 to 32 blocks - is invalid, documented as
> > > invalid, but the kernel does not enforce this. We should fix the
> > > kernel code to enforce the alignment rules that the mount option
> > > is documented to require.
> > >
> > > If we want to change the alignment characteristics after mkfs, then
> > > use su=1,sw=1 as the initial values; the first mount can then use
> > > the options to change it to whatever is present after mkfs has run.
> >
> > If I understand your response correctly:
> > - some sunit/swidth changes during mount are legal and some aren't
> > - the legal changes should be persisted in the superblock
>
> Yup.
>
> > What about the repair? Even if the user performs a legal change, it
> > still breaks the repairability of the file system.
>
> It is not a legal change if it moves the root inode to a new
> location. IOWs, if the alignment mods will result in the root inode
> changing location, then it should be rejected by the kernel.
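
Just to confirm I'm reading the xfs(5) rule the same way you are, here
is a minimal bash sketch of the check as I understand it. The helper
name and argument order are mine, not anything from the kernel or
xfsprogs - it only encodes "sunit may only grow by a power-of-2
multiple, and swidth must be an integer multiple of the new sunit":

    # is_valid_change OLD_SUNIT NEW_SUNIT NEW_SWIDTH
    # returns 0 if the proposed mount-time change is legal per xfs(5)
    is_valid_change() {
        local old_sunit=$1 new_sunit=$2 new_swidth=$3
        (( old_sunit > 0 )) || return 1
        # sunit may only stay the same or grow by an exact multiple
        (( new_sunit >= old_sunit && new_sunit % old_sunit == 0 )) || return 1
        local ratio=$(( new_sunit / old_sunit ))
        # that multiple must be a power of two (exactly one bit set)
        (( (ratio & (ratio - 1)) == 0 )) || return 1
        # swidth must be an integer multiple of the new sunit
        (( new_swidth % new_sunit == 0 ))
    }

    # the halving in my reproducer (sunit 64 -> 32) fails the check:
    is_valid_change 64 32 32 || echo "invalid sunit/swidth change"

If that is the rule you'd like the kernel to enforce, then the mount
in my reproducer below should have been rejected up front.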
>
> Anyway, we need more details about your test environment, because
> the example you gave:
>
> | # mkfs
> | mkfs.xfs -f -K -p /etc/zadara/xfs.protofile -d sunit=64,swidth=64 -l sunit=32 /dev/vda
> |
> | # mount with a different sunit/swidth:
> | mount -onoatime,sync,nouuid,sunit=32,swidth=32 /dev/vda /mnt/xfs
> |
> | # umount
> | umount /mnt/xfs
> |
> | # xfs_repair
> | xfs_repair -n /dev/vda
> | # reports false corruption and eventually segfaults[1]
>
> does not reproduce the reported failure on my workstation running a
> v5.3.0 kernel and v5.0.0 xfsprogs:
>
> $ sudo mkfs.xfs -f -d file,name=1t.img,size=1t,sunit=64,swidth=64 -l sunit=32
> meta-data=1t.img             isize=512    agcount=32, agsize=8388608 blks
>          =                   sectsz=512   attr=2, projid32bit=1
>          =                   crc=1        finobt=1, sparse=1, rmapbt=0
>          =                   reflink=0
> data     =                   bsize=4096   blocks=268435456, imaxpct=5
>          =                   sunit=8      swidth=8 blks
> naming   =version 2          bsize=4096   ascii-ci=0, ftype=1
> log      =internal log       bsize=4096   blocks=131072, version=2
>          =                   sectsz=512   sunit=4 blks, lazy-count=1
> realtime =none               extsz=4096   blocks=0, rtextents=0

I believe that in one of the earlier emails in this thread, Brian
mentioned that this is related to "sparse inodes" being enabled:

"I couldn't reproduce this at first because sparse inodes is enabled by
default and that introduces more strict inode alignment requirements."

Can you please try mkfs'ing with sparse inodes disabled? (I've put the
exact command I have in mind in the P.S. at the bottom of this mail.)

Thanks,
Alex.

> $ sudo mount -o loop -o sunit=32,swidth=32 1t.img /mnt/1t
> $ sudo xfs_info /mnt/1t
> ....
> data     =                   bsize=4096   blocks=268435456, imaxpct=5
>          =                   sunit=4      swidth=4 blks
> ....
> $ sudo umount /mnt/1t
> $ sudo xfs_repair -f 1t.img
> Phase 1 - find and verify superblock...
>         - reporting progress in intervals of 15 minutes
> Phase 2 - using internal log
>         - zero log...
>         - scan filesystem freespace and inode maps...
>         - 08:42:14: scanning filesystem freespace - 32 of 32 allocation groups done
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan and clear agi unlinked lists...
> ....
> Phase 7 - verify and correct link counts...
>         - 08:42:18: verify and correct link counts - 32 of 32 allocation groups done
> done
> $ echo $?
> 0
> $ sudo mount -o loop 1t.img /mnt/1t
> $ sudo xfs_info /mnt/1t
> ....
> data     =                   bsize=4096   blocks=268435456, imaxpct=5
>          =                   sunit=4      swidth=4 blks
> ....
> $
>
> So reducing the sunit doesn't necessarily trip up repair; in some
> cases reducing the sunit doesn't change the root inode location,
> either.
>
> > For now, we made a local change to not persist sunit/swidth updates
> > in the superblock, because we must have a working repair and our
> > kernel (4.14 stable) allows any sunit/swidth changes.
>
> From the above, it's not clear where the problem lies - it may be
> that there's a bug in repair we've fixed since whatever version you
> are using....
>
> > We can definitely adhere to the recommended behavior of setting
> > sunit/swidth=1 during mkfs, provided the repair still works after
> > mounting with different sunit/swidth.
>
> ... hence I'd suggest that more investigation needs to be done
> before you do anything permanent...
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
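
P.S. The exact command I'd suggest for the sparse-inodes test - this is
just your loop-file example with sparse inode allocation disabled via
"-i sparse=0" (the mkfs.xfs option controlling the sparse inode
feature); everything else is unchanged, so the mkfs output should show
sparse=0 in the meta-data section:

    $ sudo mkfs.xfs -f -i sparse=0 -d file,name=1t.img,size=1t,sunit=64,swidth=64 -l sunit=32

If Brian's observation holds, the less strict inode alignment without
sparse inodes is what lets the 64 -> 32 sunit change move the root
inode, which is what my reproducer trips over.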