Hi Brian, On Mon, Nov 25, 2019 at 3:07 PM Brian Foster <bfoster@xxxxxxxxxx> wrote: > > On Sun, Nov 24, 2019 at 11:13:09AM +0200, Alex Lyakas wrote: > > Hi Brian, > > > > Thank you for your response. > > > > On Fri, Nov 22, 2019 at 5:43 PM Brian Foster <bfoster@xxxxxxxxxx> wrote: > > > > > > On Thu, Nov 21, 2019 at 08:08:19PM +0200, Alex Lyakas wrote: > > > > We are hitting the following issue: if XFS is mounted with sunit/swidth different from those > > > > specified during mkfs, then xfs_repair reports false corruption and eventually segfaults. > > > > > > > > Example: > > > > > > > > # mkfs > > > > mkfs.xfs -f -K -p /etc/zadara/xfs.protofile -d sunit=64,swidth=64 -l sunit=32 /dev/vda > > > > > > > > #mount with a different sunit/swidth: > > > > mount -onoatime,sync,nouuid,sunit=32,swidth=32 /dev/vda /mnt/xfs > > > > > > > > > > FYI, I couldn't reproduce this at first because sparse inodes is enabled > > > by default and that introduces more strict inode alignment requirements. > > > I'm assuming that sparse inodes is disabled in your example, but it > > > would be more helpful if you included the exact configuration and mkfs > > > output in such reports. > > Providing more details about configuration: > > > > kernel: 4.14.99 > > xfsprogs: 4.9.0+nmu1ubuntu2 > > content of /etc/zadara/xfs.protofile is captured in [1] > > mkfs output is captured in [2] > > > > > > > > > #umount > > > > umount /mnt/xfs > > > > > > > ... > > > > > > > > Looking at the kernel code of XFS, there seems to be no need to update the superblock sunit/swidth if the mount-provided sunit/swidth are different. > > > > The superblock values are not used during runtime. > > > > > > > > > > I'm not really sure what the right answer is here. On one hand, this > > > patch seems fundamentally reasonable to me. I find it kind of odd for > > > mount options to override and persist configuration set in the > > > superblock like this. OTOH, this changes a historical behavior which may > > > (or may not) cause disruption for users. I also think it's somewhat > > > unfortunate to change kernel mount option behavior to accommodate > > > repair, > > This is very critical for us to have a working repair in production. I > > presume the same is true for most users. > > > > Of course. > > > > but I think the mount option behavior being odd argument stands > > > on its own regardless. > > > > > > What is your actual use case for changing the stripe unit/width at mount > > > time like this? > > Our use case is like this: during mkfs the overall system does not > > know yet the exact sunit/swidth to be used. Also, the underlying > > storage can change its sunit/swidth alignment as part of some storage > > migration scenarios. During mount we already know the proper > > sunit/swidth. But the problem is that in order to specify sunit/swidth > > during mount, XFS superblock must be marked as "supporting data > > alignment", i.e., XFS_SB_VERSION_DALIGNBIT has to be set. Otherwise, > > XFS refuses to mount and says: > > > > xfs_warn(mp, > > "cannot change alignment: superblock does not support data alignment"); > > return -EINVAL; > > > > In order for the superblock to be marked like this, *some* > > sunit/swidth need to be specified during mkfs. > > > > What do you mean by "the overall system doesn't know the stripe unit > values" at mkfs time? Shouldn't these values originate from the > underlying storage? Is there some additional configuration at play in > this particular use case? I checked the code, and it turns out that the only reason for specifying sunit/swidth during mkfs is to mark the superblock as "supporting data alignment". The real sunit/swidth is always determined during mount, based on the underlying storage. The significant property of the storage is the deduplication granularity; we saw that we get best deduplication results when XFS sunit/swidth are aligned to deduplication granularity. During storage migrations, an XFS share can be migrated to a storage with different deduplication parameters; hence we need to be able to update the runtime sunit/swidth. Thanks, Alex. > > The migration use case certainly sounds valid, though my understanding > is that if an existing stripe configuration is in place, it's usually > best for the new storage configuration to take that into consideration > (and use some multiple of stripe unit or something, for example). > > > > > > > > With the suggested patch, xfs repair is working properly also when mount-provided sunit/swidth are different. > > > > > > > > However, I am not sure whether this is the proper approach. Otherwise, should we not allow specifying different sunit/swidth during mount? > > > > > > > ... > > > > > > > > Signed-off-by: Alex Lyakas <alex@xxxxxxxxxx> > > > > --- > > > > fs/xfs/xfs_mount.c | 18 ++++++------------ > > > > 1 file changed, 6 insertions(+), 12 deletions(-) > > > > > > > > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c > > > > index ba5b6f3..e8263b4 100644 > > > > --- a/fs/xfs/xfs_mount.c > > > > +++ b/fs/xfs/xfs_mount.c > > > > @@ -399,19 +399,13 @@ > > > > } > > > > > > > > /* > > > > - * Update superblock with new values > > > > - * and log changes > > > > + * If sunit/swidth specified during mount do not match > > > > + * those in the superblock, use the mount-specified values, > > > > + * but do not update the superblock. > > > > + * Otherwise, xfs_repair reports false corruption. > > > > + * Here, only verify that superblock supports data alignment. > > > > */ > > > > - if (xfs_sb_version_hasdalign(sbp)) { > > > > - if (sbp->sb_unit != mp->m_dalign) { > > > > - sbp->sb_unit = mp->m_dalign; > > > > - mp->m_update_sb = true; > > > > - } > > > > - if (sbp->sb_width != mp->m_swidth) { > > > > - sbp->sb_width = mp->m_swidth; > > > > - mp->m_update_sb = true; > > > > - } > > > > - } else { > > > > + if (!xfs_sb_version_hasdalign(sbp)) { > > > > > > Would this change xfs_info behavior on a filesystem mounted with > > > different runtime fields from the superblock? I haven't tested it, but > > > it looks like we pull the fields from the superblock. > > xfs_info uses XFS_IOC_FSGEOMETRY to get the values, and this pulls the > > values from the run-time copy of the superblock: > > > > int > > xfs_fs_geometry( > > xfs_mount_t *mp, > > xfs_fsop_geom_t *geo, > > int new_version) > > ... > > if (new_version >= 2) { > > geo->sunit = mp->m_sb.sb_unit; > > geo->swidth = mp->m_sb.sb_width; > > } > > > > So if during mount we have updated the superblock, we will pull the > > updated values. If we do not update (as the proposed patch), we will > > report the values stored in the superblock. Perhaps we need to update > > the geomtery ioctl to report runtime values? > > > > Ok, thanks. I'd argue that we should return the runtime results if we > took the proposed approach, but we should probably settle on a solution > first.. > > Brian > > > Thanks, > > Alex. > > > > > > > > Brian > > > > > > > xfs_warn(mp, > > > > "cannot change alignment: superblock does not support data alignment"); > > > > return -EINVAL; > > > > -- > > > > 1.9.1 > > > > > > > > > > > [1] > > root@vc-20-01-48-dev:~# cat /etc/zadara/xfs.protofile > > dummy : bootfilename, not used, backward compatibility > > 0 0 : numbers of blocks and inodes, not > > used, backward compatibility > > d--777 0 0 : set 777 perms for the root dir > > $ > > $ > > > > [2] > > root@vc-20-01-48-dev:~# mkfs.xfs -f -K -p /etc/zadara/xfs.protofile -d > > sunit=64,swidth=64 -l sunit=32 /dev/vda > > meta-data=/dev/vda isize=512 agcount=16, agsize=163832 blks > > = sectsz=512 attr=2, projid32bit=1 > > = crc=1 finobt=1, sparse=0, > > rmapbt=0, reflink=0 > > data = bsize=4096 blocks=2621312, imaxpct=25 > > = sunit=8 swidth=8 blks > > naming =version 2 bsize=4096 ascii-ci=0 ftype=1 > > log =internal log bsize=4096 blocks=2560, version=2 > > = sectsz=512 sunit=4 blks, lazy-count=1 > > realtime =none extsz=4096 blocks=0, rtextents=0 > > >