From: Darrick J. Wong <djwong@xxxxxxxxxx>

As part of the process of removing support for tiny filesystems (defined
in the next patch to be anything under 300MB or 64M log size), we are
trying to eliminate all the edge case regressions for small filesystems
that the maintainer can find.

Eric pointed out that the use case of formatting a 510M filesystem on a
RAID device regresses once we start enforcing the 64M log size limit:

# modprobe scsi_debug opt_blks=256 opt_xferlen_exp=6 dev_size_mb=510
# mkfs.xfs /dev/sdg
Log size must be at least 64MB.

<hapless user reads manpage, adjusts log size>

# mkfs.xfs -l size=64m /dev/sdg
internal log size 16384 too large, must be less than 16301

Because the device reports a stripe geometry, mkfs tries to create 8 AGs
(instead of the usual 4), which are then very nearly 64M in size.  The
log itself cannot consume the entire AG, so its size is rounded down to
allow the creation of AG headers and btrees, and then rounded down again
to match the stripe unit.  This results in a log that is less than 64MB
in size, causing the format to fail.

There's not much point in formatting tiny AGs on a small filesystem,
even if it is on a RAID.  Doubling the AG count from 4 to 8 doubles the
metadata overhead, conflicts with our attempts to boost the log size,
and on 2022-era storage hardware gains us very little extra performance
since we're not limited by storage access times.

Therefore, disable automatic detection of stripe unit and width if the
data device is less than 1GB.  We would like to format with 128M AGs to
avoid constraining the size of the internal log, and since RAIDs smaller
than 8GB are formatted with 8 AGs by default, 128*8=1G was chosen as the
cutoff.

Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
Reviewed-by: Carlos Maiolino <cmaiolino@xxxxxxxxxx>
---
 man/man8/mkfs.xfs.8.in |    6 +++---
 mkfs/xfs_mkfs.c        |   14 ++++++++++++++
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/man/man8/mkfs.xfs.8.in b/man/man8/mkfs.xfs.8.in
index c9e9a9a6..b961bc30 100644
--- a/man/man8/mkfs.xfs.8.in
+++ b/man/man8/mkfs.xfs.8.in
@@ -456,13 +456,13 @@ is expressed as a multiplier of the stripe unit,
 usually the same as the number of stripe members in the logical
 volume configuration, or data disks in a RAID device.
 .IP
-When a filesystem is created on a logical volume device,
+When a filesystem is created on a block device,
 .B mkfs.xfs
-will automatically query the logical volume for appropriate
+will automatically query the block device for appropriate
 .B sunit
 and
 .B swidth
-values.
+values if the block device and the filesystem size would be larger than 1GB.
 .TP
 .BI noalign
 This option disables automatic geometry detection and creates the filesystem
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index a5e2df76..68d6bd18 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -2583,6 +2583,20 @@ _("%s: Volume reports invalid stripe unit (%d) and stripe width (%d), ignoring.\
 				progname, BBTOB(ft->dsunit), BBTOB(ft->dswidth));
 			ft->dsunit = 0;
 			ft->dswidth = 0;
+		} else if (cfg->dblocks < GIGABYTES(1, cfg->blocklog)) {
+			/*
+			 * Don't use automatic stripe detection if the device
+			 * size is less than 1GB because the performance gains
+			 * on such a small system are not worth the risk that
+			 * we'll end up with an undersized log.
+			 */
+			if (ft->dsunit || ft->dswidth)
+				fprintf(stderr,
+_("%s: small data volume, ignoring data volume stripe unit %d and stripe width %d\n"),
+						progname, ft->dsunit,
+						ft->dswidth);
+			ft->dsunit = 0;
+			ft->dswidth = 0;
 		} else {
 			dsunit = ft->dsunit;
 			dswidth = ft->dswidth;
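
For readers who want to check the cutoff arithmetic by hand, the size test added to mkfs/xfs_mkfs.c can be sketched in isolation. This is only an illustration, not code from the patch: gigabytes_in_blocks() is a hypothetical stand-in for what GIGABYTES() is assumed to compute (a count of GiB expressed in filesystem blocks), and mib_to_blocks() and ignore_stripe_geometry() are likewise names invented here. All three assume blocklog is no larger than 20.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the GIGABYTES() macro used in the patch:
 * express 'count' GiB as a number of filesystem blocks, where each
 * block is 2^blocklog bytes.  Assumes blocklog <= 20.
 */
static uint64_t gigabytes_in_blocks(unsigned int count, unsigned int blocklog)
{
	return (uint64_t)count << (30 - blocklog);
}

/* Express a size in MiB as a number of 2^blocklog-byte blocks. */
static uint64_t mib_to_blocks(uint64_t mib, unsigned int blocklog)
{
	return (mib << 20) >> blocklog;
}

/*
 * Mirror of the new "else if" condition: true when the data device
 * holds fewer blocks than 1GB worth, in which case the stripe unit
 * and width reported by the device would be discarded.
 */
static bool ignore_stripe_geometry(uint64_t dblocks, unsigned int blocklog)
{
	return dblocks < gigabytes_in_blocks(1, blocklog);
}
```

Assuming 4096-byte blocks (blocklog 12), the 510MB scsi_debug device from the commit message is 130560 blocks against a cutoff of 262144, so its stripe geometry would be ignored. Under the same assumption a 64MB log is 16384 blocks, which matches the figure in the mkfs.xfs error output above.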