From: Darrick J. Wong <djwong@xxxxxxxxxx>

Recently, the upstream kernel maintainer has been taking a lot of heat
because writer threads encounter high latency when asking for log grant
space when the log is small. The reported use case is a heavily
threaded indexing product logging trace information to a filesystem
ranging in size between 20 and 250GB. The meetings that result from
the complaints about latency and stall warnings in dmesg, both from
this use case and from a large, well-known cloud product, now consume
25% of the maintainer's weekly time and have done so for months.

For small filesystems, the log is small by default because we have
defaulted to a ratio of 1:2048 (or even less). For grown filesystems,
this is even worse: the log keeps the size it was formatted with, yet
big filesystems generate big metadata. However, the log is still too
small even when the filesystem is formatted at the larger size.

Therefore, if we're writing a new filesystem format (aka bigtime), bump
the ratio unconditionally from 1:2048 to 1:256. On a 220GB filesystem,
the 99.95th percentile latencies observed with a 200-writer file
synchronous append workload running on a 44-AG filesystem (with 44
CPUs) spread across 4 hard disks showed:

Log Size (MB)  Latency (ms)  Throughput (MB/s)
10             520           243
20             220           308
40             140           360
80             92            363
160            86            364

For 4 NVMe devices, the results were:

Log Size (MB)  Latency (ms)  Throughput (MB/s)
10             201           409
20             177           488
40             122           550
80             120           549
160            121           545

Hence we increase the ratio by a factor of 8 (from 1:2048 to 1:256),
because there doesn't seem to be much improvement beyond that, and we
don't want the log to grow /too/ large. This change does not affect
filesystems larger than 4TB, nor does it affect formatting to older
formats.

Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
---
 mkfs/xfs_mkfs.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 96682f9a..7178d798 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -3308,7 +3308,17 @@ _("external log device size %lld blocks too small, must be at least %lld blocks\
 
 	/* internal log - if no size specified, calculate automatically */
 	if (!cfg->logblocks) {
-		if (cfg->dblocks < GIGABYTES(1, cfg->blocklog)) {
+		if (cfg->sb_feat.bigtime) {
+			/*
+			 * Starting with bigtime, everybody gets a 256:1 ratio
+			 * of fs size to log size unless they say otherwise.
+			 * Larger logs reduce contention for log grant space,
+			 * which is now a problem with the advent of small
+			 * non-rotational storage devices.
+			 */
+			cfg->logblocks = (cfg->dblocks << cfg->blocklog) / 256;
+			cfg->logblocks = cfg->logblocks >> cfg->blocklog;
+		} else if (cfg->dblocks < GIGABYTES(1, cfg->blocklog)) {
 			/* tiny filesystems get minimum sized logs. */
 			cfg->logblocks = min_logblocks;
 		} else if (cfg->dblocks < GIGABYTES(16, cfg->blocklog)) {