[PATCH 2/4] mkfs: rework AG header initialisation ordering

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



From: Dave Chinner <dchinner@xxxxxxxxxx>

When observing the behaviour of an 8EB mkfs execution, I noticed
that a phase where there are a massive number of read/modify/write
cycles occurring. I didn't wait for it to complete - it was obvious
that it was after all the AG headers had been written. That left the
AGFL initialisation as the likely cause.

When all the AG headers don't fit in the libxfs buffer cache, the
AGFL init requires re-reading the AGF, the AGFL, the free space tree
root blocks and the rmap tree root block. They all then get
modified and written back out. 10 IOs per AG. When you have 8
million AGs, that's a lot of extra IO.

Change the initialisation algorithm to initialise the AGFL
immediately after initialising the rest of the headers and
calculating the minimum AGFL size for that AG. This means the
modifications will all hit the buffer cache and this will remove the
IO penalty.

The "worst_freelist" size calculation doesn't change from AG to AG -
it's based on the physical configuration of the AG, and all AGs have
the same configuration. hence we only need to calculate this once,
not for every AG. That allows us to initialise the AGFL immediately
after the rest of the AG has been initialised rather than in a
separate pass.

TIme to make a filesystem from scratch, using a zeroed device so the
force overwrite algorithms are not triggered and -K to avoid
discards:

FS size		10PB	100PB	 1EB
current mkfs	26.9s	214.8s	2484s
patched		11.3s	 70.3s	 709s

In both cases, the IO profile looks identical for the initial AG
header writeout loop. The difference is that the old code then
does the RMW loop to init the AGFL, and that runs at about half the
speed. Hence runtime of the new code is reduce by around 65-70%
simply by avoiding all that IO.


Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
---
 mkfs/xfs_mkfs.c | 40 +++++++++++++++++++++++++---------------
 1 file changed, 25 insertions(+), 15 deletions(-)

diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index c153592c705e..d70fbdb6b15a 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -3374,7 +3374,7 @@ initialise_ag_headers(
 	struct xfs_mount	*mp,
 	struct xfs_sb		*sbp,
 	xfs_agnumber_t		agno,
-	int			*worst_freelist)
+	int			*freelist_size)
 {
 	struct xfs_perag	*pag = libxfs_perag_get(mp, agno);
 	struct xfs_agfl		*agfl;
@@ -3453,8 +3453,22 @@ initialise_ag_headers(
 		agf->agf_longest = cpu_to_be32(agsize -
 			XFS_FSB_TO_AGBNO(mp, cfg->logstart) - cfg->logblocks);
 	}
-	if (libxfs_alloc_min_freelist(mp, pag) > *worst_freelist)
-		*worst_freelist = libxfs_alloc_min_freelist(mp, pag);
+
+	/*
+	 * The AGFL size is the same for all AGs because all AGs have the same
+	 * layout. If this AG sameness ever changes in the future, we'll need to
+	 * revisit how we initialise the AGFLs.
+	 */
+	if (*freelist_size == 0)
+		*freelist_size = libxfs_alloc_min_freelist(mp, pag);
+	else if (*freelist_size < libxfs_alloc_min_freelist(mp, pag)) {
+		fprintf(stderr,
+_("%s: Abort! Freelist size (%u) for AG %u not constant (%u)!\n"),
+			progname, libxfs_alloc_min_freelist(mp, pag),
+			agno, *freelist_size);
+		exit(1);
+	}
+
 	libxfs_writebuf(buf, LIBXFS_EXIT_ON_FAILURE);
 
 	/*
@@ -3724,14 +3738,14 @@ static void
 initialise_ag_freespace(
 	struct xfs_mount	*mp,
 	xfs_agnumber_t		agno,
-	int			worst_freelist)
+	int			freelist_size)
 {
 	struct xfs_alloc_arg	args;
 	struct xfs_trans	*tp;
 	struct xfs_trans_res tres = {0};
 	int			c;
 
-	c = libxfs_trans_alloc(mp, &tres, worst_freelist, 0, 0, &tp);
+	c = libxfs_trans_alloc(mp, &tres, freelist_size, 0, 0, &tp);
 	if (c)
 		res_failed(c);
 
@@ -3797,7 +3811,7 @@ main(
 	int			quiet = 0;
 	char			*protofile = NULL;
 	char			*protostring = NULL;
-	int			worst_freelist = 0;
+	int			freelist_size = 0;
 
 	struct libxfs_xinit	xi = {
 		.isdirect = LIBXFS_DIRECT,
@@ -4025,16 +4039,12 @@ main(
 	}
 
 	/*
-	 * Initialise all the static on disk metadata.
+	 * Initialise all the AG headers on disk.
 	 */
-	for (agno = 0; agno < cfg.agcount; agno++)
-		initialise_ag_headers(&cfg, mp, sbp, agno, &worst_freelist);
-
-	/*
-	 * Initialise the freespace freelists (i.e. AGFLs) in each AG.
-	 */
-	for (agno = 0; agno < cfg.agcount; agno++)
-		initialise_ag_freespace(mp, agno, worst_freelist);
+	for (agno = 0; agno < cfg.agcount; agno++) {
+		initialise_ag_headers(&cfg, mp, sbp, agno, &freelist_size);
+		initialise_ag_freespace(mp, agno, freelist_size);
+	}
 
 	/*
 	 * Allocate the root inode and anything else in the proto file.
-- 
2.17.0




[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux