On Fri, May 15, 2020 at 11:52:39AM -0700, Darrick J. Wong wrote:
> On Fri, May 15, 2020 at 07:41:16AM -0400, Brian Foster wrote:
> > On Thu, May 14, 2020 at 12:20:37PM -0700, Darrick J. Wong wrote:
> > > On Thu, May 14, 2020 at 11:09:33AM -0400, Brian Foster wrote:
> > > > On Sat, May 09, 2020 at 09:31:47AM -0700, Darrick J. Wong wrote:
> > > > > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > > > >
> > > > > Port the new btree staging context and related block reservation helper
> > > > > code from the kernel to repair. We'll use this in subsequent patches to
> > > > > implement btree bulk loading.
> > > > >
> > > > > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > > > > ---
> > > > >  include/libxfs.h         |    1
> > > > >  libxfs/libxfs_api_defs.h |    2
> > > > >  repair/Makefile          |    4 -
> > > > >  repair/bload.c           |  276 ++++++++++++++++++++++++++++++++++++++++++++++
> > > > >  repair/bload.h           |   79 +++++++++++++
> > > > >  repair/xfs_repair.c      |   17 +++
> > > > >  6 files changed, 377 insertions(+), 2 deletions(-)
> > > > >  create mode 100644 repair/bload.c
> > > > >  create mode 100644 repair/bload.h
> > > > >
> > > > >
> > > > ...
> > > > > diff --git a/repair/bload.c b/repair/bload.c
> > > > > new file mode 100644
> > > > > index 00000000..ab05815c
> > > > > --- /dev/null
> > > > > +++ b/repair/bload.c
> > > > > @@ -0,0 +1,276 @@
> > > > > +// SPDX-License-Identifier: GPL-2.0-or-later
> > > > > +/*
> > > > > + * Copyright (C) 2020 Oracle.  All Rights Reserved.
> > > > > + * Author: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > > > > + */
> > > > > +#include <libxfs.h>
> > > > > +#include "bload.h"
> > > > > +
> > > > > +#define trace_xrep_newbt_claim_block(...)	((void) 0)
> > > > > +#define trace_xrep_newbt_reserve_space(...)	((void) 0)
> > > > > +#define trace_xrep_newbt_unreserve_space(...)	((void) 0)
> > > > > +#define trace_xrep_newbt_claim_block(...)	((void) 0)
> > > > > +
> > > > > +int bload_leaf_slack = -1;
> > > > > +int bload_node_slack = -1;
> > > > > +
> > > > > +/* Ported routines from fs/xfs/scrub/repair.c */
> > > > > +
> > > >
> > > > Any plans to generalize/lift more of this stuff into libxfs if it's
> > > > going to be shared with xfsprogs?
> > >
> > > That depends on what the final online repair code looks like.
> > > I suspect it'll be different enough that it's not worth sharing, but I
> > > wouldn't be opposed to sharing identical functions.
> > >
> >
> > Ok, I was just going off the above note around porting existing code
> > from kernel scrub. I think it's reasonable to consider generalizations
> > later once both implementations are solidified.
> >
> > > > ...
> > > > > +/* Free all the accounting infor and disk space we reserved for a new btree. */
> > > > > +void
> > > > > +xrep_newbt_destroy(
> > > > > +	struct xrep_newbt	*xnr,
> > > > > +	int			error)
> > > > > +{
> > > > > +	struct repair_ctx	*sc = xnr->sc;
> > > > > +	struct xrep_newbt_resv	*resv, *n;
> > > > > +
> > > > > +	if (error)
> > > > > +		goto junkit;
> > > >
> > > > Could use a comment on why we skip block freeing here..
> > >
> > > I wonder what was the original reason for that?
> > >
> > > IIRC if we actually error out of btree rebuilds then we've done
> > > something totally wrong while setting up the btree loader, or the
> > > storage is so broken that writes failed. Repair is just going to call
> > > do_error() to terminate (and leave us with a broken filesystem) so we
> > > could just terminate right there at the top.
> > >
> >
> > Indeed.
>
> Bah, I just realized that you and I have already reviewed a lot of this
> stuff for the kernel, and apparently I never backported that. :(
>

Ok, I thought that stuff was actually merged so I'm kind of confused at
this point.
:P

> In looking at what's in the kernel now, I realized that in general,
> the xfs_btree_bload_compute_geometry function will estimate the correct
> number of blocks to reserve for the new btree, so all this code exists
> to deal with either (a) overestimates when rebuilding the free space
> btrees; or (b) the kernel encountering a runtime error (e.g. ENOMEM) and
> needing to back out everything it's done.
>
> For repair, (a) is still a possibility. (b) is not, since repair will
> abort, but on the other hand it'll be easier to review a patch to unify
> the two implementations if the code stays identical.
>
> Looking even further ahead, I plan to add two more users of the bulk
> loader: rebuilders for the bmap btrees, and (even later) the realtime
> rmapbt. It would be helpful to keep as much of the code the same
> between repair and scrub.
>
> So for now we don't really need the ability to free an over-reservation,
> but in the longer run it will make unification more obvious.
>

It's also easier to review code that's already been reviewed from the
kernel and is being carted over for reuse, so I think it makes sense to
keep things in sync for that reason as well.

> /me vaguely wonders if we ought to be reviewing both of these patchsets
> in parallel....
>

Re: above. I thought that stuff was merged and the approach was to move
the code over for reuse between scrub/xfs_repair. In any event, I think
what would facilitate subsequent reviews is some explicit separation
between patches for shared code and repair-specific code as well as some
references in the cover letter for the source of the former if those
bits haven't landed in the kernel yet...

Brian

> > > > I'm also wondering if we can check error in the primary loop and kill
> > > > the label and duplicate loop, but I guess that depends on whether the
> > > > fields are always valid.
> > >
> > > I think they are.
> > >
> > > > > +
> > > > > +	list_for_each_entry_safe(resv, n, &xnr->reservations, list) {
> > > > > +		/* We don't have EFIs here so skip the EFD. */
> > > > > +
> > > > > +		/* Free every block we didn't use. */
> > > > > +		resv->fsbno += resv->used;
> > > > > +		resv->len -= resv->used;
> > > > > +		resv->used = 0;
> > > > > +
> > > > > +		if (resv->len > 0) {
> > > > > +			trace_xrep_newbt_unreserve_space(sc->mp,
> > > > > +					XFS_FSB_TO_AGNO(sc->mp, resv->fsbno),
> > > > > +					XFS_FSB_TO_AGBNO(sc->mp, resv->fsbno),
> > > > > +					resv->len, xnr->oinfo.oi_owner);
> > > > > +
> > > > > +			__libxfs_bmap_add_free(sc->tp, resv->fsbno, resv->len,
> > > > > +					&xnr->oinfo, true);
> > >
> > > TBH for repair I don't even think we need this, since in theory we
> > > reserved *exactly* the correct number of blocks for the btree. Hmm.
> > >
> >
> > Ok, well it would be good to clean up whether we remove it, clean it up
> > or perhaps document why we wouldn't look at the resv fields on error if
> > there turns out to be specific reason for that.
>
> <nod>
>
> > > > > +		}
> > > > > +
> > > > > +		list_del(&resv->list);
> > > > > +		kmem_free(resv);
> > > > > +	}
> > > > > +
> > > > > +junkit:
> > > > > +	list_for_each_entry_safe(resv, n, &xnr->reservations, list) {
> > > > > +		list_del(&resv->list);
> > > > > +		kmem_free(resv);
> > > > > +	}
> > > > > +
> > > > > +	if (sc->ip) {
> > > > > +		kmem_cache_free(xfs_ifork_zone, xnr->ifake.if_fork);
> > > > > +		xnr->ifake.if_fork = NULL;
> > > > > +	}
> > > > > +}
> > > > > +
> > > > ...
> > > > > diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
> > > > > index 9d72fa8e..8fbd3649 100644
> > > > > --- a/repair/xfs_repair.c
> > > > > +++ b/repair/xfs_repair.c
> > > > ...
> > > > > @@ -49,6 +52,8 @@ static char *o_opts[] = {
> > > > >  	[AG_STRIDE]		= "ag_stride",
> > > > >  	[FORCE_GEO]		= "force_geometry",
> > > > >  	[PHASE2_THREADS]	= "phase2_threads",
> > > > > +	[BLOAD_LEAF_SLACK]	= "debug_bload_leaf_slack",
> > > > > +	[BLOAD_NODE_SLACK]	= "debug_bload_node_slack",
> > > >
> > > > Why the "debug_" in the option names?
> > >
> > > These are debugging knobs; there's no reason why any normal user would
> > > want to override the automatic slack sizing algorithms. I also
> > > refrained from documenting them in the manpage. :P
> > >
> >
> > Oh, Ok. Perhaps that explains why they aren't in the usage() either. ;)
>
> Yup.
>
> --D
>
> > Brian
> >
> > > However, the knobs have been useful for stress-testing w/ fstests.
> > >
> > > --D
> > >
> > > > Brian
> > > >
> > > > >  	[O_MAX_OPTS]		= NULL,
> > > > >  };
> > > > >
> > > > > @@ -260,6 +265,18 @@ process_args(int argc, char **argv)
> > > > >  		_("-o phase2_threads requires a parameter\n"));
> > > > >  					phase2_threads = (int)strtol(val, NULL, 0);
> > > > >  					break;
> > > > > +				case BLOAD_LEAF_SLACK:
> > > > > +					if (!val)
> > > > > +						do_abort(
> > > > > +		_("-o debug_bload_leaf_slack requires a parameter\n"));
> > > > > +					bload_leaf_slack = (int)strtol(val, NULL, 0);
> > > > > +					break;
> > > > > +				case BLOAD_NODE_SLACK:
> > > > > +					if (!val)
> > > > > +						do_abort(
> > > > > +		_("-o debug_bload_node_slack requires a parameter\n"));
> > > > > +					bload_node_slack = (int)strtol(val, NULL, 0);
> > > > > +					break;
> > > > >  				default:
> > > > >  					unknown('o', val);
> > > > >  					break;
> > > > >
> > > >
> > >
> >
>
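
For reference, the "check error in the primary loop and kill the label and
duplicate loop" idea discussed above could look roughly like the sketch
below. This is only an illustration under stated assumptions, not the
kernel or xfsprogs code: struct resv, free_extent(), add_resv() and
destroy_reservations() are hypothetical stand-ins for struct
xrep_newbt_resv, __libxfs_bmap_add_free() and xrep_newbt_destroy(), and a
plain singly linked list replaces the list_head machinery so the fragment
builds on its own. It also assumes, per the reply above, that the
reservation fields are always valid when teardown runs.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct xrep_newbt_resv; not the real layout. */
struct resv {
	struct resv		*next;
	unsigned long long	fsbno;	/* first block of the reservation */
	unsigned int		len;	/* number of blocks reserved */
	unsigned int		used;	/* blocks consumed by the new btree */
};

/* Hypothetical stand-in for __libxfs_bmap_add_free(): only reports. */
static unsigned int
free_extent(
	unsigned long long	fsbno,
	unsigned int		len)
{
	printf("freeing %u unused blocks at fsbno %llu\n", len, fsbno);
	return len;
}

/* Push a new reservation onto the front of the list. */
static struct resv *
add_resv(
	struct resv		*head,
	unsigned long long	fsbno,
	unsigned int		len,
	unsigned int		used)
{
	struct resv		*resv = malloc(sizeof(*resv));

	if (!resv)
		abort();
	resv->next = head;
	resv->fsbno = fsbno;
	resv->len = len;
	resv->used = used;
	return resv;
}

/*
 * Tear down every reservation in a single pass.  Unused blocks are handed
 * back only when error == 0; the accounting structures are released either
 * way.  Returns the number of blocks handed back.
 */
static unsigned int
destroy_reservations(
	struct resv		**head,
	int			error)
{
	struct resv		*resv, *n;
	unsigned int		freed = 0;

	for (resv = *head; resv != NULL; resv = n) {
		n = resv->next;		/* save next before freeing resv */
		assert(resv->used <= resv->len);

		if (!error && resv->len > resv->used)
			freed += free_extent(resv->fsbno + resv->used,
					resv->len - resv->used);
		free(resv);
	}
	*head = NULL;
	return freed;
}
```

With reservations of 8 blocks at fsbno 100 (all used) and 8 blocks at
fsbno 200 (5 used), calling destroy_reservations(&list, 0) hands back the
3 trailing blocks starting at fsbno 205; with a nonzero error it only
releases the accounting structures, which is the case the junkit label
covers in the patch.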