On Wed, Feb 28, 2018 at 05:43:51PM -0500, Brian Foster wrote: > On Tue, Feb 27, 2018 at 01:03:13PM -0800, Darrick J. Wong wrote: > > On Tue, Feb 27, 2018 at 02:35:49PM -0500, Brian Foster wrote: > > > On Thu, Feb 22, 2018 at 06:00:15PM -0800, Darrick J. Wong wrote: > > > > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > > > > > > > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > > > --- > > > > fs/xfs/Makefile | 1 > > > > fs/xfs/xfs_fixups.c | 310 +++++++++++++++++++++++++++++++++++++++++++++++++++ > > > > fs/xfs/xfs_fixups.h | 26 ++++ > > > > fs/xfs/xfs_mount.c | 21 +++ > > > > fs/xfs/xfs_super.c | 10 ++ > > > > 5 files changed, 367 insertions(+), 1 deletion(-) > > > > create mode 100644 fs/xfs/xfs_fixups.c > > > > create mode 100644 fs/xfs/xfs_fixups.h > > > > > > > > > > > > diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile > > > > index b03c77e..f88368a 100644 > > > > --- a/fs/xfs/Makefile > > > > +++ b/fs/xfs/Makefile > > > > @@ -86,6 +86,7 @@ xfs-y += xfs_aops.o \ > > > > xfs_extent_busy.o \ > > > > xfs_file.o \ > > > > xfs_filestream.o \ > > > > + xfs_fixups.o \ > > > > xfs_fsmap.o \ > > > > xfs_fsops.o \ > > > > xfs_globals.o \ > > > > diff --git a/fs/xfs/xfs_fixups.c b/fs/xfs/xfs_fixups.c > > > > new file mode 100644 > > > > index 0000000..0cad7bb > > > > --- /dev/null > > > > +++ b/fs/xfs/xfs_fixups.c > > > > @@ -0,0 +1,310 @@ > > > > +/* > > > > + * Copyright (C) 2018 Oracle. All Rights Reserved. > > > > + * > > > > + * Author: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > > > + * > > > > + * This program is free software; you can redistribute it and/or > > > > + * modify it under the terms of the GNU General Public License > > > > + * as published by the Free Software Foundation; either version 2 > > > > + * of the License, or (at your option) any later version. 
> > > > + *
> > > > + * This program is distributed in the hope that it would be useful,
> > > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > > > + * GNU General Public License for more details.
> > > > + *
> > > > + * You should have received a copy of the GNU General Public License
> > > > + * along with this program; if not, write the Free Software Foundation,
> > > > + * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
> > > > + */
> > > > +#include "xfs.h"
> > > > +#include "xfs_fs.h"
> > > > +#include "xfs_shared.h"
> > > > +#include "xfs_format.h"
> > > > +#include "xfs_log_format.h"
> > > > +#include "xfs_trans_resv.h"
> > > > +#include "xfs_sb.h"
> > > > +#include "xfs_mount.h"
> > > > +#include "xfs_alloc.h"
> > > > +#include "xfs_trans.h"
> > > > +#include "xfs_fixups.h"
> > > > +
> > > > +/*
> > > > + * v5 AGFL padding defects
> > > > + *
> > > > + * When the v5 format was first introduced, there was a defect in the struct
> > > > + * xfs_agfl definition that resulted in XFS_AGFL_SIZE returning different
> > > 
> > > XFS_AGFL_SIZE() no longer exists as of the previous patch. It might be
> > > better to refer to "the agfl size" generically.
> > 
> > How about:
> > 
> > "...the agfl length calculation formula (at the time known as
> > XFS_AGFL_SIZE)..."
> 
> The formula formerly known as..? ;) Heh, sounds fine.

lol. :)

> > > > + * values depending on the compiler padding.  On a fs with 512-byte sectors,
> > > > + * this meant that XFS_AGFL_SIZE was 119 on i386, but 118 on x64.  Commit
> > > > + * 96f859d52bcb1 ("libxfs: pack the agfl header structure so XFS_AGFL_SIZE is
> > > > + * correct") changed the definition to disable padding the end of the
> > > > + * structure, and was accepted into Linux 4.5.  Since then, the AGFL has
> > > > + * always used the larger size (e.g. 119 entries on a 512b sector fs).
> > > > + * > > > > + * Unfortunately, pre-4.5 kernels can produce filesystems with AGFLs that wrap > > > > + * at the smaller size, and those kernels are not prepared to handle the > > > > + * longer size. This typically manifests itself as an AGF verifier corruption > > > > + * error followed by a filesystem shutdown. While we encourage admins to stay > > > > + * current with software, we would like to avoid this intermittent breakage. > > > > + * > > > > + * Any v5 filesystem which has a feature bit set for a feature that was > > > > + * introduced after Linux 4.5 will not have this problem, as such kernels > > > > + * cannot be mounted on older kernels. v4 filesystems are also unaffected. > > > > + * > > > > + * Therefore, we add two fixup functions -- the first runs at mount time to > > > > + * detect a short-wrapped AGFL and fix it; the second runs at unmount, freeze, > > > > + * or remount-ro time to move a wrapped AGFL to the beginning of the list. > > > > + * This reduces the likelihood of a screwup to the scenario where you have (a) > > > > + * a filesystem with no post-4.5 features (reflink, rmap), (b) the AGFL wraps, > > > > + * (c) the filesystem goes down leaving a dirty log, and (d) the dirty > > > > + * filesystem is mounted on an old kernel. > > > > + */ > > > > + > > > > > > While the mount vs. unmount time fixups serve different purposes and > > > have different conditions, do we really need two separate fixup > > > implementations? E.g., if we have to unwrap the AGFL at unmount time, > > > why not just reuse that fixup at mount time (when needed) as well? I > > > suspect the difference would only be the length of what we can consider > > > valid in the agfl at the time. > > > > They're doing two different things. I will use ASCII art to demonstrate > > the mount-time function: > > > > Yes, the question was whether we need to do two different things. For > example... > > > Let's say the AGFL block has space for 10 items. 
It looks like this:
> > 
> >          0   1   2   3   4   5   6   7   8   9
> > Header |   |   |   |   |   |   |   |   |   |   |
> > 
> > When the fs is freshly formatted or repaired, the AGFL will look like:
> > 
> >          0   1   2   3   4   5   6   7   8   9
> > Header | A | B | C | D | E | F |   |   |   |   |
> >          ^-- flfirst          ^-- fllast
> > 
> > Due to the padding problems prior to 4.5 ("4.5 agfl padding fix"),
> > XFS_AGFL_SIZE would return 10 on 32-bit systems and 9 on 64-bit systems.
> > Therefore, if the AGFL wrapped on a 64-bit kernel we would end up with:
> > 
> >          0   1   2   3   4   5   6   7   8   9
> > Header | D | E | F |   |   |   | A | B | C | ? |
> >                  ^-- fllast    ^-- flfirst
> > 
> > Note that block 9's contents are undefined because we didn't write
> > anything there.  Now that the "4.5 agfl padding fix" has corrected
> > XFS_AGFL_SIZE to return 10 in all cases, this looks like an AGF
> > corruption because flcount is 6 but the distance between flfirst and
> > fllast is 7.  So long as the list wraps and the mismatch is exactly one
> > block we can fix this.  The mount time fixer in this patch does this by
> > moving the records between flfirst and xfs_agfl_size (A, B, and C) to
> > the end of the block to close the gap:
> > 
> >          0   1   2   3   4   5   6   7   8   9
> > Header | D | E | F |   |   |   |   | A | B | C |
> >                  ^-- fllast        ^-- flfirst
> > 
> > If the count is off by more than 1 then the AGF is truly corrupt and we
> > bail out.
> > 
> > Now let's say that we run for a while and want to unmount, but our AGFL
> > wraps like this:
> > 
> >          0   1   2   3   4   5   6   7   8   9
> > Header | T | U | V |   |   |   |   | Q | R | S |
> >                  ^-- fllast        ^-- flfirst
> > 
> > We don't know that the next kernel to mount this filesystem will have
> > the "4.5 agfl padding fix" applied to it; if it does not, it will flag
> > the AGF as corrupted because flcount is 6 but in its view the distance
> > between flfirst and fllast (which omits bno[9]) is 5.
We don't want
> > it to choke on that, so the unmount fixer moves all the records between
> > flfirst and the end (Q, R, and S) towards the start of the block and
> > resets flfirst/fllast:
> > 
> >          0   1   2   3   4   5   6   7   8   9
> > Header | T | U | V | Q | R | S |   |   |   |   |
> >          ^-- flfirst          ^-- fllast
> 
> Could we use this basic algorithm at mount time as well? I know it
> wouldn't be _exactly_ the same operation at mount time as it is for
> unmount since slot 9 is a gap in the former case, but afaict the only
> difference from an algorithmic perspective is the length of the shift.
> 
> IOW, if we were to parameterize the length of the shift and have the
> mount fixup call the unmount algorithm, would it not also address the
> problem?

Yes, I believe that would work.  I think it would be more efficient in
the patch below to memmove the entries instead of put_freelist'ing them
individually.  If the agfl is completely full then fix_freelist ought to
trim it down by at least one element.  The algorithm then becomes:

	if mounting and flfirst < fllast:
		return 0
	if flcount == agfl_size:
		assert !mounting
		fix_freelist()
	assert flcount < agfl_size
	if flfirst < fllast:
		return 0
	movelen = agfl_size - flfirst
	if active == flcount + 1:
		movelen--
	memmove(&agflbno[fllast + 1], &agflbno[flfirst], movelen)
	flfirst = 0
	fllast = fllast + movelen

> > Since the AGFL no longer wraps at all, it doesn't matter if the next
> > kernel to mount this filesystem has the "4.5 agfl padding fix" applied;
> > all kernels can handle this correctly.
> > 
> > > > +/*
> > > > + * Decide if we need to have the agfl wrapping fixes applied.  This only
> > > > + * affects v5 filesystems that do not have any features enabled that did not
> > > > + * exist when the agfl padding fix went in.
> > > > + *
> > > > + * Features already present when the fix went in were finobt, ftype, spinodes.
> > > > + * If we see something new (e.g. reflink) then don't bother.
> > > > +#define XFS_SB_FEAT_RO_COMPAT_AGFL_WRAP_ALREADY_FIXED \
> > > > +	(~(XFS_SB_FEAT_RO_COMPAT_FINOBT))
> > > > +#define XFS_SB_FEAT_INCOMPAT_AGFL_WRAP_ALREADY_FIXED \
> > > > +	(~(XFS_SB_FEAT_INCOMPAT_FTYPE | \
> > > > +	   XFS_SB_FEAT_INCOMPAT_SPINODES))
> > > > +#define XFS_SB_FEAT_INCOMPAT_LOG_AGFL_WRAP_ALREADY_FIXED \
> > > > +	(~0)
> > > > +static inline bool xfs_sb_version_needs_agfl_wrap_fixes(struct xfs_sb *sbp)
> > > > +{
> > > > +	return xfs_sb_version_hascrc(sbp) &&
> > > > +	       !xfs_sb_has_incompat_feature(sbp,
> > > > +				XFS_SB_FEAT_INCOMPAT_AGFL_WRAP_ALREADY_FIXED) &&
> > > > +	       !xfs_sb_has_ro_compat_feature(sbp,
> > > > +				XFS_SB_FEAT_RO_COMPAT_AGFL_WRAP_ALREADY_FIXED) &&
> > > > +	       !xfs_sb_has_incompat_log_feature(sbp,
> > > > +				XFS_SB_FEAT_INCOMPAT_LOG_AGFL_WRAP_ALREADY_FIXED);
> > > > +}
> > > > +
> > > > +/*
> > > > + * Fix an AGFL wrapping that falls short of the end of the block by filling the
> > > > + * gap at the end of the block.
> > > > + */
> > > > +STATIC int
> > > > +xfs_fixup_freelist_wrap_mount(
> > > > +	struct xfs_trans	*tp,
> > > > +	struct xfs_buf		*agfbp,
> > > > +	struct xfs_perag	*pag)
> > > > +{
> > > > +	struct xfs_mount	*mp = tp->t_mountp;
> > > > +	struct xfs_agf		*agf;
> > > > +	struct xfs_buf		*agflbp;
> > > > +	__be32			*agfl_bno;
> > > > +	xfs_agnumber_t		agno;
> > > > +	uint32_t		agfl_size;
> > > > +	uint32_t		flfirst;
> > > > +	uint32_t		fllast;
> > > > +	int32_t			active;
> > > > +	int			offset;
> > > > +	int			len;
> > > > +	int			error;
> > > > +
> > > > +	if (pag->pagf_flcount == 0)
> > > > +		return 0;
> > > > +
> > > > +	agfl_size = xfs_agfl_size(mp);
> > > > +	agf = XFS_BUF_TO_AGF(agfbp);
> > > > +	agno = be32_to_cpu(agf->agf_seqno);
> > > > +	flfirst = be32_to_cpu(agf->agf_flfirst);
> > > > +	fllast = be32_to_cpu(agf->agf_fllast);
> > > > +
> > > > +	/* Make sure we're either spot on or off by 1.
*/ > > > > + active = fllast - flfirst + 1; > > > > + if (active <= 0) > > > > + active += agfl_size; > > > > + if (active == pag->pagf_flcount) > > > > + return 0; > > > > + else if (active != pag->pagf_flcount + 1) > > > > + return -EFSCORRUPTED; > > > > + > > > > > > So we're not attempting to cover the case where the agfl has 1 more > > > block than the agfl size (i.e., the case where an fs goes back to a > > > kernel with an unpacked header)? > > > > We don't know how the next kernel to touch this filesystem will define > > XFS_AGFL_SIZE -- it could be a 4.5+ kernel (same agfl size), a 32-bit > > pre-4.5 kernel (same agfl size), or a 64-bit pre-4.5 kernel (small agfl > > size). > > > > I don't think I was clear.. I'm envisioning whether we could come up > with a patch that would generically fix up the agfl on disk to be sane > relative to the current kernel. This patch covers the case of a packed > agfl kernel mounting an unpacked on-disk agfl. It would be nice if we > could implement something that also handled a packed on-disk agfl to an > unpacked agfl kernel (for easier backport to unpacked kernels, for > e.g.). If we're going to touch an old kernel's source at all I'd rather we backport both the packing fix and this fixer-upper. > > > I'm wondering whether using the unmount algorithm in both places (as > > > noted above) would also facilitate this mechanism to work both ways > > > (unpacked <-> packed), provided particular conditions are met. I suppose > > > that could be considered regardless, but the less variance the better > > > IMO. > > > > In principle I suppose we could have a single function to handle all the > > rearranging, but I'd rather have two functions to handle the two cases. > > I'm open to refactoring the common parts out of the > > xfs_fixup_freelist_wrap_*mount functions. 
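(As an aside, the 119-vs-118 arithmetic is easy to reproduce in userspace. The struct below is a simplified stand-in for the on-disk AGFL block header, not the real struct xfs_agfl definition, but it shows the same layout quirk: a 64-bit field that drags in tail padding on typical 64-bit ABIs unless the struct is packed.)

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified stand-in for the v5 AGFL block header.  Unpacked, the
 * tail padding after "crc" rounds the struct up to 40 bytes on most
 * 64-bit ABIs (8-byte u64 alignment); packed, it is 36 bytes
 * everywhere, which is also what 32-bit x86 produced all along.
 */
struct agfl_hdr_unpacked {
	uint32_t	magicnum;
	uint32_t	seqno;
	uint8_t		uuid[16];
	uint64_t	lsn;
	uint32_t	crc;
};

struct agfl_hdr_packed {
	uint32_t	magicnum;
	uint32_t	seqno;
	uint8_t		uuid[16];
	uint64_t	lsn;
	uint32_t	crc;
} __attribute__((packed));

/* How many 4-byte free list entries fit after the header? */
static unsigned int
agfl_len(unsigned int sectsize, unsigned int hdrsize)
{
	return (sectsize - hdrsize) / sizeof(uint32_t);
}
```

With 512-byte sectors this yields 119 entries for the 36-byte header and 118 for the 40-byte one, matching the i386/x64 split described in the patch comment.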
> > > > As noted above, I'm not really commenting on the function factoring here > as much as the use of multiple algorithms (using one would certainly > facilitate some mechanical refactoring). TBH the patch you sent is pretty close to what I was imagining. > > > > + /* Would this have even passed muster on an old system? */ > > > > > > Comment doesn't really explain what is going on here..? > > > > /* > > * If the distance between flfirst and fllast mismatches flcount by more > > * than 1, then there's more wrong with this agfl than just the padding > > * problem. Bail out completely, which will force the admin to run > > * xfs_repair. > > */ > > > > *nod* > > > > > > > > + if (flfirst >= agfl_size - 1 || fllast >= agfl_size - 1 || > > > > + pag->pagf_flcount > agfl_size - 1) > > > > + return -EFSCORRUPTED; > > > > + > > > > > > I take it these are checks for whether the agfl was previously corrupted > > > based on the unpacked size? FWIW, it might be a bit more clear to use an > > > old/unpacked_agfl_size variable to declare intent (here and in some of > > > the comments and whatnot that follow). > > > > Ok. > > > > > > + /* > > > > + * Convert a 40-byte-padded agfl into a 36-byte-padded AGFL. > > > > + * Therefore, we need to move the AGFL blocks > > > > + * bno[flfirst..agfl_size - 2] to bno[flfirst + 1...agfl_size - 1]. > > > > + * > > > > + * Reusing the example above, if we had flfirst == 116, we need > > > > + * to move bno[116] and bno[117] into bno[117] and bno[118], > > > > + * respectively, and then increment flfirst. > > > > + */ > > > > > > Kind of a strange example to use given that it doesn't mention > > > wrapping.. this only triggers if the agfl was previously wrapped, right? > > > > Right. I used to have a check that if the list didn't wrap then we > > would just exit because no fixing is required... but it must've fallen > > out. I agree that we don't want to rely on a subtlety here to stay out > > of here if the agfl wasn't initially wrapped. 
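To make the proposed shift concrete, here is a throwaway userspace model of the parameterized move. The limit argument is where the writing kernel's list wrapped (agfl_size for packed kernels, agfl_size - 1 for the unpacked 64-bit layout); the names are mine, not the kernel's, and all of the corruption checking, locking, and logging is omitted:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Slide the head of a wrapped free list (slots [flfirst..limit-1])
 * down to just past the tail, then reset the pointers so the list no
 * longer wraps.  Because "limit" is the ring size the writing kernel
 * used, a garbage slot at the very end is simply never copied.
 */
static void
agfl_unwrap(uint32_t *bno, uint32_t limit, uint32_t *flfirst,
	    uint32_t *fllast)
{
	uint32_t	movelen;

	if (*flfirst <= *fllast)
		return;		/* list does not wrap; nothing to fix */
	movelen = limit - *flfirst;
	memmove(&bno[*fllast + 1], &bno[*flfirst], movelen * sizeof(bno[0]));
	*flfirst = 0;
	*fllast += movelen;
}
```

Running the unmount example through it (Q/R/S in slots 7-9, T/U/V in slots 0-2, limit 10) produces T U V Q R S in slots 0-5 with flfirst 0 and fllast 5; calling it with limit 9 instead covers the mount-time case where slot 9 holds garbage.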
> > > > > > + error = xfs_alloc_read_agfl(mp, tp, agno, &agflbp); > > > > + if (error) > > > > + return error; > > > > + agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp); > > > > + > > > > + len = (agfl_size - flfirst - 1) * sizeof(xfs_agblock_t); > > > > + memmove(&agfl_bno[flfirst + 1], &agfl_bno[flfirst], len); > > > > + offset = (char *)&agfl_bno[flfirst + 1] - (char *)agflbp->b_addr; > > > > + be32_add_cpu(&agf->agf_flfirst, 1); > > > > + > > > > + xfs_trans_buf_set_type(tp, agflbp, XFS_BLFT_AGFL_BUF); > > > > + xfs_trans_log_buf(tp, agflbp, offset, offset + len - 1); > > > > + xfs_trans_brelse(tp, agflbp); > > > > + agflbp = NULL; > > > > + xfs_alloc_log_agf(tp, agfbp, XFS_AGF_FLFIRST); > > > > + > > > > + return 0; > > > > +} > > > > + > > > > +/* > > > > + * Fix an AGFL that touches the end of the block by moving the first or last > > > > + * part of the list elsewhere in the AGFL so that old kernels don't trip over > > > > + * wrapping issues. > > > > + */ > > > > +STATIC int > > > > +xfs_fixup_freelist_wrap_unmount( > > > > + struct xfs_trans *tp, > > > > + struct xfs_buf *agfbp, > > > > + struct xfs_perag *pag) > > > > +{ > > > > + struct xfs_mount *mp = tp->t_mountp; > > > > + struct xfs_agf *agf; > > > > + struct xfs_buf *agflbp; > > > > + __be32 *agfl_bno; > > > > + xfs_agnumber_t agno; > > > > + uint32_t agfl_size; > > > > + uint32_t flfirst; > > > > + uint32_t fllast; > > > > + int offset; > > > > + int len; > > > > + int error; > > > > + > > > > + agfl_size = xfs_agfl_size(mp); > > > > + agf = XFS_BUF_TO_AGF(agfbp); > > > > + agno = be32_to_cpu(agf->agf_seqno); > > > > + flfirst = be32_to_cpu(agf->agf_flfirst); > > > > + fllast = be32_to_cpu(agf->agf_fllast); > > > > + > > > > + /* Empty AGFL? Make sure we aren't pointing at the end. */ > > > > + if (pag->pagf_flcount == 0) { > > > > + if (flfirst >= agfl_size || fllast >= agfl_size) { > > > > > > When would either field be >= agfl_size? Isn't that where they wrap? 
> > > > Oops, you're right, that should be flfirst >= agfl_size - 1. > > > > Or really, > > > > old_agfl_size = agfl_size - 1 > > if flfirst >= old_agfl_size or fllast >= old_agfl_size: > > blahblahblah > > > > > > > > > + agf->agf_flfirst = cpu_to_be32(1); > > > > + agf->agf_fllast = 0; > > > > + xfs_alloc_log_agf(tp, agfbp, > > > > + XFS_AGF_FLFIRST | XFS_AGF_FLLAST); > > > > + } > > > > + return 0; > > > > + } > > > > + > > > > + /* If we don't hit the end, we're done. */ > > > > + if (flfirst < fllast && fllast != agfl_size - 1) > > > > + return 0; > > > > + > > > > + /* > > > > + * Move a start of a wrapped list towards the start of the agfl block. > > > > > > FWIW, this and the subsequent comments kind of gave me the impression > > > that the agfl would be "shifted" towards the start of the block. The > > > code doesn't do that, but rather rotates the start/head of the agfl to > > > the tail and adjusts the pointers (changing the effective order). That > > > seems technically Ok, but I had to grok the code and read back to > > > understand the comment rather than the other way around. > > > > Yeah. Sorry about the confusion, maybe it would be better if I took out > > these confusing comments and replaced it all with the ascii art above? > > > > I'm still hoping we can do something a bit more simple and flexible in > general, but that aside, the ascii art describes the associated > algorithms much better. > > > > > + * Therefore, we need to move the AGFL blocks > > > > + * bno[flfirst..agfl_size - 1] to bno[fllast + 1...agfl_size - flfirst]. > > > > + * Then we reset flfirst and fllast appropriately. > > > > + * > > > > + * Reusing the example above, if we had flfirst == 117 and fllast == 4, > > > > + * we need to move bno[117] and bno[118] into bno[5] and bno[6], > > > > + * respectively, and then reset flfirst and fllast. > > > > + * > > > > + * If it's just the last block that touches the end, only move that. 
> > > > + */ > > > > + error = xfs_alloc_read_agfl(mp, tp, agno, &agflbp); > > > > + if (error) > > > > + return error; > > > > + agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp); > > > > + > > > > + if (fllast == agfl_size - 1) { > > > > + /* Back the AGFL off from the end of the block. */ > > > > + len = sizeof(xfs_agblock_t); > > > > + agfl_bno[flfirst - 1] = agfl_bno[agfl_size - 1]; > > > > > > What if the agfl is full (flfirst == 0)? > > > > I don't think we ever unmount with a full agfl -- we're guaranteed at > > least 118 items in the AGFL, and the maximum size the agfl will ever > > need is 8 blocks for each of bnobt, cntbt, rmapbt. > > > > However, we could just detect a full agfl and free the end block. > > > > I suppose it's unlikely we'll ever see corner cases such as a completely > full or empty agfl. That said, I think the code needs to handle those > cases sanely. That could mean explicit checks and skipping the fixup for > all I care, just as long as we don't have memory corruption vectors and > whatnot if (when?) assumptions prove wrong. > > > > > + offset = (char *)&agfl_bno[flfirst - 1] - (char *)agflbp->b_addr; > > > > + be32_add_cpu(&agf->agf_fllast, -1); > > > > + be32_add_cpu(&agf->agf_flfirst, -1); > > > > + } else { > > > > This needs to check for flfirst > fllast. > > > > > > + /* Move the first part of the AGFL towards the front. */ > > > > + len = (agfl_size - flfirst) * sizeof(xfs_agblock_t); > > > > + memcpy(&agfl_bno[fllast + 1], &agfl_bno[flfirst], len); > > > > + offset = (char *)&agfl_bno[fllast + 1] - (char *)agflbp->b_addr; > > > > + agf->agf_flfirst = 0; > > > > + agf->agf_fllast = cpu_to_be32(pag->pagf_flcount - 1); > > > > > > Similar question here as above.. it looks like the copy may be a no-op, > > > but we still reset flfirst/fllast (which may also be fine, but > > > unnecessary). 
> > > > > > The more interesting case here is flcount == 1 and flfirst == fllast, > > > where it looks like we'd wrongly reset first/last (and perhaps copy more > > > than we should..?). > > > > > > Ugh. Remind me again why we can't just detect a mismatch, fail the mount > > > and tell the user to repair? :P > > > > Not all the distros included xfs_repair in the initrd or the logic to > > call it (as opposed to "fsck -y") if the root fs fails to mount, which > > means that if this wrapping problem hits a root filesystem then the > > administrator will have to reboot with a recovery iso and run xfs_repair > > from there. That's going to generate a /lot/ of support work compared > > to us figuring out how to repair the filesystems automatically. > > > > Yeah, sorry. Good argument for the unpacked -> packed fixup. > > > > Random thought... have we considered anything less invasive/more > > > simple for this issue than low level buffer manipulation? For example, > > > could we address the unpacked -> packed case by just allocating a > > > block to fill the gap at the end of the wrapped agfl? > > > > The allocation could fail, in which case we'd be forced to resort to > > list manipulation. > > > > Yeah.. obvious tradeoff, but how likely is that? It seems a reasonable > tradeoff to me given there's still the repair path. Also note that we > don't have to necessarily allocate from the btrees. > > (FWIW, I hacked around on this a bit and ended up with something > slightly different[1].) > > > > Perhaps that could even be done more discreetly in > > > xfs_alloc_fix_freelist() when we already have a transaction set up, > > > etc. > > > > Possible, though this will add more logic to every hot allocation call > > for something that only needs fixing at mount and at unmount. 
I think > > there's a stronger argument for doing the unmount cleanup on every > > fix_freelist since that would eliminate the small chance that a new > > kernel crashes and gets recovered on an old kernel. > > > > Eh, I don't think that's anything we couldn't address if need be. E.g., > amortize the padding fixup against actual allocations, set/test a pag > bit to track the padding fixup, or check it on agf read and set a "needs > fixup" bit, etc. > > > > That wouldn't help going the other direction, but maybe we could > > > consider similar logic to free an extraneous block and then worry > > > about protecting users who won't upgrade broken kernels separately if > > > that really continues to be a problem. > > > > It is already a problem here. > > > > Then it should be able to stand alone as an independent patch (with > independent commit log that explains/justifies it, etc.). > > Brian > > [1] This is hacky and brute force, but the idea was to try and do > something that could work in either direction and reuse more existing > code. It basically just reads the on-disk entries "manually" and > reinserts them into a reinitialized (empty) agfl. Perhaps it could be > refactored into something that only rotates a given number of entries or > something of that nature. Anyways, just an experiment and probably > broken.. I'll give this a whirl and see where we end up. 
--D

> int
> xfs_agfl_fixup(
> 	struct xfs_trans	*tp,
> 	struct xfs_buf		*agbp,
> 	struct xfs_perag	*pag)
> {
> 	struct xfs_mount	*mp = tp->t_mountp;
> 	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agbp);
> 	int			agfl_size = XFS_AGFL_SIZE(mp);
> 	int			ofirst, olast, osize;
> 	int			nfirst, nlast;
> 	struct xfs_buf		*agflbp;
> 	__be32			*agfl_bno;
> 	int			active;
> 	xfs_agblock_t		bno;
> 	bool			wasempty = false;
> 
> 	if (pag->pagf_flcount == 0)
> 		wasempty = true;
> 
> 	ofirst = be32_to_cpu(agf->agf_flfirst);
> 	olast = be32_to_cpu(agf->agf_fllast);
> 	if (olast >= ofirst)
> 		active = olast - ofirst + 1;
> 	else
> 		active = agfl_size - ofirst + olast + 1;
> 
> 	if (active == pag->pagf_flcount + 1)
> 		osize = agfl_size - 1;
> 	else if ((active == pag->pagf_flcount - 1) ||
> 		 ofirst == agfl_size || olast == agfl_size)
> 		osize = agfl_size + 1;
> 	else if (active == pag->pagf_flcount)
> 		osize = agfl_size;
> 	else
> 		return -EFSCORRUPTED;
> 
> 	xfs_alloc_read_agfl(mp, tp, be32_to_cpu(agf->agf_seqno), &agflbp);
> 	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);
> 
> 	nlast = do_mod(olast, agfl_size);
> 	nfirst = nlast + 1;
> 	nfirst = do_mod(nfirst, agfl_size);
> 	ASSERT(nfirst != ofirst);
> 
> 	agf->agf_flfirst = cpu_to_be32(nfirst);
> 	agf->agf_fllast = cpu_to_be32(nlast);
> 	agf->agf_flcount = 0;
> 	xfs_alloc_log_agf(tp, agbp, XFS_AGF_FLFIRST | XFS_AGF_FLLAST |
> 			  XFS_AGF_FLCOUNT);
> 	pag->pagf_flcount = 0;
> 
> 	if (wasempty)
> 		goto out;
> 
> 	while (true) {
> 		ofirst = do_mod(ofirst, osize);
> 		bno = be32_to_cpu(agfl_bno[ofirst]);
> 		xfs_alloc_put_freelist(tp, agbp, agflbp, bno, 0);
> 		if (ofirst == olast)
> 			break;
> 		ofirst++;
> 	}
> 
> out:
> 	xfs_trans_brelse(tp, agflbp);
> 	return 0;
> }
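For comparison, the core of the drain-and-reinsert idea boils down to a ring walk that can be modeled in userspace like this (osize is whatever ring length the writing kernel used; the names are illustrative and none of the kernel bookkeeping is modeled):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Walk the old (possibly wrapped) free list using the writer's ring
 * size and emit its entries as a fresh, unwrapped list starting at
 * slot 0.  Stand-in for the read-and-xfs_alloc_put_freelist loop in
 * the experiment above; assumes a sane ofirst/olast pair.
 */
static uint32_t
agfl_rebuild(const uint32_t *old, uint32_t osize, uint32_t ofirst,
	     uint32_t olast, uint32_t flcount, uint32_t *out)
{
	uint32_t	i, n = 0;

	if (flcount == 0)
		return 0;
	for (i = ofirst % osize; ; i = (i + 1) % osize) {
		out[n++] = old[i];
		if (i == olast)
			break;
	}
	return n;	/* should equal flcount for a sane AGFL */
}
```

Rebuilding the earlier mount-time example (A/B/C in slots 6-8 of a 9-entry ring, D/E/F in slots 0-2) yields the six entries in order with no wrap, after which put_freelist-style reinsertion can lay them out from slot 0.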