On Sat, May 16, 2020 at 10:48:02AM -0700, Darrick J. Wong wrote: > On Sat, May 09, 2020 at 07:13:44AM -0400, Brian Foster wrote: > > On Sat, May 09, 2020 at 10:17:15AM +0200, Christoph Hellwig wrote: > > > On Fri, May 08, 2020 at 11:05:43AM -0400, Brian Foster wrote: > > > > On Fri, May 08, 2020 at 08:34:19AM +0200, Christoph Hellwig wrote: > > > > > xfs_ifork_ops add up to two indirect calls per inode read and flush, > > > > > despite just having a single instance in the kernel. In xfsprogs > > > > > phase6 in xfs_repair overrides the verify_dir method to deal with inodes > > > > > that do not have a valid parent, but that can be fixed pretty easily > > > > > by ensuring they always have a valid looking parent. > > > > > > > > > > Signed-off-by: Christoph Hellwig <hch@xxxxxx> > > > > > --- > > > > > > > > Code looks fine, but I assume we'll want a repair fix completed and > > > > merged before wiping this out: > > > > > > With the xfsprogs merge delays I'm not sure merged will work, but I'll > > > happily take your patch and get it in shape for submission. > > > > > > > The critical bit is that repair is fixed before this lands in xfsprogs, > > otherwise we just reintroduce the regression the callback mechanism was > > designed to fix. The repair change is not huge, but it's not necessarily > > trivial so it's probably worth making sure the repair change is at least > > reviewed before putting this into the kernel pipeline. > > > > BTW, I played with this a bit more yesterday and made some tweaks that I > > think make it a little cleaner. Namely instead of processing the parent > > bits in phases 3 and 4 and setting the parent in the internal structures > > in phase 4, to do everything in phase 3 and skip the repeat checks in > > phase 4. This has the side effect of eliminating some duplicate error > > messages where repair complains about the original bogus value in phase > > 3, sets it to zero, and then complains about the zero value again in > > phase 4. This still needs some auditing to assess whether we're losing > > any extra verification by setting the parent in phase 3, however. It > > also might be worth looking at giving the other dir formats the same > > treatment. Squashed diff of my local tree below... > > > > Brian > > > > diff --git a/repair/dino_chunks.c b/repair/dino_chunks.c > > index 6685a4d2..96ed6a5b 100644 > > --- a/repair/dino_chunks.c > > +++ b/repair/dino_chunks.c > > @@ -859,14 +859,7 @@ next_readbuf: > > */ > > if (isa_dir) { > > set_inode_isadir(ino_rec, irec_offset); > > - /* > > - * we always set the parent but > > - * we may as well wait until > > - * phase 4 (no inode discovery) > > - * because the parent info will > > - * be solid then. > > - */ > > - if (!ino_discovery) { > > + if (ino_discovery) { > > ASSERT(parent != 0); > > set_inode_parent(ino_rec, irec_offset, parent); > > ASSERT(parent == > > diff --git a/repair/dir2.c b/repair/dir2.c > > index cbbce601..9c789b4a 100644 > > --- a/repair/dir2.c > > +++ b/repair/dir2.c > > @@ -165,7 +165,6 @@ process_sf_dir2( > > int tmp_elen; > > int tmp_len; > > xfs_dir2_sf_entry_t *tmp_sfep; > > - xfs_ino_t zero = 0; > > > > sfp = (struct xfs_dir2_sf_hdr *)XFS_DFORK_DPTR(dip); > > max_size = XFS_DFORK_DSIZE(dip, mp); > > @@ -480,6 +479,9 @@ _("corrected entry offsets in directory %" PRIu64 "\n"), > > * check parent (..) entry > > */ > > *parent = libxfs_dir2_sf_get_parent_ino(sfp); > > + if (!ino_discovery) > > + return 0; > > + > > > > /* > > * if parent entry is bogus, null it out. we'll fix it later . > > @@ -494,7 +496,7 @@ _("bogus .. inode number (%" PRIu64 ") in directory inode %" PRIu64 ", "), > > if (!no_modify) { > > do_warn(_("clearing inode number\n")); > > > > - libxfs_dir2_sf_put_parent_ino(sfp, zero); > > + libxfs_dir2_sf_put_parent_ino(sfp, mp->m_sb.sb_rootino); > > *dino_dirty = 1; > > *repair = 1; > > } else { > > @@ -529,7 +531,7 @@ _("bad .. entry in directory inode %" PRIu64 ", points to self, "), > > if (!no_modify) { > > do_warn(_("clearing inode number\n")); > > > > - libxfs_dir2_sf_put_parent_ino(sfp, zero); > > + libxfs_dir2_sf_put_parent_ino(sfp, mp->m_sb.sb_rootino); > > *dino_dirty = 1; > > *repair = 1; > > } else { > > diff --git a/repair/phase6.c b/repair/phase6.c > > index beceea9a..43bcea50 100644 > > --- a/repair/phase6.c > > +++ b/repair/phase6.c > > @@ -26,58 +26,6 @@ static struct xfs_name xfs_name_dot = {(unsigned char *)".", > > 1, > > XFS_DIR3_FT_DIR}; > > > > -/* > > - * When we're checking directory inodes, we're allowed to set a directory's > > - * dotdot entry to zero to signal that the parent needs to be reconnected > > - * during phase 6. If we're handling a shortform directory the ifork > > - * verifiers will fail, so temporarily patch out this canary so that we can > > - * verify the rest of the fork and move on to fixing the dir. > > - */ > > -static xfs_failaddr_t > > -phase6_verify_dir( > > - struct xfs_inode *ip) > > -{ > > - struct xfs_mount *mp = ip->i_mount; > > - struct xfs_ifork *ifp; > > - struct xfs_dir2_sf_hdr *sfp; > > - xfs_failaddr_t fa; > > - xfs_ino_t old_parent; > > - bool parent_bypass = false; > > - int size; > > - > > - ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK); > > - sfp = (struct xfs_dir2_sf_hdr *)ifp->if_u1.if_data; > > - size = ifp->if_bytes; > > - > > - /* > > - * If this is a shortform directory, phase4 may have set the parent > > - * inode to zero to indicate that it must be fixed. Temporarily > > - * set a valid parent so that the directory verifier will pass. > > - */ > > - if (size > offsetof(struct xfs_dir2_sf_hdr, parent) && > > - size >= xfs_dir2_sf_hdr_size(sfp->i8count)) { > > - old_parent = libxfs_dir2_sf_get_parent_ino(sfp); > > - if (old_parent == 0) { > > - libxfs_dir2_sf_put_parent_ino(sfp, mp->m_sb.sb_rootino); > > - parent_bypass = true; > > - } > > - } > > - > > - fa = libxfs_default_ifork_ops.verify_dir(ip); > > - > > - /* Put it back. */ > > - if (parent_bypass) > > - libxfs_dir2_sf_put_parent_ino(sfp, old_parent); > > - > > - return fa; > > -} > > - > > -static struct xfs_ifork_ops phase6_ifork_ops = { > > - .verify_attr = xfs_attr_shortform_verify, > > - .verify_dir = phase6_verify_dir, > > - .verify_symlink = xfs_symlink_shortform_verify, > > -}; > > - > > /* > > * Data structures used to keep track of directories where the ".." > > * entries are updated. These must be rebuilt after the initial pass > > @@ -1104,7 +1052,7 @@ mv_orphanage( > > (unsigned long long)ino, ++incr); > > > > /* Orphans may not have a proper parent, so use custom ops here */ > > - err = -libxfs_iget(mp, NULL, ino, 0, &ino_p, &phase6_ifork_ops); > > + err = -libxfs_iget(mp, NULL, ino, 0, &ino_p, &xfs_default_ifork_ops); > > Hmm. I'll have to look at this more thoroughly on a non-weekend, but I > think I like this approach, since it removes the weird quirk that if > repair fails after writing out a sf directory with parent==0, we'll have > transformed an fs with bad directory parent pointers to an fs with a > directory that totally fails the verifier. > I don't _think_ xfs_repair should ever write back the parent == 0 condition it creates. IIUC it's intended to be a transient state and the directory should eventually be fixed up or junked (or maybe lost+found?) in phase 6. However there is a similar quirk within xfs_repair itself in that phase 3 sets the parent from whatever bad value it is to zero, then phase 4 repeats the same general scan and complains about parent == 0 because that is also invalid. ;P Brian > So for the kernel patch, provisionally: > Reviewed-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > --D > > > if (err) > > do_error(_("%d - couldn't iget disconnected inode\n"), err); > > > > @@ -2875,7 +2823,7 @@ process_dir_inode( > > > > ASSERT(!is_inode_refchecked(irec, ino_offset) || dotdot_update); > > > > - error = -libxfs_iget(mp, NULL, ino, 0, &ip, &phase6_ifork_ops); > > + error = -libxfs_iget(mp, NULL, ino, 0, &ip, &xfs_default_ifork_ops); > > if (error) { > > if (!no_modify) > > do_error( > > >