Re: [PATCH 00/30] tree-walk: mostly "mode" to "enum object_type"

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Mar 9, 2021 at 1:48 PM Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:
>
> On Tue, Mar 09 2021, Elijah Newren wrote:
>
> > On Mon, Mar 8, 2021 at 7:07 AM Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:
> >>
> >> This large series goes on top of my 6 patch series for
> >> read_tree_recursive() as this one further refactors that function. See
> >> https://lore.kernel.org/git/20210308022138.28166-1-avarab@xxxxxxxxx/
> >> for that series.
> >>
> >> I noticed that since 2014 or so we haven't been doing the fsck checks
> >> for bad file modes in trees. This series fixes that. I plan to add
> >> tests etc. for that in another follow-up series.
> >>
> >> I wanted to get this out for review sooner than later, particularly
> >> since the fsck testing will probably get me down another refactoring
> >> path (fsck testing in general in this area is pretty bad...).
> >>
> >> As noted in 30/30 it would have been way easier to simply do an
> >> isolated fix for that bug by introducing some fsck-specific API for
> >> raw tree reading.
> >>
> >> But I thought the bug was symptomatic of a wider problem in our
> >> codebase. Namely that we pass around the tree's mode *a lot*.
> >>
> >> But almost everything that then deals with the mode doesn't per-se
> >> care about the mode bits in the tree, but using them to map that mode
> >> to a tree entry for one of of OBJ_{BLOB,TREE,COMMIT}.
> >>
> >> So this is a large refactoring of all users of the widely used
> >> tree-walk.h API to "enum obj2ect_type", finally in 29/30 I rename the
> >> field to a scary "raw_mode".
> >>
> >> At that point we have just ~30-50 grep hits left for "raw_mode" in the
> >> codebase (depending on whether we count names in function parameters).
> >>
> >> Hopefully being in that state alleviates e.g. Elijah's concerns
> >> expressed in
> >> https://lore.kernel.org/git/CABPp-BEdu1PqV5W=FuL0f08iFhGzvzV8oSUybNj4eF0aAwTnAw@xxxxxxxxxxxxxx/
> >> I agree that doing the equivalent of 30/30 on top of master would be
> >> way too scary, but once we're at 29/30 I think it's sane.
> >
> > It's partially less scary (good cleanups that help make things
> > clearer, your comment about the code having been in a similar state
> > once upon a time in patch 30), but in some ways even more scary (after
> > reading through 30/30 and readily noticing a few missing areas and
> > starting to dig and finding several more).
>
> Thanks a lot for your reviews. I'm going to let this sit for a fair bit
> longer before any re-roll, especially with the rc period, other eyeballs
> on this most welcome though :)
>
> >>
> >> I tested this in combination with his on-list series to add more
> >> merge-ort testing:
> >> https://lore.kernel.org/git/pull.973.git.git.1614905738.gitgitgadget@xxxxxxxxx/
> >>
> >> I found a regression I'd caused in the merge-ort.c code with those
> >> tests, fixed here. See the comment in merge-ort.c in 30/30.
> >
> > I've read through the whole series now.  It is nicely structured, and
> > has lots of good cleanups.  Unfortunately, there are also some clear
> > regressions noted in my comments on both patches 6 and 30.
> >
> > I'm particularly worried with patch 30's basic plan; I think it'd be
> > far safer to have the tree-walking default to returning canonicalized
> > modes but allowing callers to request it be off.  I think each caller
> > is going to need someone to audit the particular path for whether it
> > can be safely switched over to using raw modes on a case-by-case basis
> > and with the introduction of new tests.  Some callers probably aren't
> > worth the effort (e.g. merge-recursive).  Others might be, but require
> > a fair amount of work or other trade-offs.  I'm split about whether
> > merge-ort should consider it.  Using raw_modes might allow me to fix
> > one funny corner case issue in merge-ort that to my knowledge no user
> > has ever hit in practice, but I'm not sure fixing that testcase is
> > worth it.  To fix it, we'd also have to allow writing tree objects
> > with non-canonicalized modes to the object store (for a "temporary"
> > tree defining the virtual merge-base); that means new objects with
> > "broken"/"non-canonicalized" modes being written that users can
> > access.
>
> To add a bit to your worries, I think your "[...]does lower my worry
> some[...]" in 30/30 is unfortunately based on some unintentional lying
> on my part.
>
> I.e. my "yes our test coverage sucks, but[...]" was based on some
> misreading of the history. Here's a correction:
>
> 7146e66f086 didn't break the fsck check in mid-2014, it had been broken
> for much longer. See Jeff King's late-2014 E-mail about it here (which
> I've just now discovered):
> https://lore.kernel.org/git/20140923154751.GA19319@xxxxxxxx/#t
>
> Basically, decode_tree_entry() didn't sanitize the mode, but we did that
> in the tree_entry_extract() function, which is what fsck.c was using all
> along, so it was always getting pre-sanitized modes.
>
> But I think I was mostly right, somewhat by accident, AFAICT this was
> the state of things back then:
>
>     $ git grep '\b(init_tree_desc|fill_tree_descriptor|update_tree_entry|update_tree_entry|tree_entry)\(' -- '*.c' | wc -l
>     78
>     $ git grep '\b(get_tree_entry|tree_entry_extract)\(' -- '*.c' | wc -l
>     25
>
> Those were the API functions that gave you the un-canonical and
> canonical mode, respectively.
>
> But yes, I agree that it's probably a bit too scary.
>
> Where I was mainly trying to get to with 30/30 was that for any future
> code we'd more carefully review this whole "raw_mode" thing just because
> of its name.
>
> So what do you think about a version of 30/30 where existing API users
> immediately call canon_mode() upon calling the current API functions?
>
> It would be ugly and verbose now, but the benefit would be that we'd eye
> any future change that deals with this "raw_mode" with more suspicion,
> and likely convert it to use the object_type instead.

I don't have any strong objections to this.  I'm less worried about
ugly and verbose now, than I am in missing cases while converting;
it'd be easy to get some of the conversions wrong accidentally
(similar to the mode->is_tree changes for fast-import in patch 06 of
this series or the original read_tree() removal patch before you fixed
it up in v2), so I think it'd need to be done with some care.

> Or I could just back out of this whole 29-30 step and just add a "bare"
> API for fsck in particular.

That'd be fine as well, and certainly less concerning; perhaps it's
even the first step.  But I don't think we have to do it this way,
just that if we move towards raw_mode it needs to be done with a lot
of caution.

> I'm partial to renaming it to *something* just to make it more grep-able
> though, we have a "mode" in all sorts of structs all over the place...
>
> >> Ævar Arnfjörð Bjarmason (30):
> >>   diff.c: remove redundant canon_mode() call
> >>   notes & match-trees: use name_entry's "pathlen" member
> >>   cache.h: add a comment to object_type()
> >>   tree-walk.h: add object_type member to name_entry
> >>   tree-walk.c: migrate to using new "object_type" field when possible
> >>   cache.h: have base_name_compare() take "is tree?", not "mode"
> >>   tree-walk.h users: switch object_type(...) to new .object_type
> >>   tree.h: format argument lists of read_tree_recursive() users
> >>   tree.h users: format argument lists in archive.c
> >>   archive: get rid of 'stage' parameter
> >>   tree.h API: make read_tree_fn_t take an "enum object_type"
> >>   tree-walk.h users: migrate "p->mode &&" pattern
> >>   tree-walk.h users: refactor chained "mode" if/else into switch
> >>   tree-walk.h users: migrate miscellaneous "mode" to "object_type"
> >>   merge-tree tests: test for the mode comparison in same_entry()
> >>   merge-ort: correct reference to test in 62fdec17a11
> >>   fsck.c: switch on "object_type" in fsck_walk_tree()
> >>   tree-walk.h users: use temporary variable(s) for "mode"
> >>   tree-walk.h API: formatting changes for subsequent commit
> >>   tree-walk.h API: rename get_tree_entry() to get_tree_entry_mode()
> >>   tree-walk.h API users: use "tmp" for mode in shift_tree_by()
> >>   tree-walk.h API: Add get_tree_entry_type()
> >>   tree-walk.h API: add a get_tree_entry_path() function
> >>   tree-walk.h API: document and format tree_entry_extract()
> >>   tree-entry.h API: rename tree_entry_extract() to
> >>     tree_entry_extract_mode()
> >>   tree-walk.h API: add a tree_entry_extract_all() function
> >>   tree-walk.h API: add a tree_entry_extract_type() function
> >>   tree-walk.h API users: rename "struct name_entry"'s "mode" to
> >>     "raw_mode"
> >>   tree.h API users: rename read_tree_fn_t's "mode" to "raw_mode"
> >>   tree-walk.h API: move canon_mode() back out of decode_tree_entry()
> >>
> >>  archive.c              | 51 +++++++++++++-----------
> >>  blame.c                |  9 +++--
> >>  builtin/checkout.c     |  7 +++-
> >>  builtin/fast-import.c  |  8 ++--
> >>  builtin/grep.c         |  6 +--
> >>  builtin/log.c          |  7 ++--
> >>  builtin/ls-files.c     | 13 +++---
> >>  builtin/ls-tree.c      | 18 ++++-----
> >>  builtin/merge-tree.c   | 32 +++++++++------
> >>  builtin/mktree.c       |  4 +-
> >>  builtin/pack-objects.c |  6 +--
> >>  builtin/reflog.c       |  3 +-
> >>  builtin/rm.c           |  2 +-
> >>  builtin/update-index.c |  7 +++-
> >>  cache-tree.c           |  2 +-
> >>  cache.h                | 11 ++++--
> >>  combine-diff.c         |  8 ++--
> >>  delta-islands.c        |  2 +-
> >>  diff.c                 |  2 +-
> >>  fsck.c                 | 23 +++++------
> >>  http-push.c            |  6 ++-
> >>  line-log.c             |  2 +-
> >>  list-objects.c         | 20 +++++++---
> >>  match-trees.c          | 52 ++++++++++++------------
> >>  merge-ort.c            | 34 ++++++++++------
> >>  merge-recursive.c      | 33 ++++++++--------
> >>  notes.c                | 15 +++----
> >>  object-name.c          |  7 ++--
> >>  pack-bitmap-write.c    |  8 ++--
> >>  read-cache.c           | 16 ++++----
> >>  revision.c             | 12 ++++--
> >>  t/t4300-merge-tree.sh  | 44 +++++++++++++++++++++
> >>  tree-diff.c            | 44 ++++++++++++---------
> >>  tree-walk.c            | 89 +++++++++++++++++++++++++++++++-----------
> >>  tree-walk.h            | 67 ++++++++++++++++++++++++++-----
> >>  tree.c                 | 19 +++++----
> >>  tree.h                 |  5 ++-
> >>  unpack-trees.c         | 30 ++++++++------
> >>  walker.c               | 22 ++++++-----
> >>  39 files changed, 482 insertions(+), 264 deletions(-)
> >>
> >> --
> >> 2.31.0.rc0.126.g04f22c5b82




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux