Re: [PATCH 00/30] tree-walk: mostly "mode" to "enum object_type"

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Mar 8, 2021 at 7:07 AM Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:
>
> This large series goes on top of my 6 patch series for
> read_tree_recursive() as this one further refactors that function. See
> https://lore.kernel.org/git/20210308022138.28166-1-avarab@xxxxxxxxx/
> for that series.
>
> I noticed that since 2014 or so we haven't been doing the fsck checks
> for bad file modes in trees. This series fixes that. I plan to add
> tests etc. for that in another follow-up series.
>
> I wanted to get this out for review sooner than later, particularly
> since the fsck testing will probably get me down another refactoring
> path (fsck testing in general in this area is pretty bad...).
>
> As noted in 30/30 it would have been way easier to simply do an
> isolated fix for that bug by introducing some fsck-specific API for
> raw tree reading.
>
> But I thought the bug was symptomatic of a wider problem in our
> codebase. Namely that we pass around the tree's mode *a lot*.
>
> But almost everything that then deals with the mode doesn't per-se
> care about the mode bits in the tree, but using them to map that mode
> to a tree entry for one of of OBJ_{BLOB,TREE,COMMIT}.
>
> So this is a large refactoring of all users of the widely used
> tree-walk.h API to "enum obj2ect_type", finally in 29/30 I rename the
> field to a scary "raw_mode".
>
> At that point we have just ~30-50 grep hits left for "raw_mode" in the
> codebase (depending on whether we count names in function parameters).
>
> Hopefully being in that state alleviates e.g. Elijah's concerns
> expressed in
> https://lore.kernel.org/git/CABPp-BEdu1PqV5W=FuL0f08iFhGzvzV8oSUybNj4eF0aAwTnAw@xxxxxxxxxxxxxx/
> I agree that doing the equivalent of 30/30 on top of master would be
> way too scary, but once we're at 29/30 I think it's sane.

It's partially less scary (good cleanups that help make things
clearer, your comment about the code having been in a similar state
once upon a time in patch 30), but in some ways even more scary (after
reading through 30/30 and readily noticing a few missing areas and
starting to dig and finding several more).

>
> I tested this in combination with his on-list series to add more
> merge-ort testing:
> https://lore.kernel.org/git/pull.973.git.git.1614905738.gitgitgadget@xxxxxxxxx/
>
> I found a regression I'd caused in the merge-ort.c code with those
> tests, fixed here. See the comment in merge-ort.c in 30/30.

I've read through the whole series now.  It is nicely structured, and
has lots of good cleanups.  Unfortunately, there are also some clear
regressions noted in my comments on both patches 6 and 30.

I'm particularly worried with patch 30's basic plan; I think it'd be
far safer to have the tree-walking default to returning canonicalized
modes but allowing callers to request it be off.  I think each caller
is going to need someone to audit the particular path for whether it
can be safely switched over to using raw modes on a case-by-case basis
and with the introduction of new tests.  Some callers probably aren't
worth the effort (e.g. merge-recursive).  Others might be, but require
a fair amount of work or other trade-offs.  I'm split about whether
merge-ort should consider it.  Using raw_modes might allow me to fix
one funny corner case issue in merge-ort that to my knowledge no user
has ever hit in practice, but I'm not sure fixing that testcase is
worth it.  To fix it, we'd also have to allow writing tree objects
with non-canonicalized modes to the object store (for a "temporary"
tree defining the virtual merge-base); that means new objects with
"broken"/"non-canonicalized" modes being written that users can
access.

> Ævar Arnfjörð Bjarmason (30):
>   diff.c: remove redundant canon_mode() call
>   notes & match-trees: use name_entry's "pathlen" member
>   cache.h: add a comment to object_type()
>   tree-walk.h: add object_type member to name_entry
>   tree-walk.c: migrate to using new "object_type" field when possible
>   cache.h: have base_name_compare() take "is tree?", not "mode"
>   tree-walk.h users: switch object_type(...) to new .object_type
>   tree.h: format argument lists of read_tree_recursive() users
>   tree.h users: format argument lists in archive.c
>   archive: get rid of 'stage' parameter
>   tree.h API: make read_tree_fn_t take an "enum object_type"
>   tree-walk.h users: migrate "p->mode &&" pattern
>   tree-walk.h users: refactor chained "mode" if/else into switch
>   tree-walk.h users: migrate miscellaneous "mode" to "object_type"
>   merge-tree tests: test for the mode comparison in same_entry()
>   merge-ort: correct reference to test in 62fdec17a11
>   fsck.c: switch on "object_type" in fsck_walk_tree()
>   tree-walk.h users: use temporary variable(s) for "mode"
>   tree-walk.h API: formatting changes for subsequent commit
>   tree-walk.h API: rename get_tree_entry() to get_tree_entry_mode()
>   tree-walk.h API users: use "tmp" for mode in shift_tree_by()
>   tree-walk.h API: Add get_tree_entry_type()
>   tree-walk.h API: add a get_tree_entry_path() function
>   tree-walk.h API: document and format tree_entry_extract()
>   tree-entry.h API: rename tree_entry_extract() to
>     tree_entry_extract_mode()
>   tree-walk.h API: add a tree_entry_extract_all() function
>   tree-walk.h API: add a tree_entry_extract_type() function
>   tree-walk.h API users: rename "struct name_entry"'s "mode" to
>     "raw_mode"
>   tree.h API users: rename read_tree_fn_t's "mode" to "raw_mode"
>   tree-walk.h API: move canon_mode() back out of decode_tree_entry()
>
>  archive.c              | 51 +++++++++++++-----------
>  blame.c                |  9 +++--
>  builtin/checkout.c     |  7 +++-
>  builtin/fast-import.c  |  8 ++--
>  builtin/grep.c         |  6 +--
>  builtin/log.c          |  7 ++--
>  builtin/ls-files.c     | 13 +++---
>  builtin/ls-tree.c      | 18 ++++-----
>  builtin/merge-tree.c   | 32 +++++++++------
>  builtin/mktree.c       |  4 +-
>  builtin/pack-objects.c |  6 +--
>  builtin/reflog.c       |  3 +-
>  builtin/rm.c           |  2 +-
>  builtin/update-index.c |  7 +++-
>  cache-tree.c           |  2 +-
>  cache.h                | 11 ++++--
>  combine-diff.c         |  8 ++--
>  delta-islands.c        |  2 +-
>  diff.c                 |  2 +-
>  fsck.c                 | 23 +++++------
>  http-push.c            |  6 ++-
>  line-log.c             |  2 +-
>  list-objects.c         | 20 +++++++---
>  match-trees.c          | 52 ++++++++++++------------
>  merge-ort.c            | 34 ++++++++++------
>  merge-recursive.c      | 33 ++++++++--------
>  notes.c                | 15 +++----
>  object-name.c          |  7 ++--
>  pack-bitmap-write.c    |  8 ++--
>  read-cache.c           | 16 ++++----
>  revision.c             | 12 ++++--
>  t/t4300-merge-tree.sh  | 44 +++++++++++++++++++++
>  tree-diff.c            | 44 ++++++++++++---------
>  tree-walk.c            | 89 +++++++++++++++++++++++++++++++-----------
>  tree-walk.h            | 67 ++++++++++++++++++++++++++-----
>  tree.c                 | 19 +++++----
>  tree.h                 |  5 ++-
>  unpack-trees.c         | 30 ++++++++------
>  walker.c               | 22 ++++++-----
>  39 files changed, 482 insertions(+), 264 deletions(-)
>
> --
> 2.31.0.rc0.126.g04f22c5b82
>




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux