Re: [PATCH 6/9] sha1-file: use an object_directory for the main object dir

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Nov 12, 2018 at 8:09 AM Jeff King <peff@xxxxxxxx> wrote:
>
> On Mon, Nov 12, 2018 at 10:48:36AM -0500, Derrick Stolee wrote:
>
> > > If the "the first one is the main store, the rest are alternates" bit is
> > > too subtle, we could mark each "struct object_directory" with a bit for
> > > "is_local".
> >
> > This is probably a good thing to do proactively. We have the equivalent in
> > the packed_git struct, but that's also because they get out of order. At the
> > moment, I can't think of a read-only action that needs to treat the local
> > object directory more carefully. The closest I know about is 'git
> > pack-objects --local', but that also writes a pack-file.
> >
> > I assume that when we write a pack-file to the "default location" we use
> > get_object_directory() instead of referring to the default object_directory?
>
> Generally, yes, though that should eventually be going away in favor of
> accessing it via a "struct repository". And after my series,
> get_object_directory() is just returning the_repository->objects->odb->path
> (i.e., using the "first one is main" rule).
>
> One thing that makes me nervous about a "local" flag in each struct is
> that it implies that it's the source of truth for where to write to. So
> what does git_object_directory() look like after that? Do we leave it
> with the "first one is main" rule? Or does it become:

s/git/get/ ;-)  get_object_directory is very old and was introduced in
e1b10391ea (Use git config file for committer name and email info,
2005-10-11) by Linus.

I would argue that we might want to get rid of that function now,
actually as it doesn't seem to add value to the code (assuming the
BUG never triggers), and using a_repo->objects->objectdir
or after this series a_repo->objects->odb->path; is just as short.

    $ git grep get_object_directory |wc -l
    30
    $ git grep -- "->objects->objectdir"  |wc -l
    10

Ah well, we're not there yet.

>   for (odb = the_repository->objects->odb; odb; odb = odb->next) {
>         if (odb->local)
>                 return odb->path;
>   }
>   return NULL; /* yikes? */
>
> ? That feels like it's making things more complicated, not less.

It depends if the caller cares about the local flag.

I'd think we can have more than one local, eventually?
Just think of the partial clone stuff that may have a local
set of promised stuff and another set of actual objects,
which may be stored in different local odbs.

If the caller cares about the distinction, they would need
to write out this loop as above themselves.
If they don't care, we could migrate them to not
use this function, so we can get rid of it?

> > > -   for (odb = r->objects->alt_odb_list; odb; odb = odb->next) {
> > > -           prepare_multi_pack_index_one(r, odb->path, 0);
> > > -           prepare_packed_git_one(r, odb->path, 0);
> > > +   for (odb = r->objects->odb; odb; odb = odb->next) {
> > > +           int local = (odb == r->objects->odb);
> >
> > Here seems to be a place where `odb->is_local` would help.
>
> Yes, though I don't mind this spot in particular, as the check is pretty
> straight-forward.
>
> I think an example that would benefit more is the check_and_freshen()
> stuff. There we have two almost-the-same wrappers, one of which operates
> on just the first element of the list, and the other of which operates
> on all of the elements after the first.
>
> It could become:
>
>   static int check_and_freshen_odb(struct object_directory *odb_list,
>                                    const struct object_id *oid,
>                                    int freshen,
>                                    int local)
>   {
>         struct object_directory *odb;
>
>         for (odb = odb_list; odb; odb = odb->next) {
>                 static struct strbuf path = STRBUF_INIT;
>
>                 if (odb->local != local)
>                         continue;
>
>                 odb_loose_path(odb, &path, oid->hash);
>                 return check_and_freshen_file(path.buf, freshen);
>         }
>   }
>
>   int check_and_freshen_local(const struct object_id *oid, int freshen)
>   {
>         return check_and_freshen_odb(the_repository->objects->odb, oid,
>                                      freshen, 1);
>   }
>
>   int check_and_freshen_nonlocal(const struct object_id *oid, int freshen)
>   {
>         return check_and_freshen_odb(the_repository->objects->odb, oid,
>                                      freshen, 0);
>   }
>

I am fine with (a maybe better documented) "first is local" rule, but
the code above looks intriguing, except a little wasteful
(we need two full loops in check_and_freshen, but ideally we
can do by just one loop).

What does the local flag mean anyway in a world where we
have many odbs in a repository, that are not distinguishable
except by their order? AFAICT it is actually to be used for differentiating
how much we care in fsck/cat-file/packing, as it may be borrowed
from an alternate, so maybe the flag is rather to be named
after ownership and not so much about it locality?
(I think "borrowed" or "owned" or even just "important"
or "external" or "alternate" may work)

Stefan



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux