Re: [RFC PATCH 2/5] Add delta-islands.{c,h}

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Christian Couder <christian.couder@xxxxxxxxx> writes:

> From: Jeff King <peff@xxxxxxxx>
>
> Hosting providers that allow users to "fork" existing
> repos want as much as possible those forks to use a
> small amount of disk space.

"... want those forks to consume as little amount of disk space as
possible?"  Or perhaps "... want those forks to share as much disk
space as possible"?

> Using alternates to keep all the objects from all the forks into a
> unique central repo is way to do that, but ...

s/is way to/is a way to/, I guess, but that makes it sound as if the
series invented a way to share objects without using alternates.

> it can have some
> drawbacks. Especially when packing the central repo, deltas will
> be created between objects from different forks.
>
> This can make cloning or fetching a fork much slower and much more
> CPU intensive as Git might have to compute new deltas for many
> objects to avoid sending objects from a different fork.
>
> Delta islands is a way to store objects from different forks in
> the same repo and packfile without having deltas between objects
> from different forks.

I think another paragraph between the above two is needed to briefly
outline the central idea (which would make it clear why that is
called "delta islands") is called for.  Perhaps

	Because the inefficiency primarily arises when an object is
	delitified against another object that does not exist in the
	same fork, we partion objects into sets that appear in the
	same fork, and define "delta islands".  When finding delta
	base, we do not allow an object outside the same island to
	be considered as its base.

or something along that line.  Perhaps that would even make the last
paragraph unnecessary.

> +struct island_bitmap {
> +	uint32_t refcount;
> +	uint32_t bits[];
> +};
> +
> +static uint32_t island_bitmap_size;
> +
> +/*
> + * Allocate a new bitmap; if "old" is not NULL, the new bitmap will be a copy
> + * of "old". Otherwise, the new bitmap is empty.
> + */
> +static struct island_bitmap *island_bitmap_new(const struct island_bitmap *old)
> +{
> +	size_t size = sizeof(struct island_bitmap) + (island_bitmap_size * 4);
> +	struct island_bitmap *b = xcalloc(1, size);

Is one of the variants of flex array macros applicable here?

> +	if (old)
> +		memcpy(b, old, size);
> +
> +	b->refcount = 1;
> +	return b;
> +}
> +
> +static void island_bitmap_or(struct island_bitmap *a, const struct island_bitmap *b)
> +{
> +	uint32_t i;
> +
> +	for (i = 0; i < island_bitmap_size; ++i)
> +		a->bits[i] |= b->bits[i];
> +}
> +
> +static int island_bitmap_is_subset(struct island_bitmap *self,
> +		struct island_bitmap *super)
> +{
> +	uint32_t i;
> +
> +	if (self == super)
> +		return 1;
> +
> +	for (i = 0; i < island_bitmap_size; ++i) {
> +		if ((self->bits[i] & super->bits[i]) != self->bits[i])
> +			return 0;
> +	}
> +
> +	return 1;
> +}
> +#define ISLAND_BITMAP_BLOCK(x) (x / 32)
> +#define ISLAND_BITMAP_MASK(x) (1 << (x % 32))
> +
> +static void island_bitmap_set(struct island_bitmap *self, uint32_t i)
> +{
> +	self->bits[ISLAND_BITMAP_BLOCK(i)] |= ISLAND_BITMAP_MASK(i);
> +}
> +
> +static int island_bitmap_get(struct island_bitmap *self, uint32_t i)
> +{
> +	return (self->bits[ISLAND_BITMAP_BLOCK(i)] & ISLAND_BITMAP_MASK(i)) != 0;
> +}
> +

Not necessarily a complaint, but do we need another implementation
of bitmap here, or the compressed bitmap used for the pack bitmap is
unsuited for the purpose of this thing (e.g. perhaps it is overkill,
as we won't be shooting for saving disk footprint of a bitmap that
we are not going to save on disk anyway)?

> +static int cmp_tree_depth(const void *va, const void *vb)
> +{
> +	struct object_entry *a = *(struct object_entry **)va;
> +	struct object_entry *b = *(struct object_entry **)vb;
> +	return a->tree_depth - b->tree_depth;
> +}
> +
> +void resolve_tree_islands(int progress, struct packing_data *to_pack)
> +{
> +	struct progress *progress_state = NULL;
> +	struct object_entry **todo;
> +	int nr = 0;
> +	int i;
> +
> +	if (!island_marks)
> +		return;
> +
> +	/*
> +	 * We process only trees, as commits and tags have already been handled
> +	 * (and passed their marks on to root trees, as well. We must make sure
> +	 * to process them in descending tree-depth order so that marks
> +	 * propagate down the tree properly, even if a sub-tree is found in
> +	 * multiple parent trees.
> +	 */
> +	todo = xmalloc(to_pack->nr_objects * sizeof(*todo));
> +	for (i = 0; i < to_pack->nr_objects; i++) {
> +		if (oe_type(&to_pack->objects[i]) == OBJ_TREE)
> +			todo[nr++] = &to_pack->objects[i];
> +	}
> +	qsort(todo, nr, sizeof(*todo), cmp_tree_depth);

Hmph, at this stage nobody actually seems to set tree_depth; I am
wondering how a tree that is at the root in one commit but appears
as a subdirectory in another commit is handled by this code.

> +static regex_t *island_regexes;
> +static unsigned int island_regexes_alloc, island_regexes_nr;
> +static const char *core_island_name;

Are these (and the bitmap & hashtable) something that "everything in
the_repository" folks would come in and nuke as global variables?  I
haven't thought deeply about it, but these smell like that there
would be one set of such in-core variables per one in-core object
store plus refs (i.e. repository instance).




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux