Re: Storing (hidden) per-commit metadata

On Mon, 2010-02-22 at 16:08 +0300, Dmitry Potapov wrote:
> On Mon, Feb 22, 2010 at 12:59:32PM +0100, Jelmer Vernooij wrote:
> > We'd like to have the extra metadata in Git so that we can push Bazaar
> > commits into a Git repository losslessly. If we can't do this losslessly
> > then the identity of the commit changes just like it does in git if you
> > aren't able to produce the same tree, blob and commit objects.
> but the problem is that you may want to add some information when you
> import some Git to Bazaar. For instance, Git does not record file
> renames explicitly and relies on content of files to detect renames
> automatically. So, when I use gitk, I can see which file was renamed.
> If you work in Bazaar, you probably also want to see renames, but this
> requires that you add this information when you import commits to
> Bazaar. But if you do that, the export to Git will produce a different
> commit just because you added this Bazaar-specific data.
We already handle the other direction: Bazaar allows storing arbitrary
revision properties, so we use them to store some things that exist in
Git but cannot be represented in Bazaar. Examples are the unusual file
modes created by older versions of Git, or non-UTF-8 commit messages.
Those extra revision properties are set at the moment the Git commit is
imported into Bazaar, not afterwards, and there is no need to update
them later.

The fact that we have this extra metadata allows us to reproduce the
original Git commit bit for bit, so we can extract exactly the revision
that went in, with the same Git SHA-1.
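For readers less familiar with Git internals, a minimal sketch of why the round trip has to be exact: Git names a commit by the SHA-1 of a short header (the word `commit`, a space, the decimal body length and a NUL byte) followed by the raw commit body, so changing any single byte produces a different ID. The function name and the commit body below are illustrative assumptions, not code from bzr-git; for the same bytes, the result should match what `git hash-object -t commit` reports.

```python
import hashlib


def git_commit_sha1(body: bytes) -> str:
    # Git hashes "<type> <length>\0" followed by the raw object body.
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Hypothetical commit body; the author/committer values are illustrative only.
# 4b825dc6... is the ID of Git's empty tree.
body = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author A U Thor <author@example.com> 1266843000 +0100\n"
    b"committer A U Thor <author@example.com> 1266843000 +0100\n"
    b"\n"
    b"Initial commit\n"
)

original = git_commit_sha1(body)
# Dropping one byte (the trailing newline of the message) yields a different
# object ID, which is why lossless round-tripping must reproduce every byte.
altered = git_commit_sha1(body.rstrip(b"\n"))
print(original)
print(altered)
```

This is also why metadata that Git cannot represent has to be kept somewhere: without it, the re-exported commit body differs and so does its SHA-1.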

> > > > Having a bzr/master ref means that the extra metadata will not always be
> > > > copied around (unless git is patched), so if I push my work from Bazaar
> > > > into Git, somebody works on it in Git and pushes a derived branch and
> > > > then somebody else clones that derived Git branch into Bazaar again, I
> > > > will not be able to communicate with that person's branch.
> > > No matter how many times a branch was cloned, it is exactly the same branch
> > > (i.e. it consists of commits having exactly the same id). So, if you can
> > > work with the original branch, you can work with any cloned branch. So,
> > > I see no need to copy this data around for people who do not work with
> > > Bazaar directly.
> > The original branch is a Bazaar branch here, so that's not true. You can
> > only work with any cloned branch if the matching bzr/ branch is also
> > around. If it isn't then you won't be able to find the original commit. 
> Obviously bzr/ branch should be around somewhere, but it does not have
> to be in any cloned repo. It is sufficient to have it in one place,
> because it refers to commit-id, which does not change when you clone it.
If some other Bazaar user clones that repo, they end up without the
Bazaar-specific metadata and thus with different Bazaar commits. If they
then try to communicate with the Bazaar user that pushed the revisions
in, their histories appear unrelated.

> > hg-git already does something similar by putting a --HG-- line followed
> > by hg-git specific metadata in the commit message when it pushes into
> > Git. I'd like to find a place to put this data that's not as intrusive
> > for users.
> I still think it is wrong to hide some information in the commit object.
What exactly is the problem with doing so? The "encoding" header is
already there and, as far as I can tell, is not displayed directly to
the user.
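To make the two storage options in this exchange concrete, here is a minimal sketch that splits a raw commit object into its header lines (where "encoding" lives) and its message (where hg-git puts a --HG-- block). The `key : value` layout and the `branch` key in the trailer are assumptions for illustration; the exact hg-git format is not specified in this thread, and the function names are hypothetical.

```python
def parse_commit(raw: bytes):
    """Split a raw commit body into (headers, message).

    Headers come back as (key, value) pairs in their original order.
    A line starting with a space continues the previous header's value.
    """
    head, _, message = raw.partition(b"\n\n")
    headers = []
    for line in head.split(b"\n"):
        if line.startswith(b" ") and headers:
            key, value = headers[-1]
            headers[-1] = (key, value + b"\n" + line[1:])
        else:
            key, _, value = line.partition(b" ")
            headers.append((key, value))
    return headers, message


def split_hg_trailer(message: str):
    """Split off an assumed hg-git style '--HG--' block from a message.

    Assumes the metadata follows a line containing only '--HG--' and is
    made of 'key : value' lines; repeated keys are collected in lists.
    """
    text, sep, trailer = message.partition("\n--HG--\n")
    if not sep:
        return message, {}
    meta = {}
    for line in trailer.splitlines():
        key, _, value = line.partition(" : ")
        if key:
            meta.setdefault(key, []).append(value)
    return text, meta


# Hypothetical commit carrying both a header and an in-message trailer.
raw = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author A U Thor <author@example.com> 1266843000 +0100\n"
    b"committer A U Thor <author@example.com> 1266843000 +0100\n"
    b"encoding ISO-8859-1\n"
    b"\n"
    b"Fix the frobnicator\n"
    b"\n"
    b"--HG--\n"
    b"branch : stable\n"
)

headers, message = parse_commit(raw)
# Git assumes UTF-8 when no encoding header is present.
encoding = dict(headers).get(b"encoding", b"UTF-8").decode("ascii")
text, meta = split_hg_trailer(message.decode(encoding))
print(text)
print(meta)
```

The header travels with the commit but stays out of the message a user reads. The trailer, by contrast, is part of the message text, which is why commands that rewrite or copy messages also carry it along.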

> I am not sure that the commit object is the right place to store that
> metadata, but hiding this information is even more problematic. Let's
> suppose that someone cherry-picks your Bazaar-originated commit. Now when
> you try to synchronize with Bazaar, your synchronizer will see that it
> has some Bazaar revision ID and branch name, but, in fact, it is a new
> commit on a completely different branch...
I don't see how the fact that the bzr-git/hg-git data is being hidden is
the problem in the scenario you mention.

It'd be nice if this sort of information were discarded by "git rebase",
but that's another good reason to treat it differently from the commit
message.

Cheers,

Jelmer
