Bulk dump of git metadata / getting git metadata into a database

I'm wondering if anyone knows of software that dumps all of a git
repo's metadata, both stored and derived, to a format (SQL, XML, CSV,
whatever) that is easy to import into a database or manipulate
programmatically.



Background, for the interested:

There is a git repo HAPPY and a separate fork repo with a branch SAD.

Repo HAPPY is canonical. Files from HAPPY have been copied over to SAD
on an irregular basis, so SAD has a mixture of files that are exactly
the same as the version in some commit to HAPPY, and files that have
diverged since the initial copy over from HAPPY as per the needs of
the SAD fork.

The end goal is to get a diff that shows only fork-specific changes:
for each file, identify the common ancestor, and then diff the most
recent forked version against that. Or put another way:

(a) Remove any files from SAD's most recent commit that are exactly
the same as a version in any commit to HAPPY.

(b) For each file still in SAD's most recent commit, walk backwards in
SAD until a version is found that exists in HAPPY.

For (a) the below two git commands plus a little scripting look like enough:

# HAPPY: Get all file hashes for a repo (verify-pack covers packed objects only)
git verify-pack -v .git/objects/pack/*.idx > HAPPY.hashes
grep ' blob   ' HAPPY.hashes | awk '{print $1}' > HAPPY.blobs

# SAD: Get hashes and paths from current checkout
git ls-files --full-name -s | awk '{print $2" "$4}' > SAD.blobs
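The "little scripting" for (a) could be as small as the sketch below. It is only a sketch: fork_only is a made-up helper name, and it assumes HAPPY.blobs has one hash per line and SAD.blobs has "hash path" lines, as produced by the commands above.

```shell
# Hypothetical helper: print the "hash path" lines of a SAD-style listing
# whose hash does not appear in a HAPPY-style list of blob hashes.
fork_only() {
    happy_blobs=$1   # one blob hash per line
    sad_blobs=$2     # "hash path" per line
    # While reading the first file, remember every HAPPY hash. While reading
    # the second, print only the lines whose hash was never seen.
    awk 'FILENAME == ARGV[1] { seen[$1] = 1; next } !($1 in seen)' \
        "$happy_blobs" "$sad_blobs"
}

# Example call (assumes both files exist in the current directory):
#   fork_only HAPPY.blobs SAD.blobs > SAD.fork-only
```

What is left in the output should be the files that differ from every version HAPPY has ever had, i.e. the candidates for step (b).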

I haven't looked into (b) as much yet, but at the moment I'm thinking
of using git log to get a chronological list of commit hashes, then
walking backwards, checking out each commit and using git ls-files to
dump that tree's hashes to a separate file.
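
A rough sketch of what (b) might look like for a single file follows. Note that it is a variation on the idea above rather than the same thing: instead of checking out each commit and running git ls-files, it asks git rev-parse for the blob hash of the path as of each commit. find_common_ancestor is a made-up name, and the sketch assumes it is run from the top of the SAD working tree with HAPPY.blobs holding one hash per line. It does not follow renames.

```shell
# Hypothetical helper: for one path, print "<commit> <blob>" for the newest
# commit on the current branch whose version of that path is a HAPPY blob.
find_common_ancestor() {
    path=$1          # path relative to the top of the working tree
    happy_blobs=$2   # one blob hash per line
    # git log lists the commits that touched the path, newest first.
    git log --format=%H -- "$path" | while read -r commit; do
        # Blob hash of the path as of that commit; skip commits where the
        # path does not exist (for example, where it was deleted).
        blob=$(git rev-parse --verify --quiet "$commit:$path") || continue
        # Stop at the first (newest) version that HAPPY also contains.
        if grep -q -x -F "$blob" "$happy_blobs"; then
            echo "$commit $blob"
            break
        fi
    done
}

# Example call (assumes the current directory is the top of the SAD checkout):
#   find_common_ancestor some/file.c HAPPY.blobs
```

The blob printed could then be diffed against the current version of the file, e.g. with git diff <blob> -- <path>, to see only the fork-specific changes.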

-- 
Daniel J Clark - off-list: djc @ first initial last name . us