On Mon, Oct 12, 2009 at 9:57 PM, Shawn O. Pearce <spearce@xxxxxxxxxxx> wrote:
> imyousuf@xxxxxxxxx wrote:
>> The SPI mainly focuses on providing an API to JGit to be able to perform
>> similar operations to that of java.io.File. All direct I/O is based on the
>> java.io.Input/OutputStream classes.
>>
>> Different JGit IO SPI provider is designed to be URI scheme based and thus
>> the default implementation is that of "file" scheme. SPI provider will be
<snip />
> I think this may be a bit in the wrong direction for what we are
> trying to accomplish.
>
> A number of people really want to map Git onto what is essentially
> Google's BigTable schema. Aside from Google's own BigTable product
> (which I want to use Git on at work, because it would vastly simplify
> my system administration duties at $DAYJOB) there is Cassandra and
> Hadoop HBase which implement the same schema semantics.
>
> None of those systems implement file streams, they implement cell
> storage in a non-transactional system with a semi-dynamic schema.
>
> Some people have built transactional semantics on top of these
> storage layers, e.g. Google AppEngine provides multiple row
> transactions through some magic sauce layered on top of BigTable.
> I'm sure people will build similar tools on top of Cassandra
> and HBase.
>
> Where I'm trying to go with this is that things that are stored
> in files on the filesystem in traditional Git wouldn't normally be
> mapped into "byte streams" in a BigTable-ish system, or even the
> JDBC-ish system you were describing.
>
> For .git/config we might want to map config variable names into
> keys in the table, with values stored in cells. This makes it
> easier to query or edit the data.
>
> Fortunately, "Config" is abstract enough that we could subclass
> it with a CassandraConfig and simply use that instance when on a
> Cassandra-based storage system. No file streams required. Ditto
> for a JdbcConfig.
>

Firstly, I am sorry, but I do not see how the user decides which
instance of Config to use. As far as I can tell, there is no API for
what you just described :(; i.e. the user will have to know about
CassandraConfig directly.

Secondly, I was instead thinking of porting JGit to any system that
supports streams (not any specific subclass of them), such as
HBase/BigTable, HDFS, or anything else.

Thirdly, I think we actually have several tasks in hand, and I would
state them as:

1. First, introduce the I/O API such that it completely replaces
   java.io.File.
2. Second, segregate persistence for config (or config-like objects)
   and introduce an SPI for them for smarter storage.

I am not thinking of storing "only" the bare content of a Git
repository; I intend to be able to store the versioned contents
themselves as well.

If we choose the two steps above, I believe we will be able to achieve
both our ideas. In addition, I hope that if Git itself one day
introduces a similar I/O API, it can also support the I/O SPI
implementations that JGit will.

Waiting eagerly to learn what you think :).

Best regards,

Imran

> For RefDatabase, we'd want to do the same and avoid the concept of
> packed-refs altogether. Each Ref should go into its own row in a
> Cassandra storage system, and essentially act as a loose object.
> Ditto with JDBC.
>
> We'd probably never need to read-or-write the info/refs or
> objects/info/packs listings.
>
> And I think that's everything that a bare repository needs, aside
> from ObjectDatabase, which is already mostly abstract anyway.
>
> --
> Shawn.
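
A minimal sketch of how the two steps proposed above (a stream-based
I/O API selected by URI scheme, plus a separate SPI for config
persistence) might fit together. Every type name here (StorageEntry,
ConfigStore, StorageProvider, StorageProviders, MemoryProvider) and
the "mem" scheme are hypothetical and exist neither in JGit nor in the
proposed patch; the in-memory provider only stands in for a real
backend.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// All names below are hypothetical; none of these types exist in JGit.

/** Step 1: stream-based stand-in for the java.io.File operations. */
interface StorageEntry {
    boolean exists();
    InputStream openInput() throws IOException;
    OutputStream openOutput() throws IOException;
}

/** Step 2: config persistence kept apart from byte streams. */
interface ConfigStore {
    String get(String key);
    void set(String key, String value);
}

/** One provider per URI scheme, e.g. "file" or some non-file backend. */
interface StorageProvider {
    StorageEntry resolve(URI location);
    ConfigStore openConfig(URI repository);
}

/** Registry that picks the provider from the URI scheme. */
final class StorageProviders {
    private static final Map<String, StorageProvider> BY_SCHEME =
            new HashMap<String, StorageProvider>();

    static void register(String scheme, StorageProvider provider) {
        BY_SCHEME.put(scheme, provider);
    }

    static StorageProvider forUri(URI uri) {
        String scheme = uri.getScheme();
        StorageProvider p = scheme == null ? null : BY_SCHEME.get(scheme);
        if (p == null) {
            throw new IllegalArgumentException("No provider for " + uri);
        }
        return p;
    }
}

/** In-memory provider, for illustration only. */
final class MemoryProvider implements StorageProvider {
    private final Map<URI, byte[]> blobs = new HashMap<URI, byte[]>();
    private final Map<String, String> cells = new HashMap<String, String>();

    public StorageEntry resolve(final URI location) {
        return new StorageEntry() {
            public boolean exists() {
                return blobs.containsKey(location);
            }

            public InputStream openInput() throws IOException {
                byte[] data = blobs.get(location);
                if (data == null) {
                    throw new IOException("Not found: " + location);
                }
                return new ByteArrayInputStream(data);
            }

            public OutputStream openOutput() {
                // Content becomes visible only once the stream is closed.
                return new ByteArrayOutputStream() {
                    @Override
                    public void close() throws IOException {
                        super.close();
                        blobs.put(location, toByteArray());
                    }
                };
            }
        };
    }

    public ConfigStore openConfig(final URI repository) {
        // Each config variable is stored as its own keyed cell.
        return new ConfigStore() {
            public String get(String key) {
                return cells.get(repository + "#" + key);
            }

            public void set(String key, String value) {
                cells.put(repository + "#" + key, value);
            }
        };
    }
}
```

With something of this shape, a caller would only ever write
StorageProviders.forUri(repoUri).openConfig(repoUri) and would not need
to name a backend-specific config class directly; a provider is free to
back its ConfigStore with cells rather than with a file stream.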
-- 
Imran M Yousuf
Entrepreneur & Software Engineer
Smart IT Engineering
Dhaka, Bangladesh
Email: imran@xxxxxxxxxxxxxxxxxxxxxx
Blog: http://imyousuf-tech.blogs.smartitengineering.com/
Mobile: +880-1711402557