Re: Proposed design of fast-export helper

Hi again,

Ramkumar Ramachandra wrote:

> Next step: We should find out all the things <dataref> can currently
> be, by looking at existing frontend implementation.

There are only a few places a <dataref> can come from.

 * marks (":5"-style datarefs and from the marks file)
 * output from the "ls" or "cat" command
 * out-of-band knowledge by the frontend about this specific backend
   (e.g., "I know git fast-import supports git-style blob names, so I
   will use them").
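To illustrate the first case: a mark-style dataref is the only form a generic backend can recognize without outside help. A minimal sketch of such a check, assuming the usual ":<idnum>" syntax with ":0" reserved (the function name parse_mark_dataref is made up for this example and is not taken from any existing frontend or backend):

```c
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>

/*
 * Hypothetical helper: return 1 and store the mark number in *mark if
 * `s` is exactly ":<idnum>" with a nonzero decimal idnum; return 0 for
 * anything else (e.g. a backend-specific blob name).
 */
static int parse_mark_dataref(const char *s, uintmax_t *mark)
{
	char *end;
	uintmax_t n;

	/* Require ':' followed by a digit, so signs and spaces are rejected. */
	if (s[0] != ':' || s[1] < '0' || s[1] > '9')
		return 0;
	errno = 0;
	n = strtoumax(s + 1, &end, 10);
	if (errno || *end != '\0' || n == 0)
		return 0;
	*mark = n;
	return 1;
}
```

Anything this rejects would have to be interpreted with backend-specific knowledge, which is the part that is hard to pin down in a spec.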

> Then, we should
> tighten the spec so that it doesn't clobber any of those things.
> Also, we should find a way to let the backend know "how" to index/
> retrieve a blob -- this is only straightforward in the case of marks.

If this key/value store is specifically for use with fast-import
backends, I'd prefer it just deal with marks.  Caching responses to
lookup of blobs using a backend-specific <dataref> format is a
different problem.

>> I assume the delimited format works as in fast-import's "data" command
>> (and only supports blobs ending with LF)?
>
> Yes.  This is actually quite ugly to support -- we should probably
> drop support for this.
>
> Signed-off-by: Ramkumar Ramachandra <artagnon@xxxxxxxxx>
>
> diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
> index 2c2ea12..1fb71f7 100644
> --- a/Documentation/git-fast-import.txt
> +++ b/Documentation/git-fast-import.txt
> @@ -826,8 +826,8 @@ of the next line, even if `<raw>` did not end with an `LF`.
>  Delimited format::
>  	A delimiter string is used to mark the end of the data.
>  	fast-import will compute the length by searching for the delimiter.
> -	This format is primarily useful for testing and is not
> -	recommended for real data.
> +	This format should only be used for testing; other
> +	backends are not required to support it.

That would mean git's fast-import test suite couldn't be used with
arbitrary fast-import backends.  The delimited format is also very
convenient for testing "by hand".  I'm not convinced it's that hard to
support.
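To give a rough idea of the effort involved, here is a sketch of a reader for the delimited format. It assumes the "data <<DELIM" line itself has already been consumed and that the payload ends with LF, so the delimiter always sits on a line of its own. read_delimited is a name invented for this example; this is not the code git fast-import uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical reader: accumulate lines from `in` until one consists
 * of exactly `delim` followed by LF. Returns a malloc'd, NUL-terminated
 * buffer (caller frees) and stores the payload length in *len. Returns
 * NULL if EOF is hit before the delimiter or if allocation fails.
 */
static char *read_delimited(FILE *in, const char *delim, size_t *len)
{
	char *line = NULL, *buf = NULL, *tmp;
	size_t cap = 0, used = 0, dlen = strlen(delim);
	ssize_t n;

	while ((n = getline(&line, &cap, in)) >= 0) {
		if ((size_t)n == dlen + 1 && line[dlen] == '\n' &&
		    !memcmp(line, delim, dlen)) {
			free(line);
			*len = used;
			/* An empty payload still yields a valid, empty buffer. */
			return buf ? buf : calloc(1, 1);
		}
		tmp = realloc(buf, used + (size_t)n + 1);
		if (!tmp)
			break;
		buf = tmp;
		memcpy(buf + used, line, (size_t)n);
		used += (size_t)n;
		buf[used] = '\0';
	}
	free(line);
	free(buf);
	return NULL;
}
```

The cost compared with the exact byte count format is one string comparison per line, plus the restriction that the payload cannot contain the delimiter line.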

> Performance and portability considerations.  Calling semantics will
> probably be highly inelegant too, since full-blown bi-directional
> communication is necessary.

The helper takes commands on stdin and writes responses to stdout.
That seems fairly typical for helper programs, but you are right to
mention that shell scripts do not cope well with this when the helper
process needs to handle multiple requests.

e.g., see /usr/share/debconf/confmodule.

> Jonathan Nieder writes:

>> 	FILE *f = kvstore_fopen(key, O_WRONLY);
>> 	fwrite(value, sz, 1, f);
>> 	kvstore_fclose(f);
>> 
>> 	FILE *f = kvstore_fopen(key, O_RDONLY);
>> 	strbuf_fread(&value, SIZE_MAX, f);
>> 	kvstore_fclose(f);
>
> I don't like this.  The caller should not have to know about whether
> blobs are persisted in-memory or on-disk.  When there are a few small
> frequently-used blobs, the key-value might decide to persist them in
> memory, and we should allow for this kind of optimization.

A FILE * can have a pipe underlying it, or a memory area (with
fmemopen).  But I agree that using FILE * here would be a bad API
after all, for other reasons (the FILE * operations are just not quite
right).
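For reference, the memory-backed case can be sketched with POSIX fmemopen as below. count_bytes and count_bytes_in_memory are hypothetical names for this example; the point is only that the reading code never learns what sits behind the FILE *.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Count the bytes readable from f, whether it is a file, pipe, or memory. */
static size_t count_bytes(FILE *f)
{
	char chunk[64];
	size_t n, total = 0;

	while ((n = fread(chunk, 1, sizeof(chunk), f)) > 0)
		total += n;
	return total;
}

/*
 * Wrap an in-memory value in a read-only stream and hand it to the
 * same reader. The cast drops const only because fmemopen takes a
 * void *; mode "r" never writes to the buffer. Returns (size_t)-1 if
 * the stream cannot be opened.
 */
static size_t count_bytes_in_memory(const char *value, size_t len)
{
	FILE *f = fmemopen((void *)value, len, "r");
	size_t total;

	if (!f)
		return (size_t)-1;
	total = count_bytes(f);
	fclose(f);
	return total;
}
```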

>> Is there prior art that this could mimic or reuse (so we can learn
>> from others' mistakes and make sure the API feels familiar)?
>
> Kyoto Cabinet

Yikes, its API is complicated. :)  Does Kyoto Cabinet support values
longer than a size_t can describe?  Maybe that's not worth supporting
after all (I guess it would be nice to have an example of a 4 GiB file
in version control to motivate it first).

> Obviously, this will be less efficient than a store which keys
> everything using a fixed 20-byte SHA1

Sure, sticking to fixed-length, 20-byte keys could be reasonable if
that's what the application requires.  Is this for caching or for
avoiding an internal marks table?