Re: reftable [v4]: new ref storage format

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Aug 1, 2017 at 1:23 PM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
> On Mon, Jul 31, 2017 at 11:41 PM, Michael Haggerty <mhagger@xxxxxxxxxxxx> wrote:
>> On Sun, Jul 30, 2017 at 8:51 PM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
>>> 4th iteration of the reftable storage format.
>>> [...]
>>
>> Before we commit to Shawn's reftable proposal, I wanted to explore
>> what a contrasting design that is not block based would look like. So
>> I threw together a sketch of such a design. It is not as refined as
>> Shawn's, and I haven't had time to program an implementation or
>> collect statistics, but maybe it is interesting anyway.
>
> Thanks for taking the time to write this out. Its constructive to see
> other possible approaches.
>
> I roughly implemented your proposed design in JGit for references
> only. I skipped the object lookup and log handling for the sake of
> studying the basics of this approach.

Wow, that's awesome, thanks!

>        | size   | seek_cold | seek_hot  |
> mh     | 28.3 M | 24.5 usec | 14.5 usec |
> sp  4k | 29.2 M | 63.0 usec |  5.8 usec |
> sp 64k | 27.7 M | 35.6 usec | 23.3 usec |

OK, so it's at least in the right ballpark.

>> * Multiple related chunks can be stored in a 64 kiB block (interesting
>>   for people who prefer to read files in 64k segments). By storing
>>   related chunks near each other, the number of seeks will probably
>>   typically be smaller than the number of chunks that have to be
>>   accessed.
>
> This is difficult. The most related chunks are actually index chunks
> in the multi-level index. You want the next step(s) near the current
> step to avoid an actual disk seek. But that is hard to do when the
> index is several levels deep.

Given the 64k read size that you prefer, it seems to me that it would
probably be possible to keep pairs of index layers (i.e., one index
node and all of its direct children) in the same 64k range, though
this might require adjusting their sizes somewhat. And possibly to
keep the lowest index layer close to the reference chunks that it
describes (analogous to your format, where the restart records form
something like an index that is located close to the references). So I
would think that it is plausible to arrange things so that a reference
can be read within two 64k reads even if the index has three levels.

>> * The file can be written in two possible ways: with cross-references
>>   between chunks always pointing to later parts of the file (probably
>>   preferable if the data will be read from a local hard disk) or
>>   always pointing to earlier parts of the file (to accomodate writers
>>   who need to write their data in order). See the `offsets_negative`
>>   field in the header. (I'm not sure that the former variant is
>>   actually needed.)
>
> This seems like unnecessary complexity. I'd prefer to just say the
> offsets are negative, and reference backwards.

Maybe. I just wasn't sure whether a hard disk would perform
significantly worse when reading backwards rather than forwards
(because of pre-fetch of blocks). It could very well be unnecessary.

Michael



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux