Re: tabled test corpus?

Random thoughts:

Maybe something like a freely available dictionary would work, with
the key as the word, and the value as the definition.
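The dictionary idea could be loaded with a few lines of Python; the tab-separated "word<TAB>definition" dump format here is an assumption, not any particular dictionary's real format:

```python
# Sketch: turn a tab-separated dictionary dump (hypothetical
# "word<TAB>definition" format) into (key, value) pairs for loading.
def dict_pairs(lines):
    """Yield (word, definition) pairs, skipping blanks and comments."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        word, _, definition = line.partition("\t")
        if definition:
            yield word, definition

sample = [
    "# fabricated sample; real dumps (GCIDE etc.) will differ",
    "aardvark\tA burrowing, nocturnal African mammal.",
    "zymurgy\tThe chemistry of fermentation.",
]
pairs = dict(dict_pairs(sample))
```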

You could grab git commits from the Linux kernel and make the key the
SHA, and the value the patch.
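Something like this would split a `git log -p --format='commit %H'` stream from a kernel checkout into (SHA, patch) pairs; the sample text below is fabricated just to show the shape:

```python
import re

# Sketch: split the output of
#   git -C linux log -p --format='commit %H'
# into (sha, patch) pairs, one per commit.
COMMIT_RE = re.compile(r"^commit ([0-9a-f]{40})$")

def commit_pairs(log_text):
    """Yield (sha, body) for each commit in the log stream."""
    sha, body = None, []
    for line in log_text.splitlines():
        m = COMMIT_RE.match(line)
        if m:
            if sha is not None:
                yield sha, "\n".join(body)
            sha, body = m.group(1), []
        elif sha is not None:
            body.append(line)
    if sha is not None:
        yield sha, "\n".join(body)

# tiny fabricated sample, just to show the shape
sample = "commit %s\ndiff --git a/x b/x\n+new line\ncommit %s\n-old line" % (
    "a" * 40, "b" * 40)
pairs = list(commit_pairs(sample))
```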

There's a lot of text in Project Gutenberg. You'd have to decide
what you want your average key and value lengths to be; I think
most books there are longer than 16K. Maybe you could make the key
(book, page_number).
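The (book, page_number) scheme might look roughly like this; the page size just follows the 16K value size from the synthetic test, and "pg1342" is a placeholder book id:

```python
# Sketch: page a Gutenberg text into ~16K chunks so the key can be
# (book, page_number) and values stay near the target size.
PAGE_SIZE = 16 * 1024  # bytes per value, matching the 16K figure

def pages(book_id, text):
    """Yield ((book_id, page_number), chunk) pairs of up to PAGE_SIZE bytes."""
    data = text.encode("utf-8")
    for off in range(0, len(data), PAGE_SIZE):
        yield (book_id, off // PAGE_SIZE), data[off:off + PAGE_SIZE]

book = "It is a truth universally acknowledged... " * 2000  # stand-in text
corpus = dict(pages("pg1342", book))
```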

Colin

P.S. I've been meaning to set up a bigger tabled installation myself,
as soon as I get some time.
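P.P.S. For what it's worth, a synthetic corpus in the shape of the 1m keys @ 16K values run mentioned below could be generated roughly like this; the key naming is my own invention, and N would be bumped to 1000000 for a full-size run:

```python
import hashlib

# Rough sketch of a synthetic key/value corpus: sequential keys,
# repeatable 16K values derived from each key. Naming scheme is
# made up; not how the documented test actually generated its data.
VALUE_SIZE = 16 * 1024

def synthetic(n):
    for i in range(n):
        key = "key-%08d" % i
        # stretch a SHA-1 of the key into a deterministic 16K value
        seed = hashlib.sha1(key.encode()).digest()
        yield key, (seed * (VALUE_SIZE // len(seed) + 1))[:VALUE_SIZE]

items = list(synthetic(3))
```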


On Fri, Mar 5, 2010 at 10:33 AM, Jeff Garzik <jeff@xxxxxxxxxx> wrote:
> On 03/05/2010 10:31 AM, Jeff Garzik wrote:
>>
>> Can anybody suggest a good test dataset for tabled?
>>
>> Hopefully something with a million or more keys, where the values are
>> large.
>>
>> I can certainly generate something like that artificially, but a
>> real-world dataset would be nice.
>
> Still looking for a good, real-world data set.
>
> A synthetic store+retrieve test of 1m keys @ 16K values worked without a
> hitch.  I documented this on
> http://hail.wiki.kernel.org/index.php/Extended_status
>
>        Jeff
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe hail-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
