On Tue, Sep 8, 2015 at 10:12 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> On Tue, Sep 8, 2015 at 3:06 PM, Haomai Wang <haomaiwang@xxxxxxxxx> wrote:
>> Hit "Send" by accident on the previous mail. :-(
>>
>> Some points about pglog:
>> 1. short-lived but high-frequency
>
> Is this really true? The default length of the log is 1000 entries,
> and most OSDs have ~100 PGs, so on a hard drive running at 80
> writes/second that's about 100000 seconds (~27 hours) before we delete

I had SSDs in mind....... Yep, on HDDs a pglog entry is not a passing
traveller. The main point, I think, is that pglog, journal data, and omap
keys are three different types of data.

> an entry. In reality most deployments aren't writing that
> quickly....and if something goes wrong with the PG we increase to
> 10000 log entries!
> -Greg
>
>> 2. small, and proportional to the number of pgs
>> 3. a typical sequential read/write pattern
>> 4. doesn't need a rich structure like an LSM tree or B-tree to support
>>    its apis; obviously different from user-side/other omap keys
>> 5. a simple loopback impl would be efficient and simple
>>
>>
>> On Tue, Sep 8, 2015 at 9:58 PM, Haomai Wang <haomaiwang@xxxxxxxxx> wrote:
>>> Hi Sage,
>>>
>>> I noticed your post on the rocksdb page about making rocksdb aware of
>>> short-lived key/value pairs.
>>>
>>> I think it would be great if a key/value db implementation could
>>> support different key types with different storage behaviors, but it
>>> looks difficult to me to add this feature to an existing db.
>>>
>>> So, combining this with my experience with FileStore, I think making
>>> NewStore/FileStore aware of these short-lived keys (or just PGLog
>>> keys) could be easy and effective. The PGLog is owned by the PG and
>>> maintains the history of ops. It is like journal data, but each entry
>>> is only a few hundred bytes; we need at most a few hundred MB to
>>> store the pglogs of all pgs.
For FileStore, we already have FileJournal have a >>> copy of PGLog, previously I always think about reduce another copy in >>> leveldb to reduce leveldb calls which consumes lots of cpu cycles. But >>> it need a lot of works to be done in FileJournal to aware of pglog >>> things. NewStore doesn't use FileJournal and it should be easier to >>> settle down my idea(?). >>> >>> Actually I think a rados write op in current objectstore impl that >>> omap key/value pairs hurts performance hugely. Lots of cpu cycles are >>> consumed and contributes to short-alive keys(pglog). It should be a >>> obvious optimization point. In the other hands, pglog is dull and >>> doesn't need rich keyvalue api supports. Maybe a lightweight >>> filejournal to settle down pglogs keys is also worth to try. >>> >>> In short, I think it would be cleaner and easier than improving >>> rocksdb to impl a pglog-optimization structure to store this. >>> >>> PS(off topic): a keyvaluedb benchmark http://sphia.org/benchmarks.html >>> >>> >>> >>> -- >>> Best Regards, >>> >>> Wheat >> >> >> >> -- >> Best Regards, >> >> Wheat >> -- >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >> the body of a message to majordomo@xxxxxxxxxxxxxxx >> More majordomo info at http://vger.kernel.org/majordomo-info.html -- Best Regards, Wheat -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html