Re: storing pg logs outside of rocksdb

On Thu, Mar 29, 2018 at 1:44 AM, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
> On 03/28/2018 12:21 PM, Adam C. Emerson wrote:
>
>> On 28/03/2018, Varada Kari wrote:
>>>
>>> Agree. I like the approaches. As in the first approach, we could manage
>>> the space as a virtual container and let it grow in case someone
>>> wants to have a bigger trim window.
>>>
>>> Wanted to check: instead of level compaction, what would be the impact
>>> of universal compaction? We would consume more space, but we can keep all
>>> of the entries in L0 files. For SSD backends we might observe some
>>> respite on the write amplification, but there could be more space
>>> amplification.
>>
>> How are we planning to expose this? Are we going to add 'PGLog'
>> management functions to the object store interface?
>>
>> I would /really really rather not/ try to have behind the scenes magic
>> where BlueStore intercepts certain omap calls and does something
>> hidden and arcane to them. Since we're going to have other stores in
>> the future I'd like to make sure whatever we have is explicit and easy
>> to adapt and use.
>
>
> I sort of have semi-competing thoughts:
>
> 1) Maybe it makes sense that rocksdb should be able to determine that a
> given key is short lived and shouldn't make it into L0 at all but you still
> want to batch it in with a transaction to the WAL and archive the whole log
> as-is until tombstones for all remaining log entries are encountered.
> Basically the idea that I mentioned in the other reply.  This arguably goes
> beyond Ceph and is more about how RocksDB treats short lived data.  Our
> design more or less remains the same except that we tell rocksdb that some
> classes of keys are short lived (assuming that functionality could be added
> to rocksdb).
>
> 2) It sure feels like conceptually the pglog should be represented as a
> per-pg ring buffer rather than key/value data.  Maybe there are really
> important reasons that it shouldn't be, but I don't currently see them.  As
> far as the objectstore is concerned, it seems to me like there are valid
> reasons to provide some kind of log interface and perhaps that should be
> used for pg_log.  That sort of opens the door for different object store
> implementations fulfilling that functionality in whatever ways the author
> deems fit.

I would like to understand this more clearly. :) If we put the pglog in a
per-pg ring buffer, does that mean supporting a ring buffer inside RocksDB?
If not, where would the other metadata, such as onodes, be stored?
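
To make the question concrete, here is a rough sketch of what I understand
by a per-pg ring buffer kept outside the key/value store. It is only an
illustration under my own assumptions: the class name PGLogRing and its
methods are hypothetical and are not part of Ceph, BlueStore, or RocksDB,
and it keeps everything in memory rather than on disk.

```python
class PGLogRing:
    """Hypothetical fixed-capacity per-PG log (illustrative, not a Ceph API)."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._slots = [None] * capacity  # preallocated slots, reused in place
        self._head = 0                   # index of the oldest live entry
        self._count = 0                  # number of live entries

    def newest(self):
        """Return the most recent (version, payload) pair, or None if empty."""
        if self._count == 0:
            return None
        return self._slots[(self._head + self._count - 1) % len(self._slots)]

    def append(self, version, payload):
        """Add an entry; return the entry it overwrote, or None."""
        last = self.newest()
        if last is not None and version <= last[0]:
            raise ValueError("versions must be strictly increasing")
        cap = len(self._slots)
        tail = (self._head + self._count) % cap
        overwritten = None
        if self._count == cap:
            # Buffer is full: the oldest entry is replaced, no tombstone needed.
            overwritten = self._slots[self._head]
            self._head = (self._head + 1) % cap
        else:
            self._count += 1
        self._slots[tail] = (version, payload)
        return overwritten

    def trim(self, up_to):
        """Drop all entries with version <= up_to; return how many were dropped."""
        dropped = 0
        while self._count and self._slots[self._head][0] <= up_to:
            self._slots[self._head] = None
            self._head = (self._head + 1) % len(self._slots)
            self._count -= 1
            dropped += 1
        return dropped

    def entries(self):
        """Return live entries from oldest to newest."""
        cap = len(self._slots)
        return [self._slots[(self._head + i) % cap] for i in range(self._count)]


log = PGLogRing(capacity=3)
for v in (1, 2, 3):
    log.append(v, f"op{v}")
log.append(4, "op4")   # overwrites version 1 in place
log.trim(3)            # leaves only version 4
```

In this picture, trimming just moves the head pointer, so no tombstones or
compaction are involved for log entries. Onodes and other metadata would
presumably stay in RocksDB as they are today, which is the part I would
like confirmed.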


-- 
Best wishes
Lisa