Re: storing pg logs outside of rocksdb

On 03/28/2018 12:21 PM, Adam C. Emerson wrote:

> On 28/03/2018, Varada Kari wrote:
>> Agreed. I like the approaches. With the first approach, we could manage
>> the space as a virtual container and let it grow in case someone wants
>> a bigger trim window.
>>
>> I wanted to check: what would be the impact of universal compaction
>> instead of level compaction? We would consume more space, but we could
>> keep all of the entries in L0 files. For SSD backends we might see some
>> relief in write amplification, but space amplification could be higher.
> How are we planning to expose this? Are we going to add 'PGLog'
> management functions to the object store interface?
>
> I would /really really rather not/ try to have behind-the-scenes magic
> where BlueStore intercepts certain omap calls and does something
> hidden and arcane to them. Since we're going to have other stores in
> the future, I'd like to make sure whatever we have is explicit and easy
> to adapt and use.
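
A minimal sketch of what an explicit interface could look like, as opposed to BlueStore silently intercepting omap calls. Everything here is hypothetical: `PGLogBackend`, `LogEntry`, `InMemoryPGLog` and their methods are illustrative names, not Ceph's actual ObjectStore API. Each store would implement the three calls however suits it. The in-memory version exists only to pin down the semantics.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical log entry: a version plus an opaque, already-encoded payload.
struct LogEntry {
  std::uint64_t version;
  std::string payload;
};

// Hypothetical interface: each object store decides how to persist per-PG logs.
class PGLogBackend {
 public:
  virtual ~PGLogBackend() = default;
  // Append an entry to the log of the given PG.
  virtual void append(const std::string& pgid, LogEntry e) = 0;
  // Drop every entry with version <= v (the trim point).
  virtual void trim_to(const std::string& pgid, std::uint64_t v) = 0;
  // Return all entries with version > v, oldest first.
  virtual std::vector<LogEntry> read_from(const std::string& pgid,
                                          std::uint64_t v) const = 0;
};

// Trivial in-memory implementation, only to illustrate the semantics.
class InMemoryPGLog : public PGLogBackend {
 public:
  void append(const std::string& pgid, LogEntry e) override {
    auto& entries = logs_[pgid];
    // Entries are expected to arrive in strictly increasing version order.
    assert(entries.empty() || entries.rbegin()->first < e.version);
    const std::uint64_t v = e.version;
    entries.emplace(v, std::move(e));
  }

  void trim_to(const std::string& pgid, std::uint64_t v) override {
    auto it = logs_.find(pgid);
    if (it == logs_.end()) return;
    // upper_bound(v) is the first entry with version > v.
    it->second.erase(it->second.begin(), it->second.upper_bound(v));
  }

  std::vector<LogEntry> read_from(const std::string& pgid,
                                  std::uint64_t v) const override {
    std::vector<LogEntry> out;
    auto it = logs_.find(pgid);
    if (it == logs_.end()) return out;
    for (auto e = it->second.upper_bound(v); e != it->second.end(); ++e)
      out.push_back(e->second);
    return out;
  }

 private:
  // pgid -> (version -> entry)
  std::map<std::string, std::map<std::uint64_t, LogEntry>> logs_;
};
```

The point of the sketch is that trimming is a first-class call (`trim_to`) rather than a batch of omap key deletions. A store that keeps logs outside RocksDB then never has to infer intent from key names.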

I sort of have semi-competing thoughts:

1) Maybe RocksDB should be able to determine that a given key is short-lived and shouldn't make it into L0 at all, while you still batch it with a transaction into the WAL and archive the whole log as-is until tombstones for all remaining log entries are encountered. This is basically the idea I mentioned in the other reply. It arguably goes beyond Ceph and is more about how RocksDB treats short-lived data. Our design more or less remains the same, except that we tell RocksDB that some classes of keys are short-lived (assuming that functionality could be added to RocksDB).
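
A toy model of idea 1), assuming (as the paragraph does) that RocksDB could be taught to treat a class of keys as short-lived. This is not an existing RocksDB API. `ShortLivedWal`, `put_short_lived` and `roll` are hypothetical names. Short-lived keys live only in WAL segments. A sealed segment is retained as-is until every short-lived key in it has been tombstoned, and can then be released (dropped or archived) without those keys ever being flushed to L0.

```cpp
#include <cstddef>
#include <deque>
#include <set>
#include <string>

// Hypothetical WAL segment holding short-lived keys that are never flushed
// into L0 SST files.
struct WalSegment {
  std::set<std::string> live_short_lived;  // keys not yet tombstoned
  bool sealed = false;                     // no further writes go here
};

class ShortLivedWal {
 public:
  // Write a short-lived key into the current (open) segment.
  void put_short_lived(const std::string& key) {
    if (segments_.empty() || segments_.back().sealed) segments_.emplace_back();
    segments_.back().live_short_lived.insert(key);
  }

  // Tombstone for a short-lived key: remove it from any segment holding it.
  void remove(const std::string& key) {
    for (auto& seg : segments_) seg.live_short_lived.erase(key);
    release();
  }

  // Seal the current segment (e.g. the WAL file rolled over).
  void roll() {
    if (!segments_.empty()) segments_.back().sealed = true;
    release();
  }

  std::size_t retained_segments() const { return segments_.size(); }

 private:
  // Release sealed segments from the front once they hold no live keys.
  void release() {
    while (!segments_.empty() && segments_.front().sealed &&
           segments_.front().live_short_lived.empty())
      segments_.pop_front();
  }

  std::deque<WalSegment> segments_;
};
```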

2) It sure feels like, conceptually, the pglog should be represented as a per-PG ring buffer rather than as key/value data. Maybe there are really important reasons it shouldn't be, but I don't currently see them. As far as the objectstore is concerned, it seems to me that there are valid reasons to provide some kind of log interface, and perhaps that should be used for pg_log. That sort of opens the door for different object store implementations to fulfill that functionality in whatever way their authors deem fit.
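
A minimal sketch of idea 2), assuming a fixed per-PG capacity chosen up front. That capacity is a hypothetical parameter; a real store might size it from the trim window. `PGLogRing` and `Entry` are illustrative names, not Ceph types. Appends write into the next slot. Once the buffer is full, the oldest entry is overwritten, so trimming never generates deletes or tombstones.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical pg log entry.
struct Entry {
  std::uint64_t version;
  std::string payload;
};

// Per-PG log kept as a fixed-size ring buffer instead of one key/value pair
// per entry.
class PGLogRing {
 public:
  explicit PGLogRing(std::size_t capacity) : slots_(capacity) {
    assert(capacity > 0);
  }

  void append(Entry e) {
    std::size_t tail = (head_ + size_) % slots_.size();
    slots_[tail] = std::move(e);
    if (size_ < slots_.size()) {
      ++size_;
    } else {
      head_ = (head_ + 1) % slots_.size();  // overwrote the oldest entry
    }
  }

  // Explicitly trim entries with version <= v from the head.
  void trim_to(std::uint64_t v) {
    while (size_ > 0 && slots_[head_].version <= v) {
      head_ = (head_ + 1) % slots_.size();
      --size_;
    }
  }

  std::size_t size() const { return size_; }

  // Oldest retained entry, if any.
  std::optional<Entry> front() const {
    if (size_ == 0) return std::nullopt;
    return slots_[head_];
  }

 private:
  std::vector<Entry> slots_;
  std::size_t head_ = 0;  // index of the oldest entry
  std::size_t size_ = 0;  // number of valid entries
};
```

The trade-off is that the capacity bounds how much log a PG can retain. Someone who wants a bigger trim window would need a larger (or resizable) buffer.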