On 03/28/2018 12:21 PM, Adam C. Emerson wrote:
On 28/03/2018, Varada Kari wrote:
Agreed. I like both approaches. As with the first approach, we could manage the
space as a virtual container and let it grow in case someone
wants a bigger trim window.
I wanted to check: instead of level compaction, what would be the impact of
universal compaction? We would consume more space, but we could keep all
of the entries in L0 files. For SSD backends we might see some
relief in write amplification, but there could be more space
amplification.
How are we planning to expose this? Are we going to add 'PGLog'
management functions to the object store interface?
I would /really really rather not/ try to have behind-the-scenes magic
where BlueStore intercepts certain omap calls and does something
hidden and arcane to them. Since we're going to have other stores in
the future, I'd like to make sure whatever we have is explicit and easy
to adapt and use.
I sort of have semi-competing thoughts:
1) Maybe rocksdb should be able to determine that a given key is
short-lived and shouldn't make it into L0 at all. You would still want to
batch it with a transaction into the WAL, and archive the whole log as-is
until tombstones have been encountered for all remaining log entries.
This is basically the idea I mentioned in the other reply.
This arguably goes beyond Ceph and is more about how RocksDB treats
short-lived data. Our design more or less remains the same, except that
we tell rocksdb that some classes of keys are short-lived (assuming that
functionality could be added to rocksdb).
2) It sure feels like conceptually the pglog should be represented as a
per-pg ring buffer rather than key/value data. Maybe there are really
important reasons that it shouldn't be, but I don't currently see them.
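Idea 1) implies some bookkeeping that RocksDB does not do today. A minimal sketch of it, assuming a hypothetical `ShortLivedWalTracker` class (this is not a RocksDB API): each short-lived put is attributed to the WAL segment it landed in, a tombstone clears it, and sealed segments are released only as a prefix, once every short-lived key they carry has been deleted.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <unordered_map>

// Hypothetical bookkeeping (not a RocksDB API) for keys flagged as
// short-lived, e.g. pg_log entries: they are written to the WAL but never
// flushed to L0, so a WAL segment must be retained until every short-lived
// key it carries has been tombstoned.
class ShortLivedWalTracker {
 public:
  // A short-lived put landed in the currently open WAL segment.
  void put(const std::string& key) {
    erase_live(key);  // an overwrite supersedes the older copy
    live_[current_].insert(key);
    where_[key] = current_;
  }

  // A tombstone (delete) for a short-lived key was written.
  void remove(const std::string& key) { erase_live(key); }

  // Seal the open segment and start a new one, as a WAL roll would.
  void roll_segment() { ++current_; }

  // Number n of leading sealed segments [0, n) that could be dropped without
  // copying anything into an SST. Only a prefix is safe: dropping a later
  // segment while keeping an earlier one would lose tombstones, and WAL
  // replay would then resurrect deleted keys.
  uint64_t reclaimable_prefix() const {
    uint64_t n = 0;
    while (n < current_) {
      auto it = live_.find(n);
      if (it != live_.end() && !it->second.empty()) break;
      ++n;
    }
    return n;
  }

 private:
  void erase_live(const std::string& key) {
    auto it = where_.find(key);
    if (it == where_.end()) return;
    auto seg = live_.find(it->second);
    assert(seg != live_.end());
    seg->second.erase(key);
    where_.erase(it);
  }

  uint64_t current_ = 0;                             // open segment id
  std::map<uint64_t, std::set<std::string>> live_;   // segment -> live keys
  std::unordered_map<std::string, uint64_t> where_;  // key -> its segment
};
```

The prefix rule is what "archive the whole log as-is" amounts to: nothing from these segments ever reaches L0, and the cost is holding WAL space until the oldest outstanding entry is tombstoned.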
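Idea 2) can be sketched as a fixed-capacity ring per PG. This is an illustration only: `PgLogRing` and `LogEntry` are hypothetical names, and a plain `uint64_t` stands in for Ceph's `eversion_t`. The capacity plays the role of the trim window, so appending to a full ring overwrites the oldest entry, while `trim_to()` covers explicit trimming.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-PG log kept as a fixed-size ring buffer instead of one
// omap key per entry.
struct LogEntry {
  uint64_t version;  // stands in for Ceph's eversion_t
  std::string op;    // opaque encoded entry
};

class PgLogRing {
 public:
  explicit PgLogRing(std::size_t capacity) : slots_(capacity) {
    assert(capacity > 0);
  }

  // Append a new entry; on a full ring this overwrites (implicitly trims)
  // the oldest entry.
  void append(LogEntry e) {
    assert(count_ == 0 || e.version > newest().version);  // versions increase
    std::size_t pos = (start_ + count_) % slots_.size();
    slots_[pos] = std::move(e);
    if (count_ < slots_.size()) {
      ++count_;
    } else {
      start_ = (start_ + 1) % slots_.size();  // oldest slot was reused
    }
  }

  // Explicit trim: drop every entry with version <= v.
  void trim_to(uint64_t v) {
    while (count_ > 0 && slots_[start_].version <= v) {
      start_ = (start_ + 1) % slots_.size();
      --count_;
    }
  }

  std::size_t size() const { return count_; }
  const LogEntry& oldest() const {
    assert(count_ > 0);
    return slots_[start_];
  }
  const LogEntry& newest() const {
    assert(count_ > 0);
    return slots_[(start_ + count_ - 1) % slots_.size()];
  }

 private:
  std::vector<LogEntry> slots_;
  std::size_t start_ = 0;  // index of the oldest live entry
  std::size_t count_ = 0;  // number of live entries
};
```

A store could back such a ring with a preallocated region, so an append becomes an overwrite in place rather than a new key plus a later tombstone; whether that pays off on a given backend is still an open question.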
As far as the objectstore is concerned, it seems to me like there are
valid reasons to provide some kind of log interface and perhaps that
should be used for pg_log. That sort of opens the door for different
object store implementations fulfilling that functionality in whatever
ways the author deems fit.
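A rough shape for such an explicit log interface, assuming hypothetical names (`LogBackend`, `OrderedMapLog`, `record_and_trim`) that are not part of Ceph's actual `ObjectStore` API. Callers append, trim and read per PG through the interface. The one implementation shown keeps an ordered map per PG, roughly mirroring today's key/value layout; another store could implement the same calls with a ring buffer or a dedicated log.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical explicit log interface for pg_log, so a store does not have
// to special-case certain omap keys behind the caller's back.
using PgId = std::string;

struct LogRecord {
  uint64_t version;
  std::string payload;
};

class LogBackend {
 public:
  virtual ~LogBackend() = default;
  virtual void append(const PgId& pg, const LogRecord& rec) = 0;
  // Discard all records with version <= upto.
  virtual void trim(const PgId& pg, uint64_t upto) = 0;
  // Return the surviving records in version order.
  virtual std::vector<LogRecord> read(const PgId& pg) const = 0;
};

// One possible implementation: an ordered map per PG, similar in spirit to
// a key/value (omap-style) layout.
class OrderedMapLog : public LogBackend {
 public:
  void append(const PgId& pg, const LogRecord& rec) override {
    auto& entries = logs_[pg];
    assert(entries.empty() || rec.version > entries.rbegin()->first);
    entries[rec.version] = rec.payload;
  }
  void trim(const PgId& pg, uint64_t upto) override {
    auto it = logs_.find(pg);
    if (it == logs_.end()) return;
    auto& entries = it->second;
    entries.erase(entries.begin(), entries.upper_bound(upto));
  }
  std::vector<LogRecord> read(const PgId& pg) const override {
    std::vector<LogRecord> out;
    auto it = logs_.find(pg);
    if (it == logs_.end()) return out;
    for (const auto& kv : it->second) out.push_back({kv.first, kv.second});
    return out;
  }

 private:
  std::map<PgId, std::map<uint64_t, std::string>> logs_;
};

// Caller code depends only on the interface: append one entry, then keep
// at most the newest `keep` entries.
void record_and_trim(LogBackend& backend, const PgId& pg, uint64_t v,
                     uint64_t keep) {
  backend.append(pg, {v, "entry-" + std::to_string(v)});
  if (v > keep) backend.trim(pg, v - keep);
}
```

Because trimming is an explicit call rather than a pattern of omap deletes, each store is free to decide how (and whether) trimmed space is reclaimed.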