On Thu, Mar 29, 2018 at 1:44 AM, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
> On 03/28/2018 12:21 PM, Adam C. Emerson wrote:
>
>> On 28/03/2018, Varada Kari wrote:
>>>
>>> Agree. I like the approaches. Like first approach, we could manage the
>>> space as a virtual container and keep them growing in case someone
>>> wants to have a bigger trim window.
>>>
>>> Wanted to check, instead of level compaction, what would be impact of
>>> universal compaction? we would consume more space, but we can keep all
>>> of the entries in L0 files. For SSD backends we might observe some
>>> respite on the write amplification, but there could be more space
>>> amplification.
>>
>> How are we planning to expose this? Are we going to add 'PGLog'
>> management functions to the object store interface?
>>
>> I would /really really rather not/ try to have behind the scenes magic
>> where BlueStore intercepts certain omap calls and does something
>> hidden and arcane to them. Since we're going to have other stores in
>> the future I'd like to make sure whatever we have is explicit and easy
>> to adapt and use.
>
>
> I sort of have semi-competing thoughts:
>
> 1) Maybe it makes sense that rocksdb should be able to determine that a
> given key is short lived and shouldn't make it into L0 at all but you still
> want to batch it in with a transaction to the WAL and archive the whole log
> as-is until tombstones for all remaining log entries are encountered.
> Basically the idea that I mentioned in the other reply. This arguably goes
> beyond Ceph and is more about how RocksDB treats short lived data. Our
> design more or less remains the same except that we tell rocksdb that some
> classes of keys are short lived (assuming that functionality could be added
> to rocksdb).
>
> 2) It sure feels like conceptually the pglog should be represented as a
> per-pg ring buffer rather than key/value data. Maybe there are really
> important reasons that it shouldn't be, but I don't currently see them. As
> far as the objectstore is concerned, it seems to me like there are valid
> reasons to provide some kind of log interface and perhaps that should be
> used for pg_log. That sort of opens the door for different object store
> implementations fulfilling that functionality in whatever ways the author
> deems fit.

I'd like to get this clearer. :)
If we put the pglog in a per-PG ring buffer, does that mean supporting a
ring buffer in RocksDB? If not, where would we store the other metadata,
such as onodes?

--
Best wishes
Lisa
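
To make the "some kind of log interface" idea in point 2 a bit more
concrete, here is a minimal in-memory sketch of a per-PG ring buffer
behind an explicit log API. It is written in Python for brevity rather
than Ceph's C++, and every name in it (LogEntry, PGLogRing, LogStore,
log_append, log_trim, log_read) is hypothetical, not an existing
ObjectStore or BlueStore call. It only illustrates the bounded
append/trim semantics; it says nothing about on-disk layout or about
where onodes and other metadata would live.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List


@dataclass(frozen=True)
class LogEntry:
    version: int    # per-PG, strictly increasing (assumed for this sketch)
    payload: bytes  # opaque encoded log entry


class PGLogRing:
    """Bounded append-only log for one PG; the oldest entry is dropped when full."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        # deque(maxlen=...) discards from the left once capacity is reached.
        self._entries: Deque[LogEntry] = deque(maxlen=capacity)

    def append(self, version: int, payload: bytes) -> None:
        if self._entries and version <= self._entries[-1].version:
            raise ValueError("versions must be strictly increasing")
        self._entries.append(LogEntry(version, payload))

    def trim_to(self, version: int) -> int:
        """Drop entries with version <= `version`; return the number dropped."""
        dropped = 0
        while self._entries and self._entries[0].version <= version:
            self._entries.popleft()
            dropped += 1
        return dropped

    def entries(self) -> List[LogEntry]:
        return list(self._entries)


class LogStore:
    """Hypothetical store-side log interface: one ring per PG id."""

    def __init__(self, capacity_per_pg: int) -> None:
        self._capacity = capacity_per_pg
        self._rings: Dict[str, PGLogRing] = {}

    def log_append(self, pgid: str, version: int, payload: bytes) -> None:
        # Create the ring lazily on the first append for this PG.
        ring = self._rings.setdefault(pgid, PGLogRing(self._capacity))
        ring.append(version, payload)

    def log_trim(self, pgid: str, version: int) -> int:
        ring = self._rings.get(pgid)
        return ring.trim_to(version) if ring else 0

    def log_read(self, pgid: str) -> List[LogEntry]:
        ring = self._rings.get(pgid)
        return ring.entries() if ring else []
```

With an interface along these lines, the log calls are explicit, so a
store could back them with a dedicated on-disk ring, with RocksDB keys,
or with something else, without intercepting omap calls behind the
scenes.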