On 03/28/2018 12:59 PM, Adam C. Emerson wrote:
On 28/03/2018, Mark Nelson wrote:
I sort of have semi-competing thoughts:
1) Maybe it makes sense that rocksdb should be able to determine that a
given key is short-lived and shouldn't make it into L0 at all, but you still
want to batch it in with a transaction to the WAL and archive the whole log
as-is until tombstones for all remaining log entries are encountered.
Basically the idea that I mentioned in the other reply. This arguably goes
beyond Ceph and is more about how RocksDB treats short-lived data. Our
design more or less remains the same except that we tell rocksdb that some
classes of keys are short-lived (assuming that functionality could be added
to rocksdb).
2) It sure feels like conceptually the pglog should be represented as a
per-pg ring buffer rather than key/value data. Maybe there are really
important reasons that it shouldn't be, but I don't currently see them. As
far as the objectstore is concerned, it seems to me like there are valid
reasons to provide some kind of log interface and perhaps that should be
used for pg_log. That sort of opens the door for different object store
implementations fulfilling that functionality in whatever ways the author
deems fit.
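
To make the ring buffer idea in 2) concrete, here is a minimal in-memory sketch, assuming a fixed capacity per PG. The names LogEntry, PGLogRing, append and trim_to are hypothetical and are not taken from the Ceph tree; the real pg log entry carries far more state than a version and a payload.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified log entry (illustration only).
struct LogEntry {
  uint64_t version;
  std::string payload;
};

// Fixed-capacity ring buffer for one PG. Appending to a full ring
// overwrites the oldest entry, so old entries never need a tombstone.
class PGLogRing {
 public:
  explicit PGLogRing(std::size_t capacity) : slots_(capacity) {
    assert(capacity > 0);
  }

  // Append an entry; when the ring is full the oldest slot is reused.
  void append(LogEntry e) {
    slots_[(head_ + size_) % slots_.size()] = std::move(e);
    if (size_ < slots_.size()) {
      ++size_;
    } else {
      head_ = (head_ + 1) % slots_.size();
    }
  }

  // Explicit trim: drop every entry whose version is <= v.
  // Assumes entries were appended in increasing version order.
  void trim_to(uint64_t v) {
    while (size_ > 0 && slots_[head_].version <= v) {
      head_ = (head_ + 1) % slots_.size();
      --size_;
    }
  }

  // Oldest live entry, or nullptr if the ring is empty.
  const LogEntry* oldest() const {
    return size_ == 0 ? nullptr : &slots_[head_];
  }

  std::size_t size() const { return size_; }

 private:
  std::vector<LogEntry> slots_;
  std::size_t head_ = 0;  // index of the oldest live entry
  std::size_t size_ = 0;  // number of live entries
};
```

The point of the shape is that trimming is just moving the head index, rather than issuing a delete per key as with key/value data. An on-disk version would also need persistence and crash consistency, which this sketch ignores.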
Of these two competing thoughts, I firmly believe that Thought 2
should kill and eat Thought 1. Given that SeaStore, or whatever ends up
being NVMe-optimized, won't even use RocksDB, we definitely don't
want to depend on RocksDB behavior in the long term.
Also I'm with you that it makes sense, intuitively, if we have some
concept of 'log' that the ObjectStore is responsible for keeping track
of and make it explicit.
I'm not sure they have to be mutually exclusive. It could be that we
implement the objectstore log interface and bluestore more or less just
sends stuff to rocksdb anyway to piggyback the log write on the
transaction and avoid the extra seek. I guess the point though would be
that everything becomes more generic and the objectstore handles the
logginess.
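
A rough sketch of that layering, assuming a generic transaction that carries log appends alongside data writes. Every name here (Transaction, ObjectStoreSketch, KVBackedStore, log_key) is hypothetical, and the std::map stands in for the key/value store; this is not the BlueStore or RocksDB API.

```cpp
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical backend-agnostic transaction: data writes and log
// appends travel together, and the backend decides how to store them.
struct Transaction {
  struct LogAppend {
    std::string log_id;  // e.g. a PG identifier
    uint64_t seq;
    std::string data;
  };
  std::vector<std::pair<std::string, std::string>> writes;
  std::vector<LogAppend> log_appends;
};

// Hypothetical generic interface; callers never see the storage format.
class ObjectStoreSketch {
 public:
  virtual ~ObjectStoreSketch() = default;
  virtual void submit(const Transaction& t) = 0;
};

// KV-backed variant: log appends are folded into the same batch as
// the data writes, so the log rides along with a single commit.
class KVBackedStore : public ObjectStoreSketch {
 public:
  void submit(const Transaction& t) override {
    std::map<std::string, std::string> batch;
    for (const auto& w : t.writes) batch[w.first] = w.second;
    for (const auto& a : t.log_appends)
      batch[log_key(a.log_id, a.seq)] = a.data;
    // One commit for the whole batch (a real store would write its WAL here).
    for (const auto& kv : batch) kv_[kv.first] = kv.second;
    ++commits_;
  }

  // Zero-padded hex keeps keys for one log sorted by sequence number.
  static std::string log_key(const std::string& log_id, uint64_t seq) {
    char buf[17];
    std::snprintf(buf, sizeof(buf), "%016llx",
                  static_cast<unsigned long long>(seq));
    return "log/" + log_id + "/" + buf;
  }

  // Returns the stored value, or an empty string if the key is absent.
  std::string get(const std::string& key) const {
    auto it = kv_.find(key);
    return it == kv_.end() ? std::string() : it->second;
  }

  int commits() const { return commits_; }

 private:
  std::map<std::string, std::string> kv_;  // stand-in for the KV store
  int commits_ = 0;
};
```

A different backend could implement the same submit() with a dedicated ring buffer per log instead of keys, without the caller changing.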
I guess the thing that scares me the most is that my eyes glaze over
when I look at the pg log code and I'm super afraid to touch it. Maybe
someone else will be less scared of it? :D
Mark