Agreed, this needs tidying up in the docs. New users have little chance of
getting it right from the docs alone. It's been discussed at length here
several times in various threads, but we don't always seem to reach the
same conclusion, so reading here doesn't guarantee understanding this
correctly either, as I'm no doubt about to demonstrate :)
Mattia Belluco said back in May:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-May/035086.html
"when RocksDB needs to compact a layer it rewrites it
*before* deleting the old data; if you'd like to be sure your db does not
spill over to the spindle you should allocate twice the size of the
biggest layer to allow for compaction."
I didn't spot anyone disagreeing, so I used 64 GiB DB/WAL partitions on
the SSDs in my most recent clusters to allow for this, to be certain I
definitely had room for the WAL on top, and to avoid getting caught out
by people saying GB (1000^3 bytes) when they mean GiB (1024^3 bytes). I
left the rest of the SSD empty to make the most of wear leveling,
garbage collection etc.
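
For what it's worth, the back-of-the-envelope arithmetic behind that
64 GiB figure looks roughly like the Python sketch below. The level
sizes, the 2 GB WAL allowance and the "add one extra copy of the biggest
level for compaction" reading of Mattia's rule are my assumptions, not
numbers reported by Ceph:

    # Rough sizing check (illustrative numbers, not anything Ceph reports):
    # sum the RocksDB levels you want on flash, add one extra copy of the
    # biggest level for compaction, add WAL headroom, and compare against
    # the partition size expressed in GiB rather than GB.

    GB = 1000 ** 3
    GIB = 1024 ** 3

    levels_gb = [0.3, 3, 30]   # assumed level sizes to keep on the SSD, in GB
    wal_gb = 2                 # assumed WAL allowance, in GB

    need = (sum(levels_gb) + max(levels_gb) + wal_gb) * GB   # bytes needed
    have = 64 * GIB                                          # 64 GiB partition

    print(f"needed: {need / GIB:.1f} GiB, have: {have / GIB:.1f} GiB")
    print("fits on the SSD" if have >= need else "would spill to the spindle")

With those assumptions it comes out at roughly 61 GiB needed, which is why
64 GiB felt like a comfortable margin to me.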
Simon
On 26/11/2019 12:20, Janne Johansson wrote:
It's mentioned here among other places
https://books.google.se/books?id=vuiLDwAAQBAJ&pg=PA79&lpg=PA79&dq=rocksdb+sizes+3+30+300+g&source=bl&ots=TlH4GR0E8P&sig=ACfU3U0QOJQZ05POZL9DQFBVwTapML81Ew&hl=en&sa=X&ved=2ahUKEwiPscq57YfmAhVkwosKHY1bB1YQ6AEwAnoECAoQAQ#v=onepage&q=rocksdb%20sizes%203%2030%20300%20g&f=false
The 4% was a quick ballpark figure someone came up with to give early
adopters a decent start, but later analysis has shown that the RocksDB
level sizes (L0, L1, L2, ...) make 3, 30 and 300 GB the "optimal" sizes,
in the sense of not wasting SSD space that will never be used.
You can set 240 GB, but it will not be better than 30 GB. It will be
better than 24 GB, so "not super bad, but not optimal".
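
To make that step behaviour concrete, here is a minimal Python sketch of
the argument as I read it. The "whole level must fit" rule is a
simplification of what's described above, not BlueStore's actual
spillover logic:

    # Simplified view of the 3/30/300 argument: a level only helps if the
    # whole thing fits on the fast device, so the space the DB can really
    # use jumps in steps rather than growing with the partition size.

    LEVEL_STEPS_GB = [3, 30, 300]   # cumulative "useful" sizes quoted above

    def usable_db_gb(partition_gb: int) -> int:
        """Largest 3/30/300 step that fits entirely within the partition."""
        usable = 0
        for step in LEVEL_STEPS_GB:
            if partition_gb >= step:
                usable = step
        return usable

    for size in (24, 30, 64, 240, 300):
        print(f"{size:3d} GB partition -> ~{usable_db_gb(size)} GB actually used")
    # 24 -> 3, 30 -> 30, 64 -> 30, 240 -> 30, 300 -> 300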