Hi Wido & Hemant.
On 8/14/2019 11:36 AM, Wido den Hollander wrote:
On 8/14/19 9:33 AM, Hemant Sonawane wrote:
Hello guys,
Thank you so much for your responses, really appreciate it. But I would
like to mention one more thing which I forgot in my last email: I am
going to use this storage for OpenStack VMs. So will the answer still
be the same, that I should use 1GB for the WAL?
WAL 1GB is fine, yes.
I'd like to argue against this for a bit.
Actually a standalone WAL is only needed when you have either a very
small fast device (and don't want the DB to use it) or three devices of
different performance behind the OSD (e.g. HDD, SSD, NVMe), in which
case the WAL should be placed on the fastest one.
For your use case you just have HDD and NVMe, so DB and WAL can safely
collocate. That means you don't need to allocate a specific volume for
the WAL, and hence there is no need to answer the question of how much
space the WAL needs. Simply allocate the DB and the WAL will live there
automatically.
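To illustrate (the device and volume names below are just examples,
adjust them to your layout): with ceph-volume you only pass --block.db
and omit --block.wal, something like

  # HDD as the data device, an LV on the NVMe as the DB device;
  # with no separate --block.wal the WAL ends up inside the DB volume
  ceph-volume lvm create --bluestore --data /dev/sdb --block.db ceph-db/db-sdb

BlueStore then simply keeps the WAL on the DB device, i.e. on the NVMe,
without any extra configuration.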
As this is an OpenStack/RBD-only use case, I would say that 10GB of DB
per 1TB of disk storage is sufficient.
Given the RocksDB level granularity already mentioned in this thread,
we tend to prefer fixed allocation sizes, with 30-60GB being close to
optimal.
In any case I suggest using LVM for the DB/WAL volume, and perhaps
starting with a smaller size (e.g. 32GB per OSD), which leaves some
spare space on your NVMes and allows you to add more space later if
needed. (Just to note: shrinking already allocated but still unused
space on an existing OSD and gifting it to another/new OSD is a much
more troublesome task than adding space from the spare volume.)
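A rough sketch of that layout (again, device and VG/LV names are only
examples):

  # one VG on the NVMe, one 32GB DB LV per OSD, the rest stays free as spare
  vgcreate ceph-db /dev/nvme0n1
  lvcreate -L 32G -n db-sdb ceph-db
  lvcreate -L 32G -n db-sdc ceph-db

  # later, if a DB turns out to be too small, grow it from the spare space
  lvextend -L +16G ceph-db/db-sdb

After extending the LV the OSD has to be told about the new size as
well; if I remember correctly, recent releases can do that with
ceph-bluestore-tool bluefs-bdev-expand.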
On Wed, 14 Aug 2019 at 05:54, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
On 8/13/19 3:51 PM, Paul Emmerich wrote:
> On Tue, Aug 13, 2019 at 10:04 PM Wido den Hollander <wido@xxxxxxxx> wrote:
>> I just checked an RGW-only setup. 6TB drive, 58% full, 11.2GB of DB in
>> use. No slow DB in use.
> random rgw-only setup here: 12TB drive, 77% full, 48GB metadata and
> 10GB omap for index and whatever.
>
> That's 0.5% + 0.1%. And that's a setup that's using mostly erasure
> coding and small-ish objects.
>
>
>> I've talked with many people from the community and I don't see
>> agreement on the 4% rule.
> agreed, 4% isn't a reasonable default.
> I've seen setups with even 10% metadata usage, but these are weird
> edge cases with very small objects on NVMe-only setups (obviously
> without a separate DB device).
>
> Paul
I agree, and I did quite a bit of the early space usage analysis. I
have a feeling that someone well-meaning was trying to give users a
simple ratio to target that was big enough to handle the majority of
use cases. The problem is that reality isn't that simple and
one-size-fits-all doesn't really work here.
For RBD you can usually get away with far less than 4%; a small
fraction of that is often sufficient. For tiny (say 4K) RGW objects
(especially objects with very long names!) you can potentially end up
using significantly more than 4%. Unfortunately there's no really good
way for us to normalize this so long as RGW is using OMAP to store
bucket indexes. I think the best we can do in the long run is make it
much clearer how space is being used on the block/db/wal devices and
easier for users to shrink/grow the amount of "fast" disk they have on
an OSD.
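(For what it's worth, one way to see how the DB space is being used on
a given OSD today is the bluefs perf counters, roughly:

  # run on the OSD host; counter names may differ slightly between releases
  ceph daemon osd.0 perf dump bluefs

and then look at db_total_bytes / db_used_bytes and slow_used_bytes; a
non-zero slow_used_bytes suggests metadata has spilled over to the slow
HDD device.)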
Alternatively we could put bucket indexes into rados objects instead of
OMAP, but that would be a pretty big project (with its own challenges
but potentially also with rewards).
Mark
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com