Hi Wido & Hemant.
On 8/14/2019 11:36 AM, Wido den Hollander wrote:
On 8/14/19 9:33 AM, Hemant Sonawane wrote:
Hello guys,
Thank you so much for your responses, really appreciate it. But I would
like to mention one more thing which I forgot in my last email: I am
going to use this storage for OpenStack VMs. So will the answer still
be the same, that I should use 1GB for the WAL?
WAL 1GB is fine, yes.
I'd like to argue against this for a bit.
Actually a standalone WAL is only needed when you have either a very
small fast device (and don't want the DB to use it) or three devices of
different performance behind the OSD (e.g. HDD, SSD, NVMe), in which
case the WAL should be placed on the fastest one.
For your use case you just have HDD and NVMe, so DB and WAL can safely
collocate. That means you don't need to allocate a specific volume for
the WAL, and hence there is no need to answer the question of how much
space the WAL needs. Simply allocate the DB and the WAL will live there
automatically.
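To illustrate (the device and volume names below are just examples,
adjust them to your layout): with ceph-volume you only pass --block.db
and omit --block.wal, something like

  # HDD as the data device, an LV on the NVMe as the DB device;
  # with no separate --block.wal the WAL ends up inside the DB volume
  ceph-volume lvm create --bluestore --data /dev/sdb --block.db ceph-db/db-sdb

BlueStore then simply keeps the WAL on the DB device, i.e. on the NVMe,
without any extra configuration.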
As this is an OpenStack/RBD-only use case, I would say that 10GB of DB
per 1TB of disk storage is sufficient.
Given the RocksDB level granularity already mentioned in this thread,
we tend to prefer fixed allocation sizes, with 30-60GB being close to
optimal.
In any case I suggest using LVM for the DB/WAL volume, and perhaps
starting with a smaller size (e.g. 32GB per OSD), which leaves some
spare space on your NVMes and allows you to add more space later if
needed. (Just to note: shrinking already allocated but still unused
space on an existing OSD and gifting it to another/new OSD is a much
more troublesome task than adding space from the spare volume.)
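A rough sketch of that layout (again, device and VG/LV names are only
examples):

  # one VG on the NVMe, one 32GB DB LV per OSD, the rest stays free as spare
  vgcreate ceph-db /dev/nvme0n1
  lvcreate -L 32G -n db-sdb ceph-db
  lvcreate -L 32G -n db-sdc ceph-db

  # later, if a DB turns out to be too small, grow it from the spare space
  lvextend -L +16G ceph-db/db-sdb

After extending the LV the OSD has to be told about the new size as
well; if I remember correctly, recent releases can do that with
ceph-bluestore-tool bluefs-bdev-expand.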
On Wed, 14 Aug 2019 at 05:54, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
On 8/13/19 3:51 PM, Paul Emmerich wrote:
> On Tue, Aug 13, 2019 at 10:04 PM Wido den Hollander <wido@xxxxxxxx> wrote:
>> I just checked an RGW-only setup. 6TB drive, 58% full, 11.2GB of DB in
>> use. No slow DB in use.
> random rgw-only setup here: 12TB drive, 77% full, 48GB metadata and
> 10GB omap for index and whatever.
>
> That's 0.5% + 0.1%. And that's a setup that's using mostly erasure
> coding and small-ish objects.
>
>
>> I've talked with many people from the community and I don't see
>> agreement on the 4% rule.
> agreed, 4% isn't a reasonable default.
> I've seen setups with even 10% metadata usage, but these are weird
> edge cases with very small objects on NVMe-only setups (obviously
> without a separate DB device).
>
> Paul
I agree, and I did quite a bit of the early space usage analysis. I
have a feeling that someone well-meaning was trying to give users a
simple ratio to target that was big enough to handle the majority of
use cases. The problem is that reality isn't that simple and
one-size-fits-all doesn't really work here.
For RBD you can usually get away with far less than 4%; a small
fraction of that is often sufficient. For tiny (say 4K) RGW objects
(especially objects with very long names!) you can potentially end up
using significantly more than 4%. Unfortunately there's no really good
way for us to normalize this so long as RGW is using OMAP to store
bucket indexes. I think the best we can do in the long run is make it
much clearer how space is being used on the block/db/wal devices and
easier for users to shrink/grow the amount of "fast" disk they have on
an OSD.
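(For what it's worth, one way to see how the DB space is being used on
a given OSD today is the bluefs perf counters, roughly:

  # run on the OSD host; counter names may differ slightly between releases
  ceph daemon osd.0 perf dump bluefs

and then look at db_total_bytes / db_used_bytes and slow_used_bytes; a
non-zero slow_used_bytes suggests metadata has spilled over to the slow
HDD device.)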
Alternatively we could put bucket indexes into rados objects instead of
OMAP, but that would be a pretty big project (with its own challenges
but potentially also with rewards).
Mark
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com