Re: DB sizing for lots of large files


 



Christian wrote “post Octopus”.  The referenced code seems likely to appear in Pacific.  We’ll see how it works out in practice.

I suspect that provisioned space will automagically be used when an OSD starts under a future release, though the release notes may give us specific instructions, like we saw with the new stats reporting.

> So if the 3/30/300GB restriction no longer exists in Octopus, can I make it
> 10GB and have it use all 10GB?
> 
> Is there a migration strategy that allows me to set up the DB on the OSD,
> see how much metadata my 25TB is using, make a partition on the Optane, say,
> quadruple the size, and then move the DB to the Optane?
> 
> Or maybe the best strategy would be to start with a small logical volume on
> the Optane, copy over my 25TB of existing data and extend it if required?
> 
> The bluefs-bdev-migrate and bluefs-bdev-expand commands seem to be the
> ticket.
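They do look like the right tools. Very roughly, and only as a sketch I have not run against your setup (the OSD id, VG/LV names and sizes below are placeholders; check the ceph-bluestore-tool docs for your release, and note that ceph-volume-managed OSDs may also need their LVM tags / block.db symlink fixed up afterwards):

    # keep data in place while the OSD is down
    ceph osd set noout
    systemctl stop ceph-osd@0

    # carve a DB LV on the Optane
    lvcreate -L 30G -n osd0-db optane-vg

    # attach it as the new DB device and migrate the existing RocksDB data
    # off the main (slow) device
    ceph-bluestore-tool bluefs-bdev-new-db --path /var/lib/ceph/osd/ceph-0 \
        --dev-target /dev/optane-vg/osd0-db
    ceph-bluestore-tool bluefs-bdev-migrate --path /var/lib/ceph/osd/ceph-0 \
        --devs-source /var/lib/ceph/osd/ceph-0/block \
        --dev-target /var/lib/ceph/osd/ceph-0/block.db

    # later, after growing the LV, let bluefs pick up the extra space
    lvextend -L +30G /dev/optane-vg/osd0-db
    ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-0

    systemctl start ceph-osd@0
    ceph osd unset noout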
> 
> 
> 
> On 27 Nov 2020 at 6:19:06 am, Christian Wuerdig <christian.wuerdig@xxxxxxxxx>
> wrote:
> 
>> Sorry, I replied to the wrong email thread before, so reposting this:
>> I think it's time to start pointing out that the 3/30/300 logic no longer
>> really holds true post Octopus:
>> 
>> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/message/CKRCB3HUR7UDRLHQGC7XXZPWCWNJSBNT/
>> Although I suppose in a way this makes it even harder to provide a sizing
>> recommendation
>> 
>> On Fri, 27 Nov 2020 at 04:49, Burkhard Linke <
>> Burkhard.Linke@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>> 
>> Hi,
>> 
>> 
>> On 11/26/20 12:45 PM, Richard Thornton wrote:
>> 
>>> Hi,
>>>
>>> Sorry to bother you all.
>>>
>>> It’s a home server setup.
>>>
>>> Three nodes (ODROID-H2+ with 32GB RAM and dual 2.5Gbit NICs), two 14TB
>>> 7200rpm SATA drives and an Optane 118GB NVMe in each node (OS boots from
>>> eMMC).
>> 
>> 
>> 
>> *snipsnap*
>> 
>> 
>>> Is there a rough CephFS calculation (each file uses x bytes of
>>> metadata), I think I should be safe with 30GB, now I read I should
>>> double that (you should allocate twice the size of the biggest layer to
>>> allow for compaction) but I only have 118GB and two OSDs so I will have
>>> to go for 59GB (or whatever will fit)?
>> 
>> 
>> The recommended size of 30 GB is due to the level design of rocksdb;
>> data is stored in levels of increasing size. 30 GB is a kind of sweet
>> spot between 3 GB and 300 GB (too small / way too large for most use
>> cases). The recommendation to double the size to allow for compaction is
>> OK, but you will waste that capacity most of the time.
>> 
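If I remember the default rocksdb settings in bluestore correctly, that sweet spot falls out of whole levels having to fit on the DB device: a 256 MB base level with a 10x size multiplier gives roughly

    L1 ~ 0.25 GB
    L2 ~ 2.5  GB   -> L1+L2       ~  3 GB
    L3 ~ 25   GB   -> L1+L2+L3    ~ 30 GB
    L4 ~ 250  GB   -> L1+L2+L3+L4 ~ 300 GB

so anything between those steps was, pre-Octopus, effectively rounded down to the last level that fits. The change Christian linked is what relaxes that.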
>> 
>> In our cephfs instance we have ~115,000,000 files. Metadata is stored
>> on 18 SSD-based OSDs. About 30-35 GB of raw capacity is currently in
>> use, almost exclusively for metadata, omap and other stuff. You might be
>> able to scale this down to your use case. Our average file size is
>> approx. 5 MB, so you can also put a little bit on top in your case.
>> 
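That works out to roughly 30-35 GB divided by 115 million files, i.e. somewhere around 250-300 bytes of raw capacity per file. Purely as an illustration with made-up numbers: if the 25 TB were, say, 10 million files, that rule of thumb would put the metadata at only a few GB, so a 30 GB DB per OSD leaves a lot of headroom.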
>> 
>> If your working set (files accessed in a given time span) is rather
>> small, you also have the option to use the SSD as a block device caching
>> layer like bcache or dm-cache. In this setup the whole capacity will be
>> used, and data operations on the OSDs will also benefit from the faster
>> SSD. Your failure domain stays the same; if the SSD dies, your data
>> disks will be useless.
>> 
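For completeness, a minimal bcache sketch, with made-up device names and untested here; dm-cache via LVM is the other common route:

    make-bcache -C /dev/nvme0n1p3    # Optane partition as cache device
    make-bcache -B /dev/sda          # HDD as backing device -> /dev/bcache0
    bcache-super-show /dev/nvme0n1p3 | grep cset.uuid
    echo <cset-uuid> > /sys/block/bcache0/bcache/attach
    # then build the OSD on /dev/bcache0 instead of /dev/sda

Keep Burkhard's caveat in mind: the SSD becomes a single point of failure for every disk it caches.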
>> 
>> Otherwise I would recommend using DB partitions of the recommended size
>> (do not forget to include some extra space for the WAL), and using the
>> remaining capacity for extra SSD-based OSDs similar to our setup. This
>> will ensure that metadata access will be fast[tm].
>> 
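On a 118 GB Optane with two HDDs per node that could look something like the following; names and sizes are purely illustrative, and you would still want a CRUSH rule pinning the cephfs metadata pool to the SSD device class:

    # two DB LVs (the WAL lives on the DB device by default) plus the
    # leftover capacity as a small SSD OSD
    lvcreate -L 30G -n db-0 optane-vg
    lvcreate -L 30G -n db-1 optane-vg
    lvcreate -l 100%FREE -n ssd-osd optane-vg

    ceph-volume lvm create --data /dev/sda --block.db optane-vg/db-0
    ceph-volume lvm create --data /dev/sdb --block.db optane-vg/db-1
    ceph-volume lvm create --data optane-vg/ssd-osd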
>> 
>> 
>> Regards,
>> 
>> 
>> Burkhard
>> 
>> 
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



