Dear Cephalopodians,

I have to extend my question a bit - in our system with 105,000,000 objects in CephFS
(mostly stabilized now after the stress-testing...), I observe the following data
distribution for the metadata pool:

# ceph osd df | head
ID CLASS WEIGHT  REWEIGHT SIZE USE    AVAIL %USE  VAR  PGS
 0   ssd 0.21829  1.00000 223G  9927M  213G  4.34 0.79   0
 1   ssd 0.21829  1.00000 223G  9928M  213G  4.34 0.79   0
 2   ssd 0.21819  1.00000 223G 77179M  148G 33.73 6.11 128
 3   ssd 0.21819  1.00000 223G 76981M  148G 33.64 6.10 128

osd.0 - osd.3 are all exclusively meant for cephfs-metadata; currently we use 4 replicas
with failure domain OSD there.

I reinstalled and reformatted osd.0 and osd.1 about 36 hours ago. All 128 PGs in the
metadata pool are backfilling (I have increased osd-max-backfills temporarily to speed
things up for those OSDs). However, they have only managed to backfill < 10 GB in those
36 hours. I have not touched any of the other default settings concerning backfill or
recovery (but these are SSDs, so the sleeps should be 0); the exact commands I am using
to check and bump these settings are appended below the quoted message. The backfilling
does not seem to be limited by CPU, network, or disks. "ceph -s" confirms a backfill
performance of about 60-100 keys/s.

This metadata, as written before, is almost exclusively RocksDB:
    "bluefs": {
        "gift_bytes": 0,
        "reclaim_bytes": 0,
        "db_total_bytes": 84760592384,
        "db_used_bytes": 77289488384,

Is it normal that this kind of backfilling is so horrendously slow? Is there a way to
speed it up? At this rate, it will take almost two weeks for 77 GB of (meta)data.

Right now, the system is still in the testing phase, but we would of course like to be
able to add more MDSs and SSDs later without extensive backfilling periods.

Cheers,
	Oliver

On 25.02.2018 at 19:26, Oliver Freyermuth wrote:
> Dear Cephalopodians,
> 
> as part of our stress test with 100,000,000 objects (all small files) we ended up with
> the following usage on the OSDs on which the metadata pool lives:
> # ceph osd df | head
> ID CLASS WEIGHT  REWEIGHT SIZE USE    AVAIL %USE  VAR  PGS
> [...]
>  2   ssd 0.21819  1.00000 223G 79649M  145G 34.81 6.62 128
>  3   ssd 0.21819  1.00000 223G 79697M  145G 34.83 6.63 128
> 
> The cephfs-data cluster is mostly empty (5 % usage), but contains 100,000,000 small objects.
> 
> Looking with:
>   ceph daemon osd.2 perf dump
> I get:
>     "bluefs": {
>         "gift_bytes": 0,
>         "reclaim_bytes": 0,
>         "db_total_bytes": 84760592384,
>         "db_used_bytes": 78920024064,
>         "wal_total_bytes": 0,
>         "wal_used_bytes": 0,
>         "slow_total_bytes": 0,
>         "slow_used_bytes": 0,
> so it seems this is almost exclusively RocksDB usage.
> 
> Is this expected?
> Is there a recommendation on how much MDS storage is needed for a CephFS with 450 TB?
> 
> Cheers,
> 	Oliver
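P.S.: For reference, here is a sketch of what I am checking and adjusting on the two
reformatted metadata OSDs. The per-device-class sleep options are the ones I understand
recent releases provide, and the values (8 backfills / 8 recovery ops) are only what I
am experimenting with right now, not recommendations.

Verify that the recovery sleeps really are 0 for these SSD OSDs:
# ceph daemon osd.0 config get osd_recovery_sleep_ssd
# ceph daemon osd.0 config get osd_recovery_sleep_hybrid

Temporarily raise the backfill / recovery concurrency on the backfill targets
(to be reverted once backfilling has finished):
# ceph tell osd.0 injectargs '--osd-max-backfills 8 --osd-recovery-max-active 8'
# ceph tell osd.1 injectargs '--osd-max-backfills 8 --osd-recovery-max-active 8'

Watch the effective backfill rate:
# ceph -s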
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com