Hi all,
we're building up our experience with our Ceph cluster before we take it
into production. I've now tried to fill up the cluster with CephFS,
which we plan to use for about 95% of all data on the cluster.
The CephFS pools report as full when the cluster is at 67% raw capacity
used. We use four pools for CephFS data: 3-copy, 4-copy, EC 8+3 and
EC 5+7. The balancer module is turned on and `ceph balancer eval`
gives `current cluster score 0.013255 (lower is better)`, so well within
the default 5% margin. Is there a setting we can tweak to increase the
usable raw capacity to, say, 85% or 90%, or is this the most we can
expect to store on the cluster?
[root@cephmon1 ~]# ceph df
RAW STORAGE:
    CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
    hdd       1.8 PiB     605 TiB     1.2 PiB      1.2 PiB         66.71
    TOTAL     1.8 PiB     605 TiB     1.2 PiB      1.2 PiB         66.71
POOLS:
    POOL                   ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
    cephfs_data             1     111 MiB      79.26M     1.2 GiB     100.00          0 B
    cephfs_metadata         2      52 GiB       4.91M      52 GiB     100.00          0 B
    cephfs_data_4copy       3     106 TiB      46.36M     428 TiB     100.00          0 B
    cephfs_data_3copy       8      93 TiB      42.08M     282 TiB     100.00          0 B
    cephfs_data_ec83       13     106 TiB      50.11M     161 TiB     100.00          0 B
    rbd                    14      21 GiB       5.62k      63 GiB     100.00          0 B
    .rgw.root              15     1.2 KiB           4       1 MiB     100.00          0 B
    default.rgw.control    16         0 B           8         0 B          0          0 B
    default.rgw.meta       17       765 B           4       1 MiB     100.00          0 B
    default.rgw.log        18         0 B         207         0 B          0          0 B
    scbench                19     133 GiB      34.14k     400 GiB     100.00          0 B
    cephfs_data_ec57       20     126 TiB      51.84M     320 TiB     100.00          0 B
[root@cephmon1 ~]# ceph balancer eval
current cluster score 0.013255 (lower is better)
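In case it is relevant, I've also been looking at the per-OSD spread,
since as far as I understand MAX AVAIL in `ceph df` is derived from the
fullest OSD under a pool's CRUSH root rather than from the average, so a
single outlier OSD can make every pool report as full:

# per-OSD utilisation; %USE and VAR show the spread, and the summary line prints MIN/MAX VAR and STDDEV
ceph osd df tree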
Being full at 2/3 raw used seems a bit too "pretty" to be accidental; it
looks like it could be some CephFS or pool parameter, but I couldn't find
anything like that in the Nautilus documentation.
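One thing I wanted to rule out is a per-pool quota marking the pools full
independently of raw capacity, though I don't think we ever set one. A
sketch of the check, using two of the data pools from the `ceph df`
output above:

# report the max_objects / max_bytes quotas for a pool (N/A means no quota)
ceph osd pool get-quota cephfs_data_3copy
ceph osd pool get-quota cephfs_data_ec83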
The logs in the dashboard show this:
2019-08-26 11:00:00.000630 [ERR] overall HEALTH_ERR 3 backfillfull osd(s); 1 full osd(s); 12 pool(s) full
2019-08-26 10:57:44.539964 [INF] Health check cleared: POOL_BACKFILLFULL (was: 12 pool(s) backfillfull)
2019-08-26 10:57:44.539944 [WRN] Health check failed: 12 pool(s) full (POOL_FULL)
2019-08-26 10:57:44.539926 [ERR] Health check failed: 1 full osd(s) (OSD_FULL)
2019-08-26 10:57:44.539899 [WRN] Health check update: 3 backfillfull osd(s) (OSD_BACKFILLFULL)
2019-08-26 10:00:00.000088 [WRN] overall HEALTH_WARN 4 backfillfull osd(s); 12 pool(s) backfillfull
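For completeness, this is how I've been identifying which OSDs are behind
those health errors (read-only so far; I haven't touched any weights yet):

# list the specific full / backfillfull OSDs behind OSD_FULL / OSD_BACKFILLFULL
ceph health detail
# hypothetically, an over-full OSD could be reweighted down temporarily, e.g.:
# ceph osd reweight <osd-id> 0.90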
So it seems that Ceph is completely stuck at 2/3 full, while we
anticipated being able to fill the cluster to at least 85-90% of the raw
capacity, or at least far enough that the cluster keeps functioning when
a single OSD node fails.
Cheers
/Simon