Thanks. The version and balancer config look good.

So you can try `ceph osd reweight osd.10 0.8` to see if it helps to get
you out of this.

-- dan

On Mon, Aug 26, 2019 at 11:35 AM Simon Oosthoek <s.oosthoek@xxxxxxxxxxxxx> wrote:
>
> On 26-08-19 11:16, Dan van der Ster wrote:
> > Hi,
> >
> > Which version of ceph are you using? Which balancer mode?
>
> Nautilus (14.2.2), balancer is in upmap mode.
>
> > The balancer score isn't a percent-error or anything humanly usable.
> > `ceph osd df tree` can better show you exactly which osds are
> > over/under utilized and by how much.
>
> Aha, I ran this and sorted on the %full column:
>
>  81 hdd 10.81149 1.00000 11 TiB 5.2 TiB 5.1 TiB   4 KiB 14 GiB 5.6 TiB 48.40 0.73 96 up osd.81
>  48 hdd 10.81149 1.00000 11 TiB 5.3 TiB 5.2 TiB  15 KiB 14 GiB 5.5 TiB 49.08 0.74 95 up osd.48
> 154 hdd 10.81149 1.00000 11 TiB 5.5 TiB 5.4 TiB 2.6 GiB 15 GiB 5.3 TiB 50.95 0.76 96 up osd.154
> 129 hdd 10.81149 1.00000 11 TiB 5.5 TiB 5.4 TiB 5.1 GiB 16 GiB 5.3 TiB 51.33 0.77 96 up osd.129
>  42 hdd 10.81149 1.00000 11 TiB 5.6 TiB 5.5 TiB 2.6 GiB 14 GiB 5.2 TiB 51.81 0.78 96 up osd.42
> 122 hdd 10.81149 1.00000 11 TiB 5.7 TiB 5.6 TiB  16 KiB 14 GiB 5.1 TiB 52.47 0.79 96 up osd.122
> 120 hdd 10.81149 1.00000 11 TiB 5.7 TiB 5.6 TiB 2.6 GiB 15 GiB 5.1 TiB 52.92 0.79 95 up osd.120
>  96 hdd 10.81149 1.00000 11 TiB 5.8 TiB 5.7 TiB 2.6 GiB 15 GiB 5.0 TiB 53.58 0.80 96 up osd.96
>  26 hdd 10.81149 1.00000 11 TiB 5.8 TiB 5.7 TiB  20 KiB 15 GiB 5.0 TiB 53.68 0.80 97 up osd.26
> ...
>   6 hdd 10.81149 1.00000 11 TiB 8.3 TiB 8.2 TiB  88 KiB 18 GiB 2.5 TiB 77.14 1.16 96 up osd.6
>  16 hdd 10.81149 1.00000 11 TiB 8.4 TiB 8.3 TiB  28 KiB 18 GiB 2.4 TiB 77.56 1.16 95 up osd.16
>   0 hdd 10.81149 1.00000 11 TiB 8.6 TiB 8.4 TiB  48 KiB 17 GiB 2.2 TiB 79.24 1.19 96 up osd.0
> 144 hdd 10.81149 1.00000 11 TiB 8.6 TiB 8.5 TiB 2.6 GiB 18 GiB 2.2 TiB 79.57 1.19 95 up osd.144
> 136 hdd 10.81149 1.00000 11 TiB 8.6 TiB 8.5 TiB  48 KiB 17 GiB 2.2 TiB 79.60 1.19 95 up osd.136
>  63 hdd 10.81149 1.00000 11 TiB 8.6 TiB 8.5 TiB 2.6 GiB 17 GiB 2.2 TiB 79.60 1.19 95 up osd.63
> 155 hdd 10.81149 1.00000 11 TiB 8.6 TiB 8.5 TiB   8 KiB 19 GiB 2.2 TiB 79.85 1.20 95 up osd.155
>  89 hdd 10.81149 1.00000 11 TiB 8.7 TiB 8.5 TiB  12 KiB 20 GiB 2.2 TiB 80.04 1.20 96 up osd.89
> 106 hdd 10.81149 1.00000 11 TiB 8.8 TiB 8.7 TiB  64 KiB 19 GiB 2.0 TiB 81.38 1.22 96 up osd.106
>  94 hdd 10.81149 1.00000 11 TiB 9.0 TiB 8.9 TiB     0 B 19 GiB 1.8 TiB 83.53 1.25 96 up osd.94
>  33 hdd 10.81149 1.00000 11 TiB 9.1 TiB 9.0 TiB  44 KiB 19 GiB 1.7 TiB 84.40 1.27 96 up osd.33
>  15 hdd 10.81149 1.00000 11 TiB  10 TiB 9.8 TiB  16 KiB 20 GiB 877 GiB 92.08 1.38 96 up osd.15
>  53 hdd 10.81149 1.00000 11 TiB  10 TiB  10 TiB 2.6 GiB 20 GiB 676 GiB 93.90 1.41 96 up osd.53
>  51 hdd 10.81149 1.00000 11 TiB  10 TiB  10 TiB 2.6 GiB 20 GiB 666 GiB 93.98 1.41 96 up osd.51
>  10 hdd 10.81149 1.00000 11 TiB  10 TiB  10 TiB  40 KiB 22 GiB 552 GiB 95.01 1.42 97 up osd.10
>
> So the fullest one is at 95.01%, the emptiest one at 48.4%, so there's
> some balancing to be done.
>
> > You might be able to manually fix things by using `ceph osd reweight
> > ...` on the most full osds to move data elsewhere.
>
> I'll look into this, but I was hoping that the balancer module would
> take care of this...
>
> > Otherwise, in general, it's good to set up monitoring so you notice and
> > take action well before the osds fill up.
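Picking reweight values by eye risks ping-ponging data around; they can
instead be derived from the utilization spread itself. A rough python sketch
of that idea (not from the thread; the `nodes`/`utilization`/`reweight`
field names are assumed to match `ceph osd df -f json` output on Nautilus,
and the sample is trimmed to four OSDs from the listing above):

```python
import json

# Sample trimmed from `ceph osd df -f json`-style output (field names
# assumed -- check your cluster's actual JSON). Utilizations are taken
# from the listing above.
sample = json.loads("""
{"nodes": [
  {"id": 81, "name": "osd.81", "reweight": 1.0, "utilization": 48.40},
  {"id": 33, "name": "osd.33", "reweight": 1.0, "utilization": 84.40},
  {"id": 15, "name": "osd.15", "reweight": 1.0, "utilization": 92.08},
  {"id": 10, "name": "osd.10", "reweight": 1.0, "utilization": 95.01}
]}
""")

def suggest_reweights(nodes, target_ratio=1.1, floor=0.8):
    """Suggest `ceph osd reweight` commands for OSDs well above mean use."""
    mean = sum(n["utilization"] for n in nodes) / len(nodes)
    cmds = []
    for n in sorted(nodes, key=lambda n: -n["utilization"]):
        if n["utilization"] > mean * target_ratio:
            # Scale the override weight down proportionally, but never
            # below `floor` in one step -- move data gradually.
            new = max(floor, round(n["reweight"] * mean / n["utilization"], 2))
            cmds.append(f"ceph osd reweight {n['name']} {new}")
    return cmds

for cmd in suggest_reweights(sample["nodes"]):
    print(cmd)
```

Re-running `ceph balancer eval` after each step shows whether the score
actually improves; the overrides can be walked back toward 1.0 once upmap
catches up.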
>
> Yes, I'm still working on this. I want to add some checks to our
> check_mk+icinga setup using native plugins, but my python skills are not
> quite up to the task, at least, not yet ;-)
>
> Cheers
>
> /Simon
>
> > Cheers, Dan
> >
> > On Mon, Aug 26, 2019 at 11:09 AM Simon Oosthoek
> > <s.oosthoek@xxxxxxxxxxxxx> wrote:
> >>
> >> Hi all,
> >>
> >> we're building up our experience with our ceph cluster before we take it
> >> into production. I've now tried to fill up the cluster with cephfs,
> >> which we plan to use for about 95% of all data on the cluster.
> >>
> >> The cephfs pools are full when the cluster reports 67% raw capacity
> >> used. There are 4 pools we use for cephfs data: 3-copy, 4-copy, EC 8+3
> >> and EC 5+7. The balancer module is turned on and `ceph balancer eval`
> >> gives `current cluster score 0.013255 (lower is better)`, so well within
> >> the default 5% margin. Is there a setting we can tweak to increase the
> >> usable RAW capacity to say 85% or 90%, or is this the most we can expect
> >> to store on the cluster?
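For the check_mk side, the threshold logic is small enough to prototype
first and wire up to real data later. A hypothetical local-check sketch
(the check name, thresholds, and output line are illustrative, not an
existing plugin; in production the utilizations would come from
`ceph osd df -f json` via subprocess):

```python
#!/usr/bin/env python3
# Minimal check_mk-style local check for OSD fullness -- a sketch, not a
# polished plugin. Utilizations are passed in so the logic is testable;
# the sample values below come from the `ceph osd df tree` listing above.

def check_osd_fullness(utilizations, warn=75.0, crit=85.0):
    """Return a check_mk local-check line: <state> <name> <perfdata> <text>."""
    worst = max(utilizations)
    if worst >= crit:
        state, word = 2, "CRIT"
    elif worst >= warn:
        state, word = 1, "WARN"
    else:
        state, word = 0, "OK"
    return (f"{state} Ceph_OSD_fullness fullest={worst:.2f} "
            f"{word} - fullest OSD at {worst:.2f}%")

if __name__ == "__main__":
    print(check_osd_fullness([48.40, 84.40, 92.08, 95.01]))
```

Alerting on the *fullest* OSD rather than average utilization matters here,
since it is the fullest OSD that takes the whole cluster read-only.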
> >>
> >> [root@cephmon1 ~]# ceph df
> >> RAW STORAGE:
> >>     CLASS     SIZE        AVAIL       USED       RAW USED    %RAW USED
> >>     hdd       1.8 PiB     605 TiB     1.2 PiB     1.2 PiB        66.71
> >>     TOTAL     1.8 PiB     605 TiB     1.2 PiB     1.2 PiB        66.71
> >>
> >> POOLS:
> >>     POOL                    ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
> >>     cephfs_data              1     111 MiB      79.26M     1.2 GiB     100.00          0 B
> >>     cephfs_metadata          2      52 GiB       4.91M      52 GiB     100.00          0 B
> >>     cephfs_data_4copy        3     106 TiB      46.36M     428 TiB     100.00          0 B
> >>     cephfs_data_3copy        8      93 TiB      42.08M     282 TiB     100.00          0 B
> >>     cephfs_data_ec83        13     106 TiB      50.11M     161 TiB     100.00          0 B
> >>     rbd                     14      21 GiB       5.62k      63 GiB     100.00          0 B
> >>     .rgw.root               15     1.2 KiB           4       1 MiB     100.00          0 B
> >>     default.rgw.control     16         0 B           8         0 B          0          0 B
> >>     default.rgw.meta        17       765 B           4       1 MiB     100.00          0 B
> >>     default.rgw.log         18         0 B         207         0 B          0          0 B
> >>     scbench                 19     133 GiB      34.14k     400 GiB     100.00          0 B
> >>     cephfs_data_ec57        20     126 TiB      51.84M     320 TiB     100.00          0 B
> >> [root@cephmon1 ~]# ceph balancer eval
> >> current cluster score 0.013255 (lower is better)
> >>
> >> Being full at 2/3 raw used is a bit too "pretty" to be accidental; it
> >> seems like this could be a parameter for cephfs, however, I couldn't
> >> find anything like this in the documentation for Nautilus.
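The 2/3 figure doesn't have to be a cephfs parameter to come out this
"pretty": pools report full as soon as the *fullest* OSD crosses the full
ratio, and the fullest OSD here runs at 1.42x the mean (the VAR column in
the `ceph osd df tree` listing). A back-of-envelope check, assuming the
default full ratio of 0.95:

```python
# A cluster goes read-only when its fullest OSD hits the full ratio,
# not when the average utilization does.
full_ratio = 0.95   # Nautilus default mon_osd_full_ratio
worst_var = 1.42    # VAR of osd.10 in the `ceph osd df tree` listing
usable_raw = full_ratio / worst_var
print(f"cluster stalls at ~{usable_raw:.1%} raw")  # ~66.9%, vs. the 66.71 %RAW USED reported
```

The same arithmetic shows why balance matters so much for usable capacity:
at VAR 1.05 the cluster would not stall until roughly 90% raw used.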
> >>
> >> The logs in the dashboard show this:
> >>
> >> 2019-08-26 11:00:00.000630 [ERR]
> >> overall HEALTH_ERR 3 backfillfull osd(s); 1 full osd(s); 12 pool(s) full
> >>
> >> 2019-08-26 10:57:44.539964 [INF]
> >> Health check cleared: POOL_BACKFILLFULL (was: 12 pool(s) backfillfull)
> >>
> >> 2019-08-26 10:57:44.539944 [WRN]
> >> Health check failed: 12 pool(s) full (POOL_FULL)
> >>
> >> 2019-08-26 10:57:44.539926 [ERR]
> >> Health check failed: 1 full osd(s) (OSD_FULL)
> >>
> >> 2019-08-26 10:57:44.539899 [WRN]
> >> Health check update: 3 backfillfull osd(s) (OSD_BACKFILLFULL)
> >>
> >> 2019-08-26 10:00:00.000088 [WRN]
> >> overall HEALTH_WARN 4 backfillfull osd(s); 12 pool(s) backfillfull
> >>
> >> So it seems that ceph is completely stuck at 2/3 full, while we
> >> anticipated being able to fill up the cluster to at least 85-90% of the
> >> raw capacity. Or at least so that we would keep a functioning cluster
> >> when we have a single osd node fail.
> >>
> >> Cheers
> >>
> >> /Simon
> >> _______________________________________________
> >> ceph-users mailing list
> >> ceph-users@xxxxxxxxxxxxxx
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
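As a footnote on the health log above: the messages map onto the three
standard OSD capacity thresholds. A small sketch (default ratios assumed;
the live values come from `ceph osd dump | grep ratio`) reproduces the
10:57 counts from the four fullest OSDs in the earlier listing:

```python
# Default OSD capacity thresholds behind the OSD_* health checks
# (assumed defaults -- verify against `ceph osd dump`).
NEARFULL, BACKFILLFULL, FULL = 0.85, 0.90, 0.95

def classify(utilizations):
    """Count OSDs per threshold, mirroring the OSD_* health checks."""
    u = [x / 100 for x in utilizations]   # %USE -> fraction
    return {
        "full": sum(x >= FULL for x in u),
        "backfillfull": sum(BACKFILLFULL <= x < FULL for x in u),
        "nearfull": sum(NEARFULL <= x < BACKFILLFULL for x in u),
    }

# The four fullest OSDs (osd.15, osd.53, osd.51, osd.10) from the listing:
print(classify([92.08, 93.90, 93.98, 95.01]))
```

This gives 1 full and 3 backfillfull, matching the "1 full osd(s)" and
"3 backfillfull osd(s)" health checks logged at 10:57:44.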