Thanks. The version and balancer config look good.

So you can try `ceph osd reweight osd.10 0.8` to see if it helps to get
you out of this.

-- dan

On Mon, Aug 26, 2019 at 11:35 AM Simon Oosthoek <s.oosthoek@xxxxxxxxxxxxx> wrote:
>
> On 26-08-19 11:16, Dan van der Ster wrote:
> > Hi,
> >
> > Which version of ceph are you using? Which balancer mode?
>
> Nautilus (14.2.2), balancer is in upmap mode.
>
> > The balancer score isn't a percent-error or anything humanly usable.
> > `ceph osd df tree` can better show you exactly which osds are
> > over/under utilized and by how much.
>
> Aha, I ran this and sorted on the %full column:
>
>  81 hdd 10.81149 1.00000 11 TiB 5.2 TiB 5.1 TiB   4 KiB 14 GiB 5.6 TiB 48.40 0.73 96 up osd.81
>  48 hdd 10.81149 1.00000 11 TiB 5.3 TiB 5.2 TiB  15 KiB 14 GiB 5.5 TiB 49.08 0.74 95 up osd.48
> 154 hdd 10.81149 1.00000 11 TiB 5.5 TiB 5.4 TiB 2.6 GiB 15 GiB 5.3 TiB 50.95 0.76 96 up osd.154
> 129 hdd 10.81149 1.00000 11 TiB 5.5 TiB 5.4 TiB 5.1 GiB 16 GiB 5.3 TiB 51.33 0.77 96 up osd.129
>  42 hdd 10.81149 1.00000 11 TiB 5.6 TiB 5.5 TiB 2.6 GiB 14 GiB 5.2 TiB 51.81 0.78 96 up osd.42
> 122 hdd 10.81149 1.00000 11 TiB 5.7 TiB 5.6 TiB  16 KiB 14 GiB 5.1 TiB 52.47 0.79 96 up osd.122
> 120 hdd 10.81149 1.00000 11 TiB 5.7 TiB 5.6 TiB 2.6 GiB 15 GiB 5.1 TiB 52.92 0.79 95 up osd.120
>  96 hdd 10.81149 1.00000 11 TiB 5.8 TiB 5.7 TiB 2.6 GiB 15 GiB 5.0 TiB 53.58 0.80 96 up osd.96
>  26 hdd 10.81149 1.00000 11 TiB 5.8 TiB 5.7 TiB  20 KiB 15 GiB 5.0 TiB 53.68 0.80 97 up osd.26
> ...
>   6 hdd 10.81149 1.00000 11 TiB 8.3 TiB 8.2 TiB  88 KiB 18 GiB 2.5 TiB 77.14 1.16 96 up osd.6
>  16 hdd 10.81149 1.00000 11 TiB 8.4 TiB 8.3 TiB  28 KiB 18 GiB 2.4 TiB 77.56 1.16 95 up osd.16
>   0 hdd 10.81149 1.00000 11 TiB 8.6 TiB 8.4 TiB  48 KiB 17 GiB 2.2 TiB 79.24 1.19 96 up osd.0
> 144 hdd 10.81149 1.00000 11 TiB 8.6 TiB 8.5 TiB 2.6 GiB 18 GiB 2.2 TiB 79.57 1.19 95 up osd.144
> 136 hdd 10.81149 1.00000 11 TiB 8.6 TiB 8.5 TiB  48 KiB 17 GiB 2.2 TiB 79.60 1.19 95 up osd.136
>  63 hdd 10.81149 1.00000 11 TiB 8.6 TiB 8.5 TiB 2.6 GiB 17 GiB 2.2 TiB 79.60 1.19 95 up osd.63
> 155 hdd 10.81149 1.00000 11 TiB 8.6 TiB 8.5 TiB   8 KiB 19 GiB 2.2 TiB 79.85 1.20 95 up osd.155
>  89 hdd 10.81149 1.00000 11 TiB 8.7 TiB 8.5 TiB  12 KiB 20 GiB 2.2 TiB 80.04 1.20 96 up osd.89
> 106 hdd 10.81149 1.00000 11 TiB 8.8 TiB 8.7 TiB  64 KiB 19 GiB 2.0 TiB 81.38 1.22 96 up osd.106
>  94 hdd 10.81149 1.00000 11 TiB 9.0 TiB 8.9 TiB     0 B 19 GiB 1.8 TiB 83.53 1.25 96 up osd.94
>  33 hdd 10.81149 1.00000 11 TiB 9.1 TiB 9.0 TiB  44 KiB 19 GiB 1.7 TiB 84.40 1.27 96 up osd.33
>  15 hdd 10.81149 1.00000 11 TiB  10 TiB 9.8 TiB  16 KiB 20 GiB 877 GiB 92.08 1.38 96 up osd.15
>  53 hdd 10.81149 1.00000 11 TiB  10 TiB  10 TiB 2.6 GiB 20 GiB 676 GiB 93.90 1.41 96 up osd.53
>  51 hdd 10.81149 1.00000 11 TiB  10 TiB  10 TiB 2.6 GiB 20 GiB 666 GiB 93.98 1.41 96 up osd.51
>  10 hdd 10.81149 1.00000 11 TiB  10 TiB  10 TiB  40 KiB 22 GiB 552 GiB 95.01 1.42 97 up osd.10
>
> So the fullest one is at 95.01%, the emptiest one at 48.4%, so there's
> some balancing to be done.
>
> > You might be able to manually fix things by using `ceph osd reweight
> > ...` on the most full osds to move data elsewhere.
>
> I'll look into this, but I was hoping that the balancer module would
> take care of this...
>
> > Otherwise, in general, it's good to set up monitoring so you notice and
> > take action well before the osds fill up.
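Picking reweight values by eye risks ping-ponging data around; they can
instead be derived from the utilization spread itself. A rough python sketch
of that idea (not from the thread; the `nodes`/`utilization`/`reweight`
field names are assumed to match `ceph osd df -f json` output on Nautilus,
and the sample is trimmed to four OSDs from the listing above):

```python
import json

# Sample trimmed from `ceph osd df -f json`-style output (field names
# assumed -- check your cluster's actual JSON). Utilizations are taken
# from the listing above.
sample = json.loads("""
{"nodes": [
  {"id": 81, "name": "osd.81", "reweight": 1.0, "utilization": 48.40},
  {"id": 33, "name": "osd.33", "reweight": 1.0, "utilization": 84.40},
  {"id": 15, "name": "osd.15", "reweight": 1.0, "utilization": 92.08},
  {"id": 10, "name": "osd.10", "reweight": 1.0, "utilization": 95.01}
]}
""")

def suggest_reweights(nodes, target_ratio=1.1, floor=0.8):
    """Suggest `ceph osd reweight` commands for OSDs well above mean use."""
    mean = sum(n["utilization"] for n in nodes) / len(nodes)
    cmds = []
    for n in sorted(nodes, key=lambda n: -n["utilization"]):
        if n["utilization"] > mean * target_ratio:
            # Scale the override weight down proportionally, but never
            # below `floor` in one step -- move data gradually.
            new = max(floor, round(n["reweight"] * mean / n["utilization"], 2))
            cmds.append(f"ceph osd reweight {n['name']} {new}")
    return cmds

for cmd in suggest_reweights(sample["nodes"]):
    print(cmd)
```

Re-running `ceph balancer eval` after each step shows whether the score
actually improves; the overrides can be walked back toward 1.0 once upmap
catches up.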
>
> Yes, I'm still working on this. I want to add some checks to our
> check_mk+icinga setup using native plugins, but my python skills are not
> quite up to the task, at least, not yet ;-)
>
> Cheers
>
> /Simon
>
> > Cheers, Dan
> >
> > On Mon, Aug 26, 2019 at 11:09 AM Simon Oosthoek
> > <s.oosthoek@xxxxxxxxxxxxx> wrote:
> >>
> >> Hi all,
> >>
> >> we're building up our experience with our ceph cluster before we take it
> >> into production. I've now tried to fill up the cluster with cephfs,
> >> which we plan to use for about 95% of all data on the cluster.
> >>
> >> The cephfs pools are full when the cluster reports 67% raw capacity
> >> used. There are 4 pools we use for cephfs data: 3-copy, 4-copy, EC 8+3
> >> and EC 5+7. The balancer module is turned on and `ceph balancer eval`
> >> gives `current cluster score 0.013255 (lower is better)`, so well within
> >> the default 5% margin. Is there a setting we can tweak to increase the
> >> usable RAW capacity to say 85% or 90%, or is this the most we can expect
> >> to store on the cluster?
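For the check_mk side, the threshold logic is small enough to prototype
first and wire up to real data later. A hypothetical local-check sketch
(the check name, thresholds, and output line are illustrative, not an
existing plugin; in production the utilizations would come from
`ceph osd df -f json` via subprocess):

```python
#!/usr/bin/env python3
# Minimal check_mk-style local check for OSD fullness -- a sketch, not a
# polished plugin. Utilizations are passed in so the logic is testable;
# the sample values below come from the `ceph osd df tree` listing above.

def check_osd_fullness(utilizations, warn=75.0, crit=85.0):
    """Return a check_mk local-check line: <state> <name> <perfdata> <text>."""
    worst = max(utilizations)
    if worst >= crit:
        state, word = 2, "CRIT"
    elif worst >= warn:
        state, word = 1, "WARN"
    else:
        state, word = 0, "OK"
    return (f"{state} Ceph_OSD_fullness fullest={worst:.2f} "
            f"{word} - fullest OSD at {worst:.2f}%")

if __name__ == "__main__":
    print(check_osd_fullness([48.40, 84.40, 92.08, 95.01]))
```

Alerting on the *fullest* OSD rather than average utilization matters here,
since it is the fullest OSD that takes the whole cluster read-only.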
> >>
> >> [root@cephmon1 ~]# ceph df
> >> RAW STORAGE:
> >>     CLASS     SIZE        AVAIL       USED       RAW USED    %RAW USED
> >>     hdd       1.8 PiB     605 TiB     1.2 PiB     1.2 PiB        66.71
> >>     TOTAL     1.8 PiB     605 TiB     1.2 PiB     1.2 PiB        66.71
> >>
> >> POOLS:
> >>     POOL                    ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
> >>     cephfs_data              1     111 MiB      79.26M     1.2 GiB     100.00          0 B
> >>     cephfs_metadata          2      52 GiB       4.91M      52 GiB     100.00          0 B
> >>     cephfs_data_4copy        3     106 TiB      46.36M     428 TiB     100.00          0 B
> >>     cephfs_data_3copy        8      93 TiB      42.08M     282 TiB     100.00          0 B
> >>     cephfs_data_ec83        13     106 TiB      50.11M     161 TiB     100.00          0 B
> >>     rbd                     14      21 GiB       5.62k      63 GiB     100.00          0 B
> >>     .rgw.root               15     1.2 KiB           4       1 MiB     100.00          0 B
> >>     default.rgw.control     16         0 B           8         0 B          0          0 B
> >>     default.rgw.meta        17       765 B           4       1 MiB     100.00          0 B
> >>     default.rgw.log         18         0 B         207         0 B          0          0 B
> >>     scbench                 19     133 GiB      34.14k     400 GiB     100.00          0 B
> >>     cephfs_data_ec57        20     126 TiB      51.84M     320 TiB     100.00          0 B
> >> [root@cephmon1 ~]# ceph balancer eval
> >> current cluster score 0.013255 (lower is better)
> >>
> >> Being full at 2/3 raw used is a bit too "pretty" to be accidental; it
> >> seems like this could be a parameter for cephfs, however, I couldn't
> >> find anything like this in the documentation for Nautilus.
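The 2/3 figure doesn't have to be a cephfs parameter to come out this
"pretty": pools report full as soon as the *fullest* OSD crosses the full
ratio, and the fullest OSD here runs at 1.42x the mean (the VAR column in
the `ceph osd df tree` listing). A back-of-envelope check, assuming the
default full ratio of 0.95:

```python
# A cluster goes read-only when its fullest OSD hits the full ratio,
# not when the average utilization does.
full_ratio = 0.95   # Nautilus default mon_osd_full_ratio
worst_var = 1.42    # VAR of osd.10 in the `ceph osd df tree` listing
usable_raw = full_ratio / worst_var
print(f"cluster stalls at ~{usable_raw:.1%} raw")  # ~66.9%, vs. the 66.71 %RAW USED reported
```

The same arithmetic shows why balance matters so much for usable capacity:
at VAR 1.05 the cluster would not stall until roughly 90% raw used.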
> >>
> >> The logs in the dashboard show this:
> >>
> >> 2019-08-26 11:00:00.000630 [ERR]
> >> overall HEALTH_ERR 3 backfillfull osd(s); 1 full osd(s); 12 pool(s) full
> >>
> >> 2019-08-26 10:57:44.539964 [INF]
> >> Health check cleared: POOL_BACKFILLFULL (was: 12 pool(s) backfillfull)
> >>
> >> 2019-08-26 10:57:44.539944 [WRN]
> >> Health check failed: 12 pool(s) full (POOL_FULL)
> >>
> >> 2019-08-26 10:57:44.539926 [ERR]
> >> Health check failed: 1 full osd(s) (OSD_FULL)
> >>
> >> 2019-08-26 10:57:44.539899 [WRN]
> >> Health check update: 3 backfillfull osd(s) (OSD_BACKFILLFULL)
> >>
> >> 2019-08-26 10:00:00.000088 [WRN]
> >> overall HEALTH_WARN 4 backfillfull osd(s); 12 pool(s) backfillfull
> >>
> >> So it seems that ceph is completely stuck at 2/3 full, while we
> >> anticipated being able to fill up the cluster to at least 85-90% of the
> >> raw capacity. Or at least so that we would keep a functioning cluster
> >> when we have a single osd node fail.
> >>
> >> Cheers
> >>
> >> /Simon
> >> _______________________________________________
> >> ceph-users mailing list
> >> ceph-users@xxxxxxxxxxxxxx
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
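As a footnote on the health log above: the messages map onto the three
standard OSD capacity thresholds. A small sketch (default ratios assumed;
the live values come from `ceph osd dump | grep ratio`) reproduces the
10:57 counts from the four fullest OSDs in the earlier listing:

```python
# Default OSD capacity thresholds behind the OSD_* health checks
# (assumed defaults -- verify against `ceph osd dump`).
NEARFULL, BACKFILLFULL, FULL = 0.85, 0.90, 0.95

def classify(utilizations):
    """Count OSDs per threshold, mirroring the OSD_* health checks."""
    u = [x / 100 for x in utilizations]   # %USE -> fraction
    return {
        "full": sum(x >= FULL for x in u),
        "backfillfull": sum(BACKFILLFULL <= x < FULL for x in u),
        "nearfull": sum(NEARFULL <= x < BACKFILLFULL for x in u),
    }

# The four fullest OSDs (osd.15, osd.53, osd.51, osd.10) from the listing:
print(classify([92.08, 93.90, 93.98, 95.01]))
```

This gives 1 full and 3 backfillfull, matching the "1 full osd(s)" and
"3 backfillfull osd(s)" health checks logged at 10:57:44.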