On 26-08-19 12:00, EDH - Manuel Rios Fernandez wrote:
The balancer only balances while the cluster is in a healthy state.
The problem is that data is not balanced when it is first written, which
leaves it unevenly distributed across the OSDs.
I suppose the crush algorithm doesn't take the fullness of the osds into
account when placing objects...
This problem is specific to Ceph. We see the same with 14.2.2 and have to
change the weights manually, because the balancer is a passive element of the
cluster.
I hope the next version brings a more aggressive balancer, like enterprise
storage systems that let you fill up to 95% of raw capacity.
I'm thinking a cronjob with a script to parse the output of `ceph osd df
tree` and reweight according to the percentage used would be relatively
easy to write. But I'll concentrate on monitoring before I start
tweaking there ;-)
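A minimal sketch of such a script, for illustration only. It assumes that `ceph osd df tree --format json` returns a `nodes` list whose OSD entries carry `name`, `type`, `utilization` and `reweight` fields. The function names, the 1.2 threshold and the 0.5 floor are hypothetical choices, and the average is a plain mean over OSDs, which is only reasonable when all OSDs are the same size, as they are in this cluster. It only prints the commands, so they can be reviewed before anything is changed.

```python
import json
import subprocess


def load_osd_df():
    """Fetch OSD usage as a dict. Not called below; it needs a live cluster."""
    out = subprocess.run(
        ["ceph", "osd", "df", "tree", "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def propose_reweights(df, threshold=1.2, floor=0.5):
    """Return {osd_name: new_reweight} for OSDs fuller than threshold x average."""
    osds = [n for n in df["nodes"] if n.get("type") == "osd"]
    if not osds:
        return {}
    average = sum(n["utilization"] for n in osds) / len(osds)
    proposals = {}
    for n in osds:
        if average > 0 and n["utilization"] > threshold * average:
            # Scale the current override weight down in proportion to the overshoot,
            # but never below the floor.
            new_weight = n["reweight"] * average / n["utilization"]
            proposals[n["name"]] = round(max(new_weight, floor), 2)
    return proposals


# Sample data in the assumed JSON shape, using three utilization values from the thread.
sample = {
    "nodes": [
        {"type": "host", "name": "cephosd01"},  # hypothetical host entry, ignored
        {"type": "osd", "name": "osd.10", "utilization": 95.01, "reweight": 1.0},
        {"type": "osd", "name": "osd.6", "utilization": 77.14, "reweight": 1.0},
        {"type": "osd", "name": "osd.81", "utilization": 48.40, "reweight": 1.0},
    ]
}

for name, weight in sorted(propose_reweights(sample).items()):
    print(f"ceph osd reweight {name} {weight}")
```

On a real cluster the sample dict would be replaced by `load_osd_df()`. Ceph also ships `ceph osd test-reweight-by-utilization` and `ceph osd reweight-by-utilization`, which may make a custom script unnecessary.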
Cheers
/Simon
Regards
-----Original Message-----
From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> On Behalf Of Simon
Oosthoek
Sent: Monday, 26 August 2019 11:52
To: Dan van der Ster <dan@xxxxxxxxxxxxxx>
CC: ceph-users <ceph-users@xxxxxxxxxxxxxx>
Subject: Re: cephfs full, 2/3 Raw capacity used
On 26-08-19 11:37, Dan van der Ster wrote:
Thanks. The version and balancer config look good.
So you can try `ceph osd reweight osd.10 0.8` to see if it helps to
get you out of this.
I've done this and the next fullest 3 osds. This will take some time to
recover, I'll let you know when it's done.
Thanks,
/simon
-- dan
On Mon, Aug 26, 2019 at 11:35 AM Simon Oosthoek
<s.oosthoek@xxxxxxxxxxxxx> wrote:
On 26-08-19 11:16, Dan van der Ster wrote:
Hi,
Which version of ceph are you using? Which balancer mode?
Nautilus (14.2.2), balancer is in upmap mode.
The balancer score isn't a percent-error or anything humanly usable.
`ceph osd df tree` can better show you exactly which osds are
over/under utilized and by how much.
Aha, I ran this and sorted on the %full column:
 81 hdd 10.81149 1.00000 11 TiB 5.2 TiB 5.1 TiB   4 KiB 14 GiB 5.6 TiB 48.40 0.73 96 up osd.81
 48 hdd 10.81149 1.00000 11 TiB 5.3 TiB 5.2 TiB  15 KiB 14 GiB 5.5 TiB 49.08 0.74 95 up osd.48
154 hdd 10.81149 1.00000 11 TiB 5.5 TiB 5.4 TiB 2.6 GiB 15 GiB 5.3 TiB 50.95 0.76 96 up osd.154
129 hdd 10.81149 1.00000 11 TiB 5.5 TiB 5.4 TiB 5.1 GiB 16 GiB 5.3 TiB 51.33 0.77 96 up osd.129
 42 hdd 10.81149 1.00000 11 TiB 5.6 TiB 5.5 TiB 2.6 GiB 14 GiB 5.2 TiB 51.81 0.78 96 up osd.42
122 hdd 10.81149 1.00000 11 TiB 5.7 TiB 5.6 TiB  16 KiB 14 GiB 5.1 TiB 52.47 0.79 96 up osd.122
120 hdd 10.81149 1.00000 11 TiB 5.7 TiB 5.6 TiB 2.6 GiB 15 GiB 5.1 TiB 52.92 0.79 95 up osd.120
 96 hdd 10.81149 1.00000 11 TiB 5.8 TiB 5.7 TiB 2.6 GiB 15 GiB 5.0 TiB 53.58 0.80 96 up osd.96
 26 hdd 10.81149 1.00000 11 TiB 5.8 TiB 5.7 TiB  20 KiB 15 GiB 5.0 TiB 53.68 0.80 97 up osd.26
...
  6 hdd 10.81149 1.00000 11 TiB 8.3 TiB 8.2 TiB  88 KiB 18 GiB 2.5 TiB 77.14 1.16 96 up osd.6
 16 hdd 10.81149 1.00000 11 TiB 8.4 TiB 8.3 TiB  28 KiB 18 GiB 2.4 TiB 77.56 1.16 95 up osd.16
  0 hdd 10.81149 1.00000 11 TiB 8.6 TiB 8.4 TiB  48 KiB 17 GiB 2.2 TiB 79.24 1.19 96 up osd.0
144 hdd 10.81149 1.00000 11 TiB 8.6 TiB 8.5 TiB 2.6 GiB 18 GiB 2.2 TiB 79.57 1.19 95 up osd.144
136 hdd 10.81149 1.00000 11 TiB 8.6 TiB 8.5 TiB  48 KiB 17 GiB 2.2 TiB 79.60 1.19 95 up osd.136
 63 hdd 10.81149 1.00000 11 TiB 8.6 TiB 8.5 TiB 2.6 GiB 17 GiB 2.2 TiB 79.60 1.19 95 up osd.63
155 hdd 10.81149 1.00000 11 TiB 8.6 TiB 8.5 TiB   8 KiB 19 GiB 2.2 TiB 79.85 1.20 95 up osd.155
 89 hdd 10.81149 1.00000 11 TiB 8.7 TiB 8.5 TiB  12 KiB 20 GiB 2.2 TiB 80.04 1.20 96 up osd.89
106 hdd 10.81149 1.00000 11 TiB 8.8 TiB 8.7 TiB  64 KiB 19 GiB 2.0 TiB 81.38 1.22 96 up osd.106
 94 hdd 10.81149 1.00000 11 TiB 9.0 TiB 8.9 TiB     0 B 19 GiB 1.8 TiB 83.53 1.25 96 up osd.94
 33 hdd 10.81149 1.00000 11 TiB 9.1 TiB 9.0 TiB  44 KiB 19 GiB 1.7 TiB 84.40 1.27 96 up osd.33
 15 hdd 10.81149 1.00000 11 TiB  10 TiB 9.8 TiB  16 KiB 20 GiB 877 GiB 92.08 1.38 96 up osd.15
 53 hdd 10.81149 1.00000 11 TiB  10 TiB  10 TiB 2.6 GiB 20 GiB 676 GiB 93.90 1.41 96 up osd.53
 51 hdd 10.81149 1.00000 11 TiB  10 TiB  10 TiB 2.6 GiB 20 GiB 666 GiB 93.98 1.41 96 up osd.51
 10 hdd 10.81149 1.00000 11 TiB  10 TiB  10 TiB  40 KiB 22 GiB 552 GiB 95.01 1.42 97 up osd.10
So the fullest one is at 95.01%, the emptiest one at 48.4%, so
there's some balancing to be done.
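The numbers in this listing also suggest why the cluster stops at about 2/3 raw usage. The following is a rough worked calculation, not anything stated in the thread: it assumes the VAR column is an OSD's utilization divided by the cluster average, and that the full ratio is at its default of 0.95 (the configured value is not shown here). The function names are made up for illustration.

```python
FULL_RATIO = 0.95  # assumption: default full ratio; the thread does not show the setting


def var(osd_use_pct, avg_use_pct):
    """VAR column: an OSD's utilization relative to the cluster average."""
    return osd_use_pct / avg_use_pct


def raw_ceiling_pct(fullest_var, full_ratio=FULL_RATIO):
    """Average raw utilization at which the fullest OSD reaches full_ratio."""
    return 100.0 * full_ratio / fullest_var


avg = 66.71  # %RAW USED reported by `ceph df` in the original message

print(round(var(95.01, avg), 2))   # osd.10, listed with VAR 1.42
print(round(var(48.40, avg), 2))   # osd.81, listed with VAR 0.73
print(round(raw_ceiling_pct(var(95.01, avg)), 1))  # 66.7
```

Under these assumptions, an OSD that is 1.42 times fuller than average reaches 95% when the cluster as a whole is only about 66.7% full, which matches the reported 66.71% and the single full OSD in the health log.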
You might be able to manually fix things by using `ceph osd reweight
...` on the most full osds to move data elsewhere.
I'll look into this, but I was hoping that the balancer module would
take care of this...
Otherwise, in general, it's good to set up monitoring so you notice
and take action well before the osds fill up.
Yes, I'm still working on this, I want to add some checks to our
check_mk+icinga setup using native plugins, but my python skills are
not quite up to the task, at least, not yet ;-)
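As a starting point, a check along these lines could look like the sketch below. It is not a finished check_mk or icinga plugin: the function name and the 85/90 thresholds are hypothetical (chosen to sit below the point where OSDs are reported full), and the input is assumed to be a plain mapping of OSD name to percent used, however that is collected. It follows the usual Nagios-style convention of exit codes 0, 1, 2 and 3 for OK, WARNING, CRITICAL and UNKNOWN.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # Nagios-style plugin exit codes


def check_osd_fullness(utilizations, warn=85.0, crit=90.0):
    """utilizations: {osd_name: percent_used}. Returns (exit_code, message)."""
    if not utilizations:
        return UNKNOWN, "UNKNOWN - no OSD utilization data"
    # Only the fullest OSD matters: one full OSD is enough to block writes.
    name, worst = max(utilizations.items(), key=lambda item: item[1])
    if worst >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif worst >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    return code, f"{label} - fullest OSD {name} at {worst:.2f}%"


# Values taken from the listing in this thread.
code, message = check_osd_fullness({"osd.10": 95.01, "osd.33": 84.40, "osd.81": 48.40})
print(message)
# A real plugin would end with sys.exit(code) so the monitoring system sees the state.
```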
Cheers
/Simon
Cheers, Dan
On Mon, Aug 26, 2019 at 11:09 AM Simon Oosthoek
<s.oosthoek@xxxxxxxxxxxxx> wrote:
Hi all,
we're building up our experience with our ceph cluster before we
take it into production. I've now tried to fill up the cluster with
cephfs, which we plan to use for about 95% of all data on the cluster.
The cephfs pools are full when the cluster reports 67% raw capacity
used. There are 4 pools we use for cephfs data, 3-copy, 4-copy, EC
8+3 and EC 5+7. The balancer module is turned on and `ceph balancer
eval` gives `current cluster score 0.013255 (lower is better)`, so
well within the default 5% margin. Is there a setting we can tweak
to increase the usable RAW capacity to say 85% or 90%, or is this
the most we can expect to store on the cluster?
[root@cephmon1 ~]# ceph df
RAW STORAGE:
    CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
    hdd       1.8 PiB     605 TiB     1.2 PiB     1.2 PiB      66.71
    TOTAL     1.8 PiB     605 TiB     1.2 PiB     1.2 PiB      66.71
POOLS:
    POOL                    ID     STORED      OBJECTS     USED        %USED      MAX AVAIL
    cephfs_data              1     111 MiB     79.26M      1.2 GiB     100.00     0 B
    cephfs_metadata          2     52 GiB      4.91M       52 GiB      100.00     0 B
    cephfs_data_4copy        3     106 TiB     46.36M      428 TiB     100.00     0 B
    cephfs_data_3copy        8     93 TiB      42.08M      282 TiB     100.00     0 B
    cephfs_data_ec83        13     106 TiB     50.11M      161 TiB     100.00     0 B
    rbd                     14     21 GiB      5.62k       63 GiB      100.00     0 B
    .rgw.root               15     1.2 KiB     4           1 MiB       100.00     0 B
    default.rgw.control     16     0 B         8           0 B         0          0 B
    default.rgw.meta        17     765 B       4           1 MiB       100.00     0 B
    default.rgw.log         18     0 B         207         0 B         0          0 B
    scbench                 19     133 GiB     34.14k      400 GiB     100.00     0 B
    cephfs_data_ec57        20     126 TiB     51.84M      320 TiB     100.00     0 B
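For reference, the STORED and USED columns of the four cephfs data pools can be compared with the nominal space overhead of each layout. This is a small illustrative calculation, assuming "EC 8+3" and "EC 5+7" mean k+m with k data chunks and m coding chunks; the helper names are made up. The thread does not explain why the observed ratios for the EC pools are higher than the nominal ones.

```python
def replica_overhead(copies):
    """Raw bytes consumed per stored byte for an n-copy replicated pool."""
    return float(copies)


def ec_overhead(k, m):
    """Raw bytes consumed per stored byte for a k+m erasure-coded pool."""
    return (k + m) / k


# (STORED TiB, USED TiB, nominal overhead), figures copied from `ceph df` above.
pools = {
    "cephfs_data_4copy": (106, 428, replica_overhead(4)),
    "cephfs_data_3copy": (93, 282, replica_overhead(3)),
    "cephfs_data_ec83": (106, 161, ec_overhead(8, 3)),
    "cephfs_data_ec57": (126, 320, ec_overhead(5, 7)),
}

for name, (stored, used, nominal) in pools.items():
    observed = used / stored
    print(f"{name:20s} observed {observed:.2f}x  nominal {nominal:.3f}x")
```

The mix of layouts matters for capacity planning: how much user data fits in the same raw space depends heavily on which pool it lands in.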
[root@cephmon1 ~]# ceph balancer eval
current cluster score 0.013255 (lower is better)
Being full at 2/3 raw used is a bit too "pretty" to be accidental.
It looks like this could be a parameter for cephfs, but I
couldn't find anything like it in the documentation for Nautilus.
The logs in the dashboard show this:
2019-08-26 11:00:00.000630 [ERR] overall HEALTH_ERR 3 backfillfull osd(s); 1 full osd(s); 12 pool(s) full
2019-08-26 10:57:44.539964 [INF] Health check cleared: POOL_BACKFILLFULL (was: 12 pool(s) backfillfull)
2019-08-26 10:57:44.539944 [WRN] Health check failed: 12 pool(s) full (POOL_FULL)
2019-08-26 10:57:44.539926 [ERR] Health check failed: 1 full osd(s) (OSD_FULL)
2019-08-26 10:57:44.539899 [WRN] Health check update: 3 backfillfull osd(s) (OSD_BACKFILLFULL)
2019-08-26 10:00:00.000088 [WRN] overall HEALTH_WARN 4 backfillfull osd(s); 12 pool(s) backfillfull
So it seems that ceph is completely stuck at 2/3 full, while we
anticipated being able to fill the cluster to at least 85-90% of
the raw capacity, or at least to a level where we would still have a
functioning cluster after a single osd node fails.
Cheers
/Simon
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com