Vlad,

First off, your cluster is rather full (80.31%). Hopefully you have hardware ordered for an expansion in the near future.

Based on your 'ceph osd tree' output, it doesn't look like the reweight-by-utilization did anything for you. The last number for each OSD is still set to 1, which means it didn't reweight any of the OSDs. This is a different weight than the CRUSH weight, and something you can modify manually as well. For example, you could tweak the weights of the fullest OSDs with:

ceph osd reweight osd.23 0.95
ceph osd reweight osd.7 0.95
ceph osd reweight osd.8 0.95

Then just keep tweaking those numbers until the cluster gets an even distribution of PGs across the OSDs. The reweight-by-utilization option can help make this quicker.

Your volumes pool also doesn't have a power of two for pg_num, so your PGs will have uneven sizes. Since you can't go back down to 256 PGs, you should look at gradually increasing pg_num up to 512.

There are also inconsistent PGs that you should look at repairing. It won't help you with the data distribution, but it's good cluster maintenance.
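For the reweight-by-utilization route, note that it takes an optional threshold argument (the default is 120, meaning only OSDs more than 20% above the average utilization get touched), so you can make a pass more aggressive. The 110 below is just an example value, not a recommendation:

ceph osd reweight-by-utilization 110

Afterwards you can check how space and PGs are spread with 'ceph osd df' if your release has it (I believe Hammer and later), or by counting PGs per OSD from 'ceph pg dump' on older releases.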
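If you do grow the volumes pool, do it in small steps and bump pgp_num to match after each pg_num change, otherwise the new PGs won't actually be placed independently. Something along these lines, with the step size only an illustration; let the cluster settle, and keep an eye on the near full OSDs since splitting PGs moves data around, before each further increment:

ceph osd pool set volumes pg_num 320
ceph osd pool set volumes pgp_num 320
(repeat in further steps, e.g. 384, 448, then 512)

I believe some releases also cap how far pg_num can be raised in a single step, which is another reason to go gradually.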
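For the two inconsistent PGs in your health output, the usual approach is to check the scrub errors in the logs of the OSDs in each acting set first, and only then kick off a repair, since on these releases repair generally takes the primary's copy as authoritative:

ceph pg repair 5.7f
ceph pg repair 5.38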
Bryan

From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Vlad Blando <vblando@xxxxxxxxxxxxx>
Date: Wednesday, February 17, 2016 at 5:11 PM
To: ceph-users <ceph-users@xxxxxxxxxxxxxx>
Subject: How to properly deal with NEAR FULL OSD

>Hi
>
>This has been bugging me for some time now. The distribution of data on
>the OSDs is not balanced, so some OSDs are near full. I did ceph
>osd reweight-by-utilization but it's not helping much.
>
>
>[root@controller-node ~]# ceph osd tree
># id    weight  type name       up/down reweight
>-1      98.28   root default
>-2      32.76           host ceph-node-1
>0       3.64                    osd.0   up      1
>1       3.64                    osd.1   up      1
>2       3.64                    osd.2   up      1
>3       3.64                    osd.3   up      1
>4       3.64                    osd.4   up      1
>5       3.64                    osd.5   up      1
>6       3.64                    osd.6   up      1
>7       3.64                    osd.7   up      1
>8       3.64                    osd.8   up      1
>-3      32.76           host ceph-node-2
>9       3.64                    osd.9   up      1
>10      3.64                    osd.10  up      1
>11      3.64                    osd.11  up      1
>12      3.64                    osd.12  up      1
>13      3.64                    osd.13  up      1
>14      3.64                    osd.14  up      1
>15      3.64                    osd.15  up      1
>16      3.64                    osd.16  up      1
>17      3.64                    osd.17  up      1
>-4      32.76           host ceph-node-3
>18      3.64                    osd.18  up      1
>19      3.64                    osd.19  up      1
>20      3.64                    osd.20  up      1
>21      3.64                    osd.21  up      1
>22      3.64                    osd.22  up      1
>23      3.64                    osd.23  up      1
>24      3.64                    osd.24  up      1
>25      3.64                    osd.25  up      1
>26      3.64                    osd.26  up      1
>[root@controller-node ~]#
>
>
>[root@controller-node ~]# /opt/df-osd.sh
>ceph-node-1
>=======================================================================
>/dev/sdb1       3.7T  2.0T  1.7T  54% /var/lib/ceph/osd/ceph-0
>/dev/sdc1       3.7T  2.7T  1.1T  72% /var/lib/ceph/osd/ceph-1
>/dev/sdd1       3.7T  3.3T  431G  89% /var/lib/ceph/osd/ceph-2
>/dev/sde1       3.7T  2.8T  879G  77% /var/lib/ceph/osd/ceph-3
>/dev/sdf1       3.7T  3.3T  379G  90% /var/lib/ceph/osd/ceph-4
>/dev/sdg1       3.7T  2.9T  762G  80% /var/lib/ceph/osd/ceph-5
>/dev/sdh1       3.7T  3.0T  733G  81% /var/lib/ceph/osd/ceph-6
>/dev/sdi1       3.7T  3.4T  284G  93% /var/lib/ceph/osd/ceph-7
>/dev/sdj1       3.7T  3.4T  342G  91% /var/lib/ceph/osd/ceph-8
>=======================================================================
>ceph-node-2
>=======================================================================
>/dev/sdb1       3.7T  3.1T  622G  84% /var/lib/ceph/osd/ceph-9
>/dev/sdc1       3.7T  2.7T  1.1T  72% /var/lib/ceph/osd/ceph-10
>/dev/sdd1       3.7T  3.1T  557G  86% /var/lib/ceph/osd/ceph-11
>/dev/sde1       3.7T  3.3T  392G  90% /var/lib/ceph/osd/ceph-12
>/dev/sdf1       3.7T  2.6T  1.1T  72% /var/lib/ceph/osd/ceph-13
>/dev/sdg1       3.7T  2.8T  879G  77% /var/lib/ceph/osd/ceph-14
>/dev/sdh1       3.7T  2.7T  984G  74% /var/lib/ceph/osd/ceph-15
>/dev/sdi1       3.7T  3.2T  463G  88% /var/lib/ceph/osd/ceph-16
>/dev/sdj1       3.7T  3.1T  594G  85% /var/lib/ceph/osd/ceph-17
>=======================================================================
>ceph-node-3
>=======================================================================
>/dev/sdb1       3.7T  2.8T  910G  76% /var/lib/ceph/osd/ceph-18
>/dev/sdc1       3.7T  2.7T 1012G  73% /var/lib/ceph/osd/ceph-19
>/dev/sdd1       3.7T  3.2T  537G  86% /var/lib/ceph/osd/ceph-20
>/dev/sde1       3.7T  3.2T  465G  88% /var/lib/ceph/osd/ceph-21
>/dev/sdf1       3.7T  3.0T  663G  83% /var/lib/ceph/osd/ceph-22
>/dev/sdg1       3.7T  3.4T  248G  94% /var/lib/ceph/osd/ceph-23
>/dev/sdh1       3.7T  2.8T  928G  76% /var/lib/ceph/osd/ceph-24
>/dev/sdi1       3.7T  2.9T  802G  79% /var/lib/ceph/osd/ceph-25
>/dev/sdj1       3.7T  2.7T  1.1T  73% /var/lib/ceph/osd/ceph-26
>=======================================================================
>[root@controller-node ~]#
>
>
>[root@controller-node ~]# ceph health detail
>HEALTH_ERR 2 pgs inconsistent; 10 near full osd(s); 2 scrub errors
>pg 5.7f is active+clean+inconsistent, acting [2,12,18]
>pg 5.38 is active+clean+inconsistent, acting [7,9,24]
>osd.2 is near full at 88%
>osd.4 is near full at 89%
>osd.7 is near full at 92%
>osd.8 is near full at 90%
>osd.11 is near full at 85%
>osd.12 is near full at 89%
>osd.16 is near full at 87%
>osd.20 is near full at 85%
>osd.21 is near full at 87%
>osd.23 is near full at 93%
>2 scrub errors
>[root@controller-node ~]#
>
>[root@controller-node ~]# ceph df
>GLOBAL:
>    SIZE        AVAIL      RAW USED     %RAW USED
>    100553G     19796G     80757G       80.31
>POOLS:
>    NAME        ID     USED       %USED     OBJECTS
>    images      4      8680G      8.63      1111395
>    volumes     5      18192G     18.09     4675359
>[root@controller-node ~]#
>
>
>[root@controller-node ~]# ceph osd pool get images pg_num
>pg_num: 1024
>[root@controller-node ~]#
>[root@controller-node ~]# ceph osd pool get volumes pg_num
>pg_num: 300
>[root@controller-node ~]#
>
>
>Thanks.
>
>
>/Vlad