Re: Need help! Ceph backfill_toofull and recovery_wait+degraded

David Turner <david.turner@xxxxxxxxxxxxxxxx> · Tue, 1 Nov 2016 19:47:47 +0000

Your weights are very poorly managed. If you have a 1TB drive, its weight should be about 1; if you have an 8TB drive, its weight should be about 8. You have 4TB drives with a weight of 3.64 (which is good), but the 4x 8TB drives in the new node you added have weights ranging from 3.3 to 5. Those weights tell the cluster that the 8TB drives don't want data, and the 4TB drives pay for it by being way too full.
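
As a rough check of those numbers, here is a minimal Python sketch. It assumes the weight is simply the drive capacity in TiB, which is the convention that yields 3.64 for a 3724G drive; the helper name is made up for illustration, and the sizes are the SIZE values from the `ceph osd df` output quoted below.

```python
# Hypothetical helper: derive a CRUSH weight from the SIZE column of `ceph osd df`.
# Assumption: weight = capacity in TiB (this gives 3.64 for a 3724G drive).
GIB_PER_TIB = 1024


def suggested_weight(size_gib: float) -> float:
    """Return the capacity in TiB, rounded to two decimals."""
    return round(size_gib / GIB_PER_TIB, 2)


# SIZE values (in G) taken from the `ceph osd df` output in this thread.
sizes_gib = {"osd.0": 3724, "osd.3": 7450}
weights = {osd: suggested_weight(size) for osd, size in sizes_gib.items()}
print(weights)
```

Under that assumption the 8TB drives come out near 7.28; dividing 7450G by 1000 instead gives the 7.45 mentioned further down. Either way the weight should be roughly double that of the 4TB drives.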

Like Ronny said, your nodes are also unbalanced. You have 32TB in ceph4 and 12TB between the other 3 nodes. The best case for your data to settle right now (assuming the default settings of replica size 3 and a HOST failure domain) is to have 1/3 of your data on ceph4 with its 32TB of disks and 2/3 of your data split between ceph1, ceph2, and ceph3 with their 12TB of disks. In that layout your disks would be too full at about 5-6TB of actual data, which takes 16TB of raw space.
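
The arithmetic behind that estimate can be sketched in Python. This is only an illustration under the same assumptions (replica size 3, host failure domain); the function name is hypothetical, the sizes come from the `ceph osd df` output quoted below, and the 0.85 threshold is what I recall as the default near-full ratio.

```python
# Sketch of the best-case capacity estimate, assuming replica size 3 and a
# host failure domain (both are assumptions until confirmed).
def best_case_limits(small_hosts_gib, big_host_gib, replicas=3, fill_ratio=1.0):
    """Return (data_gib, raw_gib) at which the small hosts reach fill_ratio.

    Best case: every placement group keeps one copy on the big host and the
    remaining (replicas - 1) copies on the small hosts.
    """
    usable_small = sum(small_hosts_gib) * fill_ratio
    data_gib = usable_small / (replicas - 1)
    # The big host holds one copy of everything; make sure it is not the limit.
    data_gib = min(data_gib, big_host_gib * fill_ratio)
    return data_gib, data_gib * replicas


small = [3724, 3724, 3724]  # ceph1, ceph2, ceph3 (SIZE from `ceph osd df`)
big = 4 * 7450              # ceph4
full_data, full_raw = best_case_limits(small, big)
near_data, near_raw = best_case_limits(small, big, fill_ratio=0.85)
print(round(full_data), round(full_raw))
print(round(near_data), round(near_raw))
```

With completely full small disks this gives about 5586G of data (16758G raw). At an 85% threshold it gives about 14244G raw, which is close to the 14151G raw used that the cluster reports.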

The easiest way to resolve this would probably be to move 2 osds from ceph4 into 2 of the other hosts and to set the weight on all of the 8TB drives to 7.45.  You can migrate osds between hosts without removing and adding them back in.
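
To see what that would do to the host weights, here is a small Python sketch. Which two OSDs move and which hosts receive them is an assumption chosen purely for illustration; the starting weights are copied from the OSD tree quoted below.

```python
# Sketch of host weights after the proposed change. The choice of osd.5 and
# osd.6, and of ceph1 and ceph2 as targets, is an illustrative assumption.
tree = {
    "ceph1": {"osd.0": 3.64},
    "ceph2": {"osd.1": 3.50},
    "ceph3": {"osd.2": 3.64},
    "ceph4": {"osd.3": 4.00, "osd.4": 3.59999, "osd.5": 3.29999, "osd.6": 5.00},
}
EIGHT_TB_WEIGHT = 7.45  # weight suggested above for the 8TB drives


def move_osd(tree, osd, src, dst):
    """Move one OSD entry from host src to host dst, keeping its weight."""
    tree[dst][osd] = tree[src].pop(osd)


# Step 1: reweight all 8TB drives (all of them currently sit in ceph4).
for osd in list(tree["ceph4"]):
    tree["ceph4"][osd] = EIGHT_TB_WEIGHT

# Step 2: move two of them to other hosts.
move_osd(tree, "osd.5", "ceph4", "ceph1")
move_osd(tree, "osd.6", "ceph4", "ceph2")

host_weights = {host: round(sum(osds.values()), 2) for host, osds in tree.items()}
print(host_weights)
```

Today ceph4 carries 15.9 of the 26.68 total weight. In this sketch no host carries more than 14.9 of a total of 40.58.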

Can you please confirm what your replication size is and what your failure domain is for the cluster?

David Turner | Cloud Operations Engineer | StorageCraft Technology Corporation


From: ceph-users [ceph-users-bounces@xxxxxxxxxxxxxx] on behalf of Marcus Müller [mueller.marcus@xxxxxxxxx]
Sent: Tuesday, November 01, 2016 1:14 PM
To: ceph-users@xxxxxxxxxxxxxx
Subject: [ceph-users] Need help! Ceph backfill_toofull and recovery_wait+degraded

Hi all,

I have a big problem and I really hope someone can help me!

We have been running a Ceph cluster for a year now. The version is 0.94.7 (Hammer).
Here is some info:

Our osd map is:

ID WEIGHT   TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 26.67998 root default
-2  3.64000     host ceph1
 0  3.64000         osd.0       up  1.00000          1.00000
-3  3.50000     host ceph2
 1  3.50000         osd.1       up  1.00000          1.00000
-4  3.64000     host ceph3
 2  3.64000         osd.2       up  1.00000          1.00000
-5 15.89998     host ceph4
 3  4.00000         osd.3       up  1.00000          1.00000
 4  3.59999         osd.4       up  1.00000          1.00000
 5  3.29999         osd.5       up  1.00000          1.00000
 6  5.00000         osd.6       up  1.00000          1.00000

ceph df:

GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    40972G     26821G       14151G         34.54
POOLS:
    NAME                ID     USED      %USED     MAX AVAIL     OBJECTS
    blocks              7      4490G     10.96         1237G     7037004
    commits             8       473M         0         1237G      802353
    fs                  9      9666M      0.02         1237G     7863422

ceph osd df:

ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR
 0 3.64000  1.00000  3724G  3128G   595G 84.01 2.43
 1 3.50000  1.00000  3724G  3237G   487G 86.92 2.52
 2 3.64000  1.00000  3724G  3180G   543G 85.41 2.47
 3 4.00000  1.00000  7450G  1616G  5833G 21.70 0.63
 4 3.59999  1.00000  7450G  1246G  6203G 16.74 0.48
 5 3.29999  1.00000  7450G  1181G  6268G 15.86 0.46
 6 5.00000  1.00000  7450G   560G  6889G  7.52 0.22
              TOTAL 40972G 14151G 26820G 34.54
MIN/MAX VAR: 0.22/2.52  STDDEV: 36.53
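
A note on reading the VAR column: as far as I understand it, VAR is each OSD's %USE divided by the cluster-wide %USE from the TOTAL row. A small Python check against the numbers above, under that assumption:

```python
# Assumption: VAR = OSD %USE / cluster-wide %USE (34.54 in the TOTAL row).
cluster_use = 34.54
osd_use = {0: 84.01, 1: 86.92, 2: 85.41, 3: 21.70, 4: 16.74, 5: 15.86, 6: 7.52}

var = {osd: round(use / cluster_use, 2) for osd, use in osd_use.items()}
print(var)
```

The results match the VAR column, so the three 4TB OSDs hold roughly 2.5 times the average utilization while osd.6 holds about a fifth of it.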

Our current cluster state is: 

     health HEALTH_WARN
            63 pgs backfill
            8 pgs backfill_toofull
            9 pgs backfilling
            11 pgs degraded
            1 pgs recovering
            10 pgs recovery_wait
            11 pgs stuck degraded
            89 pgs stuck unclean
            recovery 8237/52179437 objects degraded (0.016%)
            recovery 9620295/52179437 objects misplaced (18.437%)
            2 near full osd(s)
            noout,noscrub,nodeep-scrub flag(s) set
     monmap e8: 4 mons at {ceph1=192.168.10.3:6789/0,ceph2=192.168.10.4:6789/0,ceph3=192.168.10.5:6789/0,ceph4=192.168.60.6:6789/0}
            election epoch 400, quorum 0,1,2,3 ceph1,ceph2,ceph3,ceph4
     osdmap e1774: 7 osds: 7 up, 7 in; 84 remapped pgs
            flags noout,noscrub,nodeep-scrub
      pgmap v7316159: 320 pgs, 3 pools, 4501 GB data, 15336 kobjects
            14152 GB used, 26820 GB / 40972 GB avail
            8237/52179437 objects degraded (0.016%)
            9620295/52179437 objects misplaced (18.437%)
                 231 active+clean
                  61 active+remapped+wait_backfill
                   9 active+remapped+backfilling
                   6 active+recovery_wait+degraded+remapped
                   6 active+remapped+backfill_toofull
                   4 active+recovery_wait+degraded
                   2 active+remapped+wait_backfill+backfill_toofull
                   1 active+recovering+degraded
recovery io 11754 kB/s, 35 objects/s
  client io 1748 kB/s rd, 249 kB/s wr, 44 op/s

My main problems are: 

- As you can see from the osd tree, we have three separate hosts with only one OSD each. Another one has four OSDs. Ceph will not move data off these three single-HDD nodes, which are all near full. I tried raising the weight of the OSDs in the bigger node, but that did not work, so I added a new OSD yesterday, which did not improve things, as you can see. What do I have to do to empty these three nodes again and put more data on the node with the four HDDs?

- I added the "ceph4" node later, which resulted in a strange IP change, as you can see in the mon list. The public network and the cluster network were swapped or not assigned correctly. See ceph.conf:

[global]
fsid = xxx
mon_initial_members = ceph1
mon_host = 192.168.10.3, 192.168.10.4, 192.168.10.5, 192.168.10.11
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
public_network = 192.168.60.0/24
cluster_network = 192.168.10.0/24
osd pool default size = 3
osd pool default min size = 1
osd pool default pg num = 128
osd pool default pgp num = 128
osd recovery max active = 50
osd recovery threads = 3
mon_pg_warn_max_per_osd = 0

  What can I do in this case? (It is not a big problem, since the network is 2x 10 GbE and everything works.)
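
To make the mismatch concrete, here is a short Python sketch that checks each monitor address from the monmap against the two networks declared in ceph.conf. All values are copied from the output above; the `classify` helper is only an illustration.

```python
import ipaddress

# Networks as declared in the ceph.conf above.
public_network = ipaddress.ip_network("192.168.60.0/24")
cluster_network = ipaddress.ip_network("192.168.10.0/24")

# Monitor addresses as reported by the monmap above.
monmap = {
    "ceph1": "192.168.10.3",
    "ceph2": "192.168.10.4",
    "ceph3": "192.168.10.5",
    "ceph4": "192.168.60.6",
}


def classify(addr: str) -> str:
    """Say which configured network an address belongs to."""
    ip = ipaddress.ip_address(addr)
    if ip in public_network:
        return "public"
    if ip in cluster_network:
        return "cluster"
    return "neither"


placement = {name: classify(addr) for name, addr in monmap.items()}
print(placement)
```

Only ceph4's monitor sits in the configured public network; the other three are on what ceph.conf calls the cluster network. In addition, mon_host lists 192.168.10.11 for the fourth monitor while the monmap reports 192.168.60.6.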

- One other thing: even if I only prepare the OSD, it is automatically added to the cluster. I cannot activate it. Has anyone else already seen this behavior?

I am now deleting some data from the cluster, which has already helped a bit:

     health HEALTH_WARN
            63 pgs backfill
            8 pgs backfill_toofull
            10 pgs backfilling
            7 pgs degraded
            3 pgs recovery_wait
            7 pgs stuck degraded
            82 pgs stuck unclean
            recovery 6498/52085528 objects degraded (0.012%)
            recovery 9507140/52085528 objects misplaced (18.253%)
            2 near full osd(s)
            noout,noscrub,nodeep-scrub flag(s) set
     monmap e8: 4 mons at {ceph1=192.168.10.3:6789/0,ceph2=192.168.10.4:6789/0,ceph3=192.168.10.5:6789/0,ceph4=192.168.60.6:6789/0}
            election epoch 400, quorum 0,1,2,3 ceph1,ceph2,ceph3,ceph4
     osdmap e1780: 7 osds: 7 up, 7 in; 81 remapped pgs
            flags noout,noscrub,nodeep-scrub
      pgmap v7317114: 320 pgs, 3 pools, 4499 GB data, 15333 kobjects
            14100 GB used, 26872 GB / 40972 GB avail
            6498/52085528 objects degraded (0.012%)
            9507140/52085528 objects misplaced (18.253%)
                 238 active+clean
                  60 active+remapped+wait_backfill
                   7 active+remapped+backfilling
                   6 active+remapped+backfill_toofull
                   3 active+degraded+remapped+backfilling
                   2 active+remapped+wait_backfill+backfill_toofull
                   2 active+recovery_wait+degraded+remapped
                   1 active+degraded+remapped+wait_backfill
                   1 active+recovery_wait+degraded
recovery io 7844 kB/s, 27 objects/s
  client io 343 kB/s rd, 1 op/s

If you need more information, just ask. I really need help!

Thank you for reading this far!
