After doing some testing, I'm even more confused.
What I'm trying to achieve is minimal data movement when I have to service a node to replace a failed drive. Since these nodes don't have hot-swap bays, I'll need to power down the box to replace the failed drive. I don't want Ceph to shuffle data until the new drive comes up and is ready.
My thought was to set norecover and nobackfill, take down the host, replace the drive, start the host, remove the old OSD from the cluster, ceph-disk prepare the new disk, and then unset norecover and nobackfill.
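Roughly, the sequence I have in mind looks like this (the OSD id and the device path are placeholders for whichever drive actually failed):

    ceph osd set norecover
    ceph osd set nobackfill
    # power down the host, swap the failed drive, power the host back up
    ceph osd crush remove osd.<id>   # drop the dead OSD from the CRUSH map
    ceph auth del osd.<id>
    ceph osd rm osd.<id>
    ceph-disk prepare /dev/sdX       # prepare the replacement disk
    ceph osd unset nobackfill
    ceph osd unset norecover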
However, in my testing with a 4-node cluster (v0.94.0, 10 OSDs each, replication 3, min_size 2, chooseleaf firstn on host), if I take down a host, I/O becomes blocked even though only one copy goes offline, which should still satisfy min_size. When I unset norecover, I/O proceeds and some backfill activity happens. At some point the backfill stops and everything seems to be "happy" in the degraded state.
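To be concrete about the pool and CRUSH setup I'm describing (the pool name "rbd" here is just an example), this is how I verify it:

    ceph osd pool get rbd size        # reports: size: 3
    ceph osd pool get rbd min_size    # reports: min_size: 2
    ceph osd crush rule dump          # the rule has a "chooseleaf firstn 0 type host" step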
I'm really interested to know what is going on with "norecover", as the cluster seems to break while it is set. Unsetting the "norecover" flag causes some degraded objects to recover, but not all of them. Writing to new blocks in an RBD increases the number of degraded objects, but the I/O itself works just fine. Here is an example after taking down one host and removing its OSDs from the CRUSH map (I'm currently reformatting all the drives in that host).
# ceph status
    cluster 146c4fe8-7c85-46dc-b8b3-69072d658287
     health HEALTH_WARN
            1345 pgs backfill
            10 pgs backfilling
            2016 pgs degraded
            661 pgs recovery_wait
            2016 pgs stuck degraded
            2016 pgs stuck unclean
            1356 pgs stuck undersized
            1356 pgs undersized
            recovery 40642/167785 objects degraded (24.223%)
            recovery 31481/167785 objects misplaced (18.763%)
            too many PGs per OSD (665 > max 300)
            nobackfill flag(s) set
     monmap e5: 3 mons at {nodea=10.8.6.227:6789/0,nodeb=10.8.6.228:6789/0,nodec=10.8.6.229:6789/0}
            election epoch 2576, quorum 0,1,2 nodea,nodeb,nodec
     osdmap e59031: 30 osds: 30 up, 30 in; 1356 remapped pgs
            flags nobackfill
      pgmap v4723208: 6656 pgs, 4 pools, 330 GB data, 53235 objects
            863 GB used, 55000 GB / 55863 GB avail
            40642/167785 objects degraded (24.223%)
            31481/167785 objects misplaced (18.763%)
                4640 active+clean
                1345 active+undersized+degraded+remapped+wait_backfill
                 660 active+recovery_wait+degraded
                  10 active+undersized+degraded+remapped+backfilling
                   1 active+recovery_wait+undersized+degraded+remapped
  client io 1864 kB/s rd, 8853 kB/s wr, 65 op/s
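In case it helps with the diagnosis, this is how I'm watching the state (just the standard tooling, nothing exotic):

    ceph osd dump | grep flags        # cluster-wide flags, e.g. "flags nobackfill"
    ceph health detail | head -20     # lists the individual degraded/undersized PGs
    ceph pg dump_stuck unclean        # the PGs counted as "stuck unclean" above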
Any help understanding these flags would be very helpful.
Thanks,
Robert
On Mon, Apr 13, 2015 at 1:40 PM, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
I'm looking for documentation about what exactly each of these do and
I can't find it. Can someone point me in the right direction?
The names seem too ambiguous to come to any conclusion about what
exactly they do.
Thanks,
Robert