Hi Greg,

I have used the last resort with "ceph osd lost 42 --yes-i-really-mean-it", but the pg is still down:

ceph -s
    cluster 591db070-15c1-4c7a-b107-67717bdb87d9
     health HEALTH_WARN 206 pgs degraded; 1 pgs down; 57 pgs incomplete; 1 pgs peering; 31 pgs stuck inactive; 145 pgs stuck unclean; recovery 527486/30036784 objects degraded (1.756%); 1/52 in osds are down
     monmap e7: 3 mons at {a=172.20.2.11:6789/0,b=172.20.2.64:6789/0,c=172.20.2.65:6789/0}, election epoch 1178, quorum 0,1,2 a,b,c
     mdsmap e409: 1/1/1 up {0=b=up:active}, 2 up:standby
     osdmap e22281: 52 osds: 51 up, 52 in
      pgmap v10321809: 7408 pgs, 5 pools, 58634 GB data, 14666 kobjects
            114 TB used, 76285 GB / 189 TB avail
            527486/30036784 objects degraded (1.756%)
                7144 active+clean
                   1 down+peering
                 206 active+degraded
                  57 incomplete
  client io 60506 B/s wr, 6 op/s

The pg content is on osd-31:

ceph pg map 6.289
osdmap e22281 pg 6.289 (6.289) -> up [31] acting [31]

But an hour later the old mapping was rebuilt, and the pg directory on osd-44 exists again but is empty:

ceph pg map 6.289
osdmap e22312 pg 6.289 (6.289) -> up [44,31] acting [44,31]

ls -lsa /var/lib/ceph/osd/ceph-44/current/6.289_head/
total 32
 0 drwxr-xr-x   2 root root     6 Feb 17 21:37 .
32 drwxr-xr-x 515 root root 16384 Feb 18 08:23 ..

How can I remove/clean the PG? The content (benchmark files) is not necessary anymore.

The ugly thing is: how can this happen at all? There were no writes to this pg during the first stop of the osd!? I think the only way this condition can arise is the following scenario:

1. Disk X on node 4 was recreated, so the cluster was in a degraded state.
2. A write to pg 6.289 hit osd-42, and because of the setting "osd_pool_default_min_size = 1" the acknowledgement was sent to the client after the write on osd-42 completed, but before the write on node 2 (osd-31) happened.
3. osd-42 was stopped and also reformatted and rebuilt (before the write to pg 6.289 on osd-31 was done).

But there are two inconsistencies. First, an acknowledgement after writing to only one disk should only occur if the second disk is down - in this case both disks were up. Second, there were no writes during this time - I use the cluster only for VMs (from proxmox-ve) and the VM disks already exist - so writes should only go to existing pgs, like:

2.551_head/DIR_1/DIR_5/DIR_5/DIR_0/rbd\udata.89cef2ae8944a.000000000016936a__head_60480551__2

Is any of my assumptions wrong? Any comments on how to remove the incomplete pg? (A sketch of what I plan to try next is below, after the quoted message.)

Udo

On 16.02.2014 18:48, Gregory Farnum wrote:
> Check out http://ceph.com/docs/master/rados/operations/placement-groups/#get-statistics-for-stuck-pgs
> and http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/.
> What does the dump of the PG say is going on?
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
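
P.S.: A minimal sketch of the commands I plan to run next, following the troubleshooting pages Greg linked. The pg id 6.289 comes from the output above; whether "force_create_pg" is the right (or even available) cleanup for an incomplete pg on this release is only my assumption, not something the docs confirm for this exact case.

ceph pg dump_stuck inactive    # list the stuck pgs, 6.289 should show up here
ceph pg 6.289 query            # dump the peering/recovery state of the incomplete pg
ceph pg map 6.289              # confirm the current up/acting set
# if the benchmark data is really expendable, recreate the pg empty -
# only my assumption that this is appropriate here:
ceph pg force_create_pg 6.289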