Check out http://ceph.com/docs/master/rados/operations/placement-groups/#get-statistics-for-stuck-pgs
and http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/.
What does the dump of the PG say is going on?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Sun, Feb 16, 2014 at 12:32 AM, Udo Lembke <ulembke@xxxxxxxxxxxx> wrote:
> Hi,
> I switched some disks from manual formatting to ceph-deploy (because of
> slightly different xfs parameters) - all disks are on a single node of a
> 4-node cluster.
> After rebuilding the OSD disks, one PG is incomplete:
>
> ceph -s
>     cluster 591db070-15c1-4c7a-b107-67717bdb87d9
>      health HEALTH_WARN 1 pgs incomplete; 1 pgs stuck inactive; 1 pgs
> stuck unclean
>      monmap e7: 3 mons at
> {a=172.20.2.11:6789/0,b=172.20.2.64:6789/0,c=172.20.2.65:6789/0},
> election epoch 1178, quorum 0,1,2 a,b,c
>      mdsmap e409: 1/1/1 up {0=b=up:active}, 2 up:standby
>      osdmap e22002: 52 osds: 52 up, 52 in
>       pgmap v10177038: 7408 pgs, 5 pools, 58618 GB data, 14662 kobjects
>             114 TB used, 76319 GB / 189 TB avail
>                 7405 active+clean
>                    1 incomplete
>                    2 active+clean+scrubbing+deep
>
> The PG is on one of the rebuilt disks (osd.42):
>
> ceph pg map 6.289
> osdmap e22002 pg 6.289 (6.289) -> up [42,31] acting [42,31]
>
> ls -lsa /var/lib/ceph/osd/ceph-42/current/6.289_head/
> total 16
>  0 drwxr-xr-x   2 root root     6 Feb 15 20:11 .
> 16 drwxr-xr-x 411 root root 12288 Feb 16 03:09 ..
>
> ls -lsa /var/lib/ceph/osd/ceph-31/current/6.289*/
>
> /var/lib/ceph/osd/ceph-31/current/6.289_head/:
> total 20520
>  8 drwxr-xr-x   2 root root 4096 Feb 15 10:24 .
> 12 drwxr-xr-x 320 root root 8192 Feb 15 21:11 ..
> 4100 -rw-r--r-- 1 root root 4194304 Feb 15 10:24 benchmark\udata\uproxmox4\u638085\uobject2844__head_4F14E289__6
> 4100 -rw-r--r-- 1 root root 4194304 Feb 15 10:24 benchmark\udata\uproxmox4\u638085\uobject3975__head_A7EBCA89__6
> 4100 -rw-r--r-- 1 root root 4194304 Feb 15 10:24 benchmark\udata\uproxmox4\u638085\uobject4003__head_537FE289__6
> 4100 -rw-r--r-- 1 root root 4194304 Feb 15 10:24 benchmark\udata\uproxmox4\u673679\uobject344__head_FF4A1289__6
> 4100 -rw-r--r-- 1 root root 4194304 Feb 15 10:24 benchmark\udata\uproxmox4\u673679\uobject474__head_5FC3EA89__6
>
> /var/lib/ceph/osd/ceph-31/current/6.289_TEMP/:
> total 16
>  4 drwxr-xr-x   2 root root    6 Feb 15 10:24 .
> 12 drwxr-xr-x 320 root root 8192 Feb 15 21:11 ..
>
> How do I tell Ceph that the content on osd.31 is the right one?
> I have tried "ceph osd repair osd.42" without luck.
>
> In the manual I only saw "ceph osd lost NN", but then I guess all the
> other data on that OSD will also be rebuilt onto other disks.
> If "osd lost" is the only option, how do I reuse osd.42? Wait for a
> healthy cluster and then recreate the disk?
>
> Hope for a hint.
>
> Best regards
>
> Udo
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
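[Editor's sketch, not from the thread: Greg's suggestion is to look at what the PG query/dump reports. A `ceph pg 6.289 query` emits JSON describing the PG's state and its `up`/`acting` OSD sets; the minimal field names below (`state`, `up`, `acting`) are an assumption for illustration - a real query returns far more detail, including `recovery_state`.]

```python
import json

# Hypothetical, trimmed-down output of `ceph pg 6.289 query` for the
# situation in the thread (field names are assumptions for illustration).
sample = json.loads("""
{
  "state": "incomplete",
  "up": [42, 31],
  "acting": [42, 31]
}
""")

def diagnose(pg):
    """Summarize a pg-query-style dict for a stuck PG."""
    state = pg["state"]
    if "incomplete" in state:
        # Peering could not find an authoritative, complete copy of the PG.
        # Before resorting to `ceph osd lost`, check which acting OSD still
        # holds objects on disk (here, osd.31's 6.289_head is non-empty).
        return ("PG is %s on acting OSDs %s: peering found no complete copy"
                % (state, pg["acting"]))
    return "PG state: " + state

print(diagnose(sample))
```

The point of the sketch is only that the `state` string and `acting` set from the query are what tell you why the PG is stuck; the actual decision (e.g. whether marking osd.42 lost is safe) still depends on the full `recovery_state` section of the real query output.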