Production Ceph :: PG data lost : Cluster PG incomplete, inactive, unclean

Hello Guys

My Ceph cluster has lost data and is not recovering. The problem occurred while Ceph was performing recovery after one of the nodes went down.
Now all the nodes are up again, but Ceph is reporting PGs as incomplete, unclean, and recovering.


I have tried several things to recover them: scrub, deep-scrub, pg repair, changing primary affinity and then scrubbing again, adjusting osd_pool_default_size, etc., but no luck.
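
For reference, these are roughly the commands I ran (using PG 7.25b and osd.194 purely as examples; <pool-name> is a placeholder for each affected pool):

# ceph pg scrub 7.25b
# ceph pg deep-scrub 7.25b
# ceph pg repair 7.25b
# ceph osd primary-affinity osd.194 0
# ceph pg scrub 7.25b
# ceph osd pool set <pool-name> size 3

None of these moved the PGs out of the incomplete/unclean state.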

Could you please advise how to recover these PGs and get the cluster back to HEALTH_OK?

# ceph -s
    cluster 2bd3283d-67ef-4316-8b7e-d8f4747eae33
     health HEALTH_WARN 19 pgs incomplete; 3 pgs recovering; 20 pgs stuck inactive; 23 pgs stuck unclean; 2 requests are blocked > 32 sec; recovery 531/980676 objects degraded (0.054%); 243/326892 unfound (0.074%)
     monmap e3: 3 mons at {xxx=xxxx:6789/0,xxx=xxxx:6789/0,xxx=xxxx:6789/0}, election epoch 1474, quorum 0,1,2 xx,xx,xx
     osdmap e261536: 239 osds: 239 up, 238 in
      pgmap v415790: 18432 pgs, 13 pools, 2330 GB data, 319 kobjects
            20316 GB used, 844 TB / 864 TB avail
            531/980676 objects degraded (0.054%); 243/326892 unfound (0.074%)
                   1 creating
               18409 active+clean
                   3 active+recovering
                  19 incomplete




# ceph pg dump_stuck unclean
ok
pg_stat objects mip degr unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
10.70 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.534911 0'0 261536:1015 [153,140,80] 153 [153,140,80] 153 0'0 2015-03-12 17:59:43.275049 0'0 2015-03-09 17:55:58.745662
3.dde 68 66 0 66 552861709 297 297 incomplete 2015-03-20 12:19:49.584839 33547'297 261536:228352 [174,5,179] 174 [174,5,179] 174 33547'297 2015-03-12 14:19:15.261595 28522'43 2015-03-11 14:19:13.894538
5.a2 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.560756 0'0 261536:897 [214,191,170] 214 [214,191,170] 214 0'0 2015-03-12 17:58:29.257085 0'0 2015-03-09 17:55:07.684377
13.1b6 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.846253 0'0 261536:1050 [0,176,131] 0 [0,176,131] 0 0'0 2015-03-12 18:00:13.286920 0'0 2015-03-09 17:56:18.715208
7.25b 16 0 0 0 67108864 16 16 incomplete 2015-03-20 12:19:49.639102 27666'16 261536:4777 [194,145,45] 194 [194,145,45] 194 27666'16 2015-03-12 17:59:06.357864 2330'3 2015-03-09 17:55:30.754522
5.19 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.742698 0'0 261536:25410 [212,43,131] 212 [212,43,131] 212 0'0 2015-03-12 13:51:37.777026 0'0 2015-03-11 13:51:35.406246
3.a2f 0 0 0 0 0 0 0 creating 2015-03-20 12:42:15.586372 0'0 0:0 [] -1 [] -1 0'0 0.000000 0'0 0.000000
7.298 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.566966 0'0 261536:900 [187,95,225] 187 [187,95,225] 187 27666'13 2015-03-12 17:59:10.308423 2330'4 2015-03-09 17:55:35.750109
3.a5a 77 87 261 87 623902741 325 325 active+recovering 2015-03-20 10:54:57.443670 33569'325 261536:182464 [150,149,181] 150 [150,149,181] 150 33569'325 2015-03-12 13:58:05.813966 28433'44 2015-03-11 13:57:53.909795
1.1e7 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.610547 0'0 261536:772 [175,182] 175 [175,182] 175 0'0 2015-03-12 17:55:45.203232 0'0 2015-03-09 17:53:49.694822
3.774 79 0 0 0 645136397 339 339 incomplete 2015-03-20 12:19:49.821708 33570'339 261536:166857 [162,39,161] 162 [162,39,161] 162 33570'339 2015-03-12 14:49:03.869447 2226'2 2015-03-09 13:46:49.783950
3.7d0 78 0 0 0 609222686 376 376 incomplete 2015-03-20 12:19:49.534004 33538'376 261536:182810 [117,118,177] 117 [117,118,177] 117 33538'376 2015-03-12 13:51:03.984454 28394'62 2015-03-11 13:50:58.196288
3.d60 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.647196 0'0 261536:833 [154,172,1] 154 [154,172,1] 154 33552'321 2015-03-12 13:44:43.502907 28356'39 2015-03-11 13:44:41.663482
4.1fc 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.610103 0'0 261536:1069 [70,179,58] 70 [70,179,58] 70 0'0 2015-03-12 17:58:19.254170 0'0 2015-03-09 17:54:55.720479
3.e02 72 0 0 0 585105425 304 304 incomplete 2015-03-20 12:19:49.564768 33568'304 261536:167428 [15,102,147] 15 [15,102,147] 15 33568'304 2015-03-16 10:04:19.894789 2246'4 2015-03-09 11:43:44.176331
8.1d4 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.614727 0'0 261536:19611 [126,43,174] 126 [126,43,174] 126 0'0 2015-03-12 14:34:35.258338 0'0 2015-03-12 14:34:35.258338
4.2f4 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.595109 0'0 261536:113791 [181,186,13] 181 [181,186,13] 181 0'0 2015-03-12 14:59:03.529264 0'0 2015-03-09 13:46:40.601301
3.52c 65 23 69 23 543162368 290 290 active+recovering 2015-03-20 10:51:43.664734 33553'290 261536:8431 [212,100,219] 212 [212,100,219] 212 33553'290 2015-03-13 11:44:26.396514 29686'103 2015-03-11 17:18:33.452616
3.e5a 76 70 0 0 623902741 325 325 incomplete 2015-03-20 12:19:49.552071 33569'325 261536:71248 [97,22,62] 97 [97,22,62] 97 33569'325 2015-03-12 13:58:05.813966 28433'44 2015-03-11 13:57:53.909795
8.3a0 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.615728 0'0 261536:173184 [62,14,178] 62 [62,14,178] 62 0'0 2015-03-12 13:52:44.546418 0'0 2015-03-12 13:52:44.546418
3.24e 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.591282 0'0 261536:1026 [103,14,90] 103 [103,14,90] 103 33556'272 2015-03-13 11:44:41.263725 2327'4 2015-03-09 17:54:43.675552
5.f7 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.667823 0'0 261536:853 [73,44,123] 73 [73,44,123] 73 0'0 2015-03-12 17:58:30.257371 0'0 2015-03-09 17:55:11.725629
3.ae8 77 67 201 67 624427024 342 342 active+recovering 2015-03-20 10:50:01.693979 33516'342 261536:149258 [122,144,218] 122 [122,144,218] 122 33516'342 2015-03-12 17:11:01.899062 29638'134 2015-03-11 17:10:59.966372
#


The PG data is present on multiple OSDs, but Ceph is not recovering the PG. For example:

# ceph pg map 7.25b
osdmap e261536 pg 7.25b (7.25b) -> up [194,145,45] acting [194,145,45]


# ls -l /var/lib/ceph/osd/ceph-194/current/7.25b_head | wc -l
17

# ls -l /var/lib/ceph/osd/ceph-145/current/7.25b_head | wc -l
0
#

# ls -l /var/lib/ceph/osd/ceph-45/current/7.25b_head | wc -l
17
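
I also queried one of the stuck PGs to see why peering does not complete (the output is long, so I am not pasting it here):

# ceph pg 7.25b query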





Some of the PGs are completely lost, i.e. they don't have any data on any of the acting OSDs. For example:

# ceph pg map 10.70
osdmap e261536 pg 10.70 (10.70) -> up [153,140,80] acting [153,140,80]


# ls -l /var/lib/ceph/osd/ceph-140/current/10.70_head | wc -l
0

# ls -l /var/lib/ceph/osd/ceph-153/current/10.70_head | wc -l
0

# ls -l /var/lib/ceph/osd/ceph-80/current/10.70_head | wc -l
0
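
If these PGs are genuinely unrecoverable, is marking the unfound objects lost (or force-recreating the empty PGs) the only way forward? Something along these lines, which I have not run yet:

# ceph pg 10.70 list_missing
# ceph pg 10.70 mark_unfound_lost revert
# ceph pg force_create_pg 10.70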



- Karan -


