On Wed, 24 Sep 2014, Sahana Lokeshappa wrote:
> 2.a9    518    0    0    0    0    2172649472    3001    3001    active+clean    2014-09-22 17:49:35.357586    6826'35762    17842:72706    [12,7,28]    12    [12,7,28]    12    6826'35762    2014-09-22 11:33:55.985449    0'0    2014-09-16 20:11:32.693864

Can you verify that 2.a9 exists in the data directory for 12, 7, and/or 28?
If so, the next step would be to enable logging (debug osd = 20, debug ms = 1)
and see why peering is stuck...

sage

> 0.59    0    0    0    0    0    0    0    0    active+clean    2014-09-22 17:50:00.751218    0'0    17842:4472    [12,41,2]    12    [12,41,2]    12    0'0    2014-09-22 16:47:09.315499    0'0    2014-09-16 12:20:48.618726
>
> 0.4d    0    0    0    0    0    0    4    4    stale+down+peering    2014-09-18 17:51:10.038247    186'4    11134:498    [12,56,27]    12    [12,56,27]    12    186'4    2014-09-18 17:30:32.393188    0'0    2014-09-16 12:20:48.615322
>
> 0.49    0    0    0    0    0    0    0    0    stale+down+peering    2014-09-18 17:44:52.681513    0'0    11134:498    [12,6,25]    12    [12,6,25]    12    0'0    2014-09-18 17:16:12.986658    0'0    2014-09-16 12:20:48.614192
>
> 0.1c    0    0    0    0    0    0    12    12    stale+down+peering    2014-09-18 17:51:16.735549    186'12    11134:522    [12,25,23]    12    [12,25,23]    12    186'12    2014-09-18 17:16:04.457863    186'10    2014-09-16 14:23:58.731465
>
> 2.17    510    0    0    0    0    2139095040    3001    3001    active+clean    2014-09-22 17:52:20.364754    6784'30742    17842:72033    [12,27,23]    12    [12,27,23]    12    6784'30742    2014-09-22 00:19:39.905291    0'0    2014-09-16 20:11:17.016299
>
> 2.7e8    508    0    0    0    0    2130706432    3433    3433    active+clean    2014-09-22 17:52:20.365083    6702'21132    17842:64769    [12,25,23]    12    [12,25,23]    12    6702'21132    2014-09-22 17:01:20.546126    0'0    2014-09-16 14:42:32.079187
>
> 2.6a5    528    0    0    0    0    2214592512    2840    2840    active+clean    2014-09-22 22:50:38.092084    6775'34416    17842:83221    [12,58,0]    12    [12,58,0]    12    6775'34416    2014-09-22 22:50:38.091989    0'0    2014-09-16 20:11:32.703368
>
> And we couldn't observe any peering events happening on the primary OSD.
>
> $ sudo ceph pg 0.49 query
> Error ENOENT: i don't have pgid 0.49
>
> $ sudo ceph pg 0.4d query
> Error ENOENT: i don't have pgid 0.4d
>
> $ sudo ceph pg 0.1c query
> Error ENOENT: i don't have pgid 0.1c
>
> Not able to explain why the peering was stuck. BTW, the rbd pool doesn't contain any data.
>
> Varada
>
> From: Ceph-community [mailto:ceph-community-bounces at lists.ceph.com] On Behalf Of Sage Weil
> Sent: Monday, September 22, 2014 10:44 PM
> To: Sahana Lokeshappa; ceph-users at lists.ceph.com; ceph-users at ceph.com; ceph-community at lists.ceph.com
> Subject: Re: [Ceph-community] Pgs are in stale+down+peering state
>
> Stale means that the primary OSD for the PG went down and the status is stale.
> They all seem to be from OSD.12... Seems like something is preventing that
> OSD from reporting to the mon?
>
> sage
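
Concretely, the checks Sage suggests above (verify the PG directory exists, then
raise debug logging on the primary) come down to a couple of commands. A rough
sketch, assuming the default data path /var/lib/ceph/osd/ceph-<id> and a
FileStore backend; adjust the paths if this cluster is laid out differently:

    # on the hosts carrying osds 12, 7 and 28: does the PG's directory exist?
    ls -d /var/lib/ceph/osd/ceph-12/current/2.a9_head

    # raise logging on the primary without restarting it
    sudo ceph tell osd.12 injectargs '--debug-osd 20 --debug-ms 1'

    # then watch the OSD log for peering activity
    tail -f /var/log/ceph/ceph-osd.12.log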
>
> On September 22, 2014 7:51:48 AM EDT, Sahana Lokeshappa
> <Sahana.Lokeshappa at sandisk.com> wrote:
>
> Hi all,
>
> I used the 'ceph osd thrash' command, and after all OSDs are back up and in,
> 3 pgs are in stale+down+peering state.
>
> sudo ceph -s
>     cluster 99ffc4a5-2811-4547-bd65-34c7d4c58758
>      health HEALTH_WARN 3 pgs down; 3 pgs peering; 3 pgs stale;
>             3 pgs stuck inactive; 3 pgs stuck stale; 3 pgs stuck unclean
>      monmap e1: 3 mons at {rack2-ram-1=10.242.42.180:6789/0,rack2-ram-2=10.242.42.184:6789/0,rack2-ram-3=10.242.42.188:6789/0},
>             election epoch 2008, quorum 0,1,2 rack2-ram-1,rack2-ram-2,rack2-ram-3
>      osdmap e17031: 64 osds: 64 up, 64 in
>       pgmap v76728: 2148 pgs, 2 pools, 4135 GB data, 1033 kobjects
>             12501 GB used, 10975 GB / 23476 GB avail
>                 2145 active+clean
>                    3 stale+down+peering
>
> sudo ceph health detail
> HEALTH_WARN 3 pgs down; 3 pgs peering; 3 pgs stale; 3 pgs stuck inactive; 3 pgs stuck stale; 3 pgs stuck unclean
> pg 0.4d is stuck inactive for 341048.948643, current state stale+down+peering, last acting [12,56,27]
> pg 0.49 is stuck inactive for 341048.948667, current state stale+down+peering, last acting [12,6,25]
> pg 0.1c is stuck inactive for 341048.949362, current state stale+down+peering, last acting [12,25,23]
> pg 0.4d is stuck unclean for 341048.948665, current state stale+down+peering, last acting [12,56,27]
> pg 0.49 is stuck unclean for 341048.948687, current state stale+down+peering, last acting [12,6,25]
> pg 0.1c is stuck unclean for 341048.949382, current state stale+down+peering, last acting [12,25,23]
> pg 0.4d is stuck stale for 339823.956929, current state stale+down+peering, last acting [12,56,27]
> pg 0.49 is stuck stale for 339823.956930, current state stale+down+peering, last acting [12,6,25]
> pg 0.1c is stuck stale for 339823.956925, current state stale+down+peering, last acting [12,25,23]
>
> Please, can anyone explain why the pgs are in this state?
>
> Sahana Lokeshappa
> Test Development Engineer I
> SanDisk Corporation
> 3rd Floor, Bagmane Laurel, Bagmane Tech Park
> C V Raman nagar, Bangalore 560093
> T: +918042422283
> Sahana.Lokeshappa at SanDisk.com
>
> _______________________________________________
> Ceph-community mailing list
> Ceph-community at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-community-ceph.com
>
> --
> Sent from Kaiten Mail. Please excuse my brevity.
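
For reference, the stuck set and its acting primaries can also be pulled
straight from the cluster, which makes it easier to confirm whether osd.12 is
up and answering, as suspected earlier in the thread. A rough sketch using
standard ceph CLI commands, with the PG and OSD ids taken from the health
output above:

    # list stuck PGs together with their acting sets
    sudo ceph pg dump_stuck stale
    sudo ceph pg dump_stuck inactive

    # where does one of the stuck PGs currently map?
    sudo ceph pg map 0.4d

    # is the suspect primary up/in, and does the daemon respond at all?
    sudo ceph osd tree | grep -w osd.12
    sudo ceph tell osd.12 version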