[Ceph-community] Pgs are in stale+down+peering state

Is osd.12 doing anything strange?  Is it consuming lots of CPU or IO?  Is it
flapping?  Is it writing any interesting logs?  Have you tried restarting it?
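
A few quick things to look at on the node hosting osd.12 (a rough sketch only;
the log path and init tooling below are assumptions and vary by distro and by
how the OSDs were deployed):

$ top                                             # is the ceph-osd process for osd.12 pegging CPU?
$ iostat -x 5                                     # is its data disk saturated?
$ sudo tail -n 200 /var/log/ceph/ceph-osd.12.log  # anything unusual being logged?
$ sudo ceph osd dump | grep '^osd.12 '            # up_from/down_at history hints at flapping
$ sudo service ceph restart osd.12                # sysvinit; 'sudo restart ceph-osd id=12' under upstart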

If that doesn't help, try the other involved osds: 56, 27, 6, 25, 23.  I
doubt that it will help, but it won't hurt.
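
If you do try the others, run the restart on whichever node owns each OSD and
then re-check, e.g. (same caveats about init tooling as above):

$ sudo service ceph restart osd.56     # repeat for osd.27, osd.6, osd.25, osd.23
$ sudo ceph health detail              # did the three PGs leave stale+down+peering?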



On Mon, Sep 22, 2014 at 11:21 AM, Varada Kari <Varada.Kari at sandisk.com>
wrote:

>  Hi Sage,
>
>
>
> To give more context on this problem,
>
>
>
> This cluster has two pools: rbd and a user-created pool (pool1).
>
>
>
> Osd.12 is the primary for some other PGs, but the problem happens only for
> these three PGs.
>
>
>
> $ sudo ceph osd lspools
>
> 0 rbd,2 pool1,
>
>
>
> $ sudo ceph -s
>
>     cluster 99ffc4a5-2811-4547-bd65-34c7d4c58758
>
>      health HEALTH_WARN 3 pgs down; 3 pgs peering; 3 pgs stale; 3 pgs
> stuck inactive; 3 pgs stuck stale; 3 pgs stuck unclean; 1 requests are
> blocked > 32 sec
>
>     monmap e1: 3 mons at {rack2-ram-1=
> 10.242.42.180:6789/0,rack2-ram-2=10.242.42.184:6789/0,rack2-ram-3=10.242.42.188:6789/0},
> election epoch 2008, quorum 0,1,2 rack2-ram-1,rack2-ram-2,rack2-ram-3
>
>      osdmap e17842: 64 osds: 64 up, 64 in
>
>       pgmap v79729: 2148 pgs, 2 pools, 4135 GB data, 1033 kobjects
>
>             12504 GB used, 10971 GB / 23476 GB avail
>
>                 2145 active+clean
>
>                    3 stale+down+peering
>
>
>
> Snippet from pg dump:
>
>
>
> 2.a9 518 0 0 0 0 2172649472 3001 3001 active+clean 2014-09-22 17:49:35.357586 6826'35762 17842:72706 [12,7,28] 12 [12,7,28] 12 6826'35762 2014-09-22 11:33:55.985449 0'0 2014-09-16 20:11:32.693864
>
> 0.59 0 0 0 0 0 0 0 0 active+clean 2014-09-22 17:50:00.751218 0'0 17842:4472 [12,41,2] 12 [12,41,2] 12 0'0 2014-09-22 16:47:09.315499 0'0 2014-09-16 12:20:48.618726
>
> 0.4d 0 0 0 0 0 0 4 4 stale+down+peering 2014-09-18 17:51:10.038247 186'4 11134:498 [12,56,27] 12 [12,56,27] 12 186'4 2014-09-18 17:30:32.393188 0'0 2014-09-16 12:20:48.615322
>
> 0.49 0 0 0 0 0 0 0 0 stale+down+peering 2014-09-18 17:44:52.681513 0'0 11134:498 [12,6,25] 12 [12,6,25] 12 0'0 2014-09-18 17:16:12.986658 0'0 2014-09-16 12:20:48.614192
>
> 0.1c 0 0 0 0 0 0 12 12 stale+down+peering 2014-09-18 17:51:16.735549 186'12 11134:522 [12,25,23] 12 [12,25,23] 12 186'12 2014-09-18 17:16:04.457863 186'10 2014-09-16 14:23:58.731465
>
> 2.17 510 0 0 0 0 2139095040 3001 3001 active+clean 2014-09-22 17:52:20.364754 6784'30742 17842:72033 [12,27,23] 12 [12,27,23] 12 6784'30742 2014-09-22 00:19:39.905291 0'0 2014-09-16 20:11:17.016299
>
> 2.7e8 508 0 0 0 0 2130706432 3433 3433 active+clean 2014-09-22 17:52:20.365083 6702'21132 17842:64769 [12,25,23] 12 [12,25,23] 12 6702'21132 2014-09-22 17:01:20.546126 0'0 2014-09-16 14:42:32.079187
>
> 2.6a5 528 0 0 0 0 2214592512 2840 2840 active+clean 2014-09-22 22:50:38.092084 6775'34416 17842:83221 [12,58,0] 12 [12,58,0] 12 6775'34416 2014-09-22 22:50:38.091989 0'0 2014-09-16 20:11:32.703368
>
>
>
> And we couldn't observe any peering events happening on the primary OSD.
>
>
>
> $ sudo ceph pg 0.49 query
>
> Error ENOENT: i don't have pgid 0.49
>
> $ sudo ceph pg 0.4d query
>
> Error ENOENT: i don't have pgid 0.4d
>
> $ sudo ceph pg 0.1c query
>
> Error ENOENT: i don't have pgid 0.1c
>
>
>
> Not able to explain why the peering is stuck. BTW, the rbd pool doesn't
> contain any data.
>
>
>
> Varada
>
>
>
> From: Ceph-community [mailto:ceph-community-bounces at lists.ceph.com] On Behalf Of Sage Weil
> Sent: Monday, September 22, 2014 10:44 PM
> To: Sahana Lokeshappa; ceph-users at lists.ceph.com; ceph-users at ceph.com;
> ceph-community at lists.ceph.com
> Subject: Re: [Ceph-community] Pgs are in stale+down+peering state
>
>
>
> Stale means that the primary OSD for the PG went down and the monitors have
> not received a status update for it since.  They all seem to be from osd.12...
> Seems like something is preventing that OSD from reporting to the mon?
>
> sage
>
>
>
> On September 22, 2014 7:51:48 AM EDT, Sahana Lokeshappa <
> Sahana.Lokeshappa at sandisk.com> wrote:
>
> Hi all,
>
>
>
> I used the command "ceph osd thrash", and after all OSDs were back up and
> in, 3 PGs are in the stale+down+peering state.
>
>
>
> sudo ceph -s
>
>     cluster 99ffc4a5-2811-4547-bd65-34c7d4c58758
>
>      health HEALTH_WARN 3 pgs down; 3 pgs peering; 3 pgs stale; 3 pgs
> stuck inactive; 3 pgs stuck stale; 3 pgs stuck unclean
>
>      monmap e1: 3 mons at {rack2-ram-1=
> 10.242.42.180:6789/0,rack2-ram-2=10.242.42.184:6789/0,rack2-ram-3=10.242.42.188:6789/0},
> election epoch 2008, quorum 0,1,2 rack2-ram-1,rack2-ram-2,rack2-ram-3
>
>      osdmap e17031: 64 osds: 64 up, 64 in
>
>       pgmap v76728: 2148 pgs, 2 pools, 4135 GB data, 1033 kobjects
>
>             12501 GB used, 10975 GB / 23476 GB avail
>
>                 2145 active+clean
>
>                    3 stale+down+peering
>
>
>
> sudo ceph health detail
>
> HEALTH_WARN 3 pgs down; 3 pgs peering; 3 pgs stale; 3 pgs stuck inactive;
> 3 pgs stuck stale; 3 pgs stuck unclean
>
> pg 0.4d is stuck inactive for 341048.948643, current state
> stale+down+peering, last acting [12,56,27]
>
> pg 0.49 is stuck inactive for 341048.948667, current state
> stale+down+peering, last acting [12,6,25]
>
> pg 0.1c is stuck inactive for 341048.949362, current state
> stale+down+peering, last acting [12,25,23]
>
> pg 0.4d is stuck unclean for 341048.948665, current state
> stale+down+peering, last acting [12,56,27]
>
> pg 0.49 is stuck unclean for 341048.948687, current state
> stale+down+peering, last acting [12,6,25]
>
> pg 0.1c is stuck unclean for 341048.949382, current state
> stale+down+peering, last acting [12,25,23]
>
> pg 0.4d is stuck stale for 339823.956929, current state
> stale+down+peering, last acting [12,56,27]
>
> pg 0.49 is stuck stale for 339823.956930, current state
> stale+down+peering, last acting [12,6,25]
>
> pg 0.1c is stuck stale for 339823.956925, current state
> stale+down+peering, last acting [12,25,23]
>
>
>
>
>
> Please, can anyone explain why pgs are in this state.
>
>
>
> Sahana Lokeshappa
> Test Development Engineer I
> SanDisk Corporation
> 3rd Floor, Bagmane Laurel, Bagmane Tech Park
>
> C V Raman nagar, Bangalore 560093
> T: +918042422283
>
> Sahana.Lokeshappa at SanDisk.com
>
>
>
>
>
>
> --
> Sent from Kaiten Mail. Please excuse my brevity.
>
>

