Hi Darius,
Thanks for your reply!
This happened after a disaster on a SATA storage node; OSDs 102 and 121 are up.
The information below is the osd.14 log. Do you recommend marking it out of the cluster?
2019-03-10 17:36:17.654134 7f1991163700 0 -- 172.16.184.90:6800/589935 >> :/0 pipe(0x555be7808800 sd=516 :6800 s=0 pgs=0 cs=0 l=0 c=0x555be6720400).accept failed to getpeername (107) Transport endpoint is not connected
2019-03-10 17:36:17.654660 7f1992d7f700 0 -- 172.16.184.90:6800/589935 >> :/0 pipe(0x555be773f400 sd=536 :6800 s=0 pgs=0 cs=0 l=0 c=0x555be6720700).accept failed to getpeername (107) Transport endpoint is not connected
2019-03-10 17:36:17.654720 7f1993a8c700 0 -- 172.16.184.90:6800/589935 >> 172.16.184.92:6801/1555502 pipe(0x555be7807400 sd=542 :6800 s=0 pgs=0 cs=0 l=0 c=0x555be6720280).accept connect_seq 0 vs existing 0 state wait
2019-03-10 17:36:17.654813 7f199095b700 0 -- 172.16.184.90:6800/589935 >> :/0 pipe(0x555be6d8e000 sd=537 :6800 s=0 pgs=0 cs=0 l=0 c=0x555be671ff80).accept failed to getpeername (107) Transport endpoint is not connected
2019-03-10 17:36:17.654847 7f1992476700 0 -- 172.16.184.90:6800/589935 >> 172.16.184.95:6840/1537112 pipe(0x555be773e000 sd=533 :6800 s=0 pgs=0 cs=0 l=0 c=0x555be671fc80).accept connect_seq 0 vs existing 0 state wait
2019-03-10 17:36:17.655252 7f1993486700 0 -- 172.16.184.90:6800/589935 >> 172.16.184.92:6832/1098862 pipe(0x555be779f400 sd=521 :6800 s=0 pgs=0 cs=0 l=0 c=0x555be6242d00).accept connect_seq 0 vs existing 0 state wait
2019-03-10 17:36:17.655315 7f1993284700 0 -- 172.16.184.90:6800/589935 >> :/0 pipe(0x555be6d90800 sd=523 :6800 s=0 pgs=0 cs=0 l=0 c=0x555be6720880).accept failed to getpeername (107) Transport endpoint is not connected
2019-03-10 17:36:17.655814 7f1992173700 0 -- 172.16.184.90:6800/589935 >> 172.16.184.91:6833/316673 pipe(0x555be7740800 sd=527 :6800 s=0 pgs=0 cs=0 l=0 c=0x555be6720580).accept connect_seq 0 vs existing 0 state wait
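For reference, this is the sequence I am considering, just as a sketch (assuming osd.14 is the OSD to take out of data placement; "ceph osd lost" would only be a last resort, since it tells the cluster to give up on whatever only that OSD holds):

root@monitor:~# ceph osd out 14                          # stop mapping data to osd.14 and let the cluster rebalance
root@monitor:~# ceph osd lost 14 --yes-i-really-mean-it  # last resort: tell peering to give up on osd.14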
Regards,
Fabio Abreu
On Sun, Mar 10, 2019 at 3:20 PM Darius Kasparavičius <daznis@xxxxxxxxx> wrote:
Hi,
Check your osd.14 logs for information; it's currently stuck and not
providing I/O for replication. And what happened to OSDs 102 and 121?
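Something along these lines should show the state of those OSDs and what the stuck PG is waiting on (a rough sketch, using the PG id 5.6e0 from your dump):

ceph osd tree | grep -E 'osd\.(14|102|121)'
ceph health detail | grep -i blocked
ceph pg 5.6e0 query        # check recovery_state for peering_blocked_by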
On Sun, Mar 10, 2019 at 7:44 PM Fabio Abreu <fabioabreureis@xxxxxxxxx> wrote:
>
> Hi everybody,
>
> I have a PG in the down+peering state with blocked requests that are impacting my pg query, and I can't find the OSD on which to apply the lost parameter.
>
> http://docs.ceph.com/docs/mimic/rados/troubleshooting/troubleshooting-pg/#placement-group-down-peering-failure
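> Following that page, this is roughly the sequence I was trying, just as a sketch (in my case the pg query is held up by the blocked requests, and <osd-id> stands for whatever OSD the query reports as blocking peering):
>
> root@monitor:~# ceph pg 5.6e0 query                            # look at recovery_state / peering_blocked_by
> root@monitor:~# ceph osd lost <osd-id> --yes-i-really-mean-it  # only once the blocking OSD is identified and confirmed unrecoverable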
>
> Has anyone seen the same scenario with a PG in the down state?
>
> Storage:
>
> 100 ops are blocked > 262.144 sec on osd.14
>
> root@monitor:~# ceph pg dump_stuck inactive
> ok
> pg_stat state up up_primary acting acting_primary
> 5.6e0 down+remapped+peering [102,121,14] 102 [14] 14
>
>
> root@monitor:~# ceph -s
> cluster xxx
> health HEALTH_ERR
> 1 pgs are stuck inactive for more than 300 seconds
> 223 pgs backfill_wait
> 14 pgs backfilling
> 215 pgs degraded
> 1 pgs down
> 1 pgs peering
> 1 pgs recovering
> 53 pgs recovery_wait
> 199 pgs stuck degraded
> 1 pgs stuck inactive
> 278 pgs stuck unclean
> 162 pgs stuck undersized
> 162 pgs undersized
> 100 requests are blocked > 32 sec
> recovery 2767660/317878237 objects degraded (0.871%)
> recovery 7484106/317878237 objects misplaced (2.354%)
> recovery 29/105009626 unfound
>
>
>
>
> --
> Regards,
> Fabio Abreu Reis
> http://fajlinux.com.br
> Tel : +55 21 98244-0161
> Skype : fabioabreureis
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com