Re: pg remapped+peering forever and MDS trimming behind

> On 26 October 2016 at 20:44, Brady Deetz <bdeetz@xxxxxxxxx> wrote:
> 
> 
> Summary:
> This is a production CephFS cluster. I had an OSD node crash. The cluster
> rebalanced successfully. I brought the down node back online. Everything
> has rebalanced except 1 hung pg and MDS trimming is now behind. No hardware
> failures have become apparent yet.
> 
> Questions:
> 1) Is there a way to see what pool a placement group belongs to?

A PG's ID always starts with the ID of the pool it belongs to; in your case that's pool '1'.

# ceph osd dump | grep pool

You will see the pool ID there.
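For example, 'ceph osd lspools' also prints each pool's ID next to its name (the pool names below are illustrative, not taken from your cluster):

# ceph osd lspools
1 cephfs_metadata,2 cephfs_data

Since pg 1.efa starts with '1', it belongs to whichever pool is listed with ID 1.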

> 2) How should I move forward with unsticking my 1 pg in a constant
> remapped+peering state?
> 

Looking at the PG query, have you tried restarting the primary OSD of that PG (osd.153)? If that doesn't help, try restarting the other OSDs in the acting set as well: [153,162,5]
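Assuming a systemd-based install where each OSD runs as a ceph-osd@<id> unit (adjust to your init system), on the node hosting osd.153:

# systemctl restart ceph-osd@153

If the PG is still stuck afterwards, restart the other members of the acting set one at a time, so the rest of the set stays up while peering retries:

# systemctl restart ceph-osd@162
# systemctl restart ceph-osd@5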

Which version of Ceph are you running?
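You can check with:

# ceph --version

or ask a running daemon directly:

# ceph tell osd.153 version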

> Based on the remapped+peering pg not going away and the mds trimming
> getting further and further behind, I'm guessing that the pg belongs to the
> cephfs metadata pool.
> 

Probably the case indeed. The MDS is blocked by this single PG.
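You can verify that by checking which pool CephFS uses for metadata (on recent releases) and matching its name against the pool IDs from 'ceph osd lspools' above; output here is illustrative:

# ceph fs ls
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ]

If the metadata pool is the one with ID 1, pg 1.efa holds CephFS metadata, and a single inactive metadata PG is enough to block writes to the journal and stall trimming on the MDS.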

> Any help you can provide is greatly appreciated.
> 
> Details:
> OSD Node Description:
> -2 vlans going over 40gig ethernet for pub/priv nets
> -256 GB RAM
> -2x Xeon 2660v4
> -2x P3700 (journal)
> -24x OSD
> Primary monitor is dedicated similar configuration to OSD
> Primary MDS is dedicated similar configuration to OSD
> 
> [brady@mon0 ~]$ ceph health detail
> HEALTH_ERR 1 pgs are stuck inactive for more than 300 seconds; 1 pgs
> peering; 1 pgs stuck inactive; 47 requests are blocked > 32 sec; 1 osds
> have slow requests; mds0: Behind on trimming (76/30)
> pg 1.efa is stuck inactive for 174870.396769, current state
> remapped+peering, last acting [153,162,5]
> pg 1.efa is remapped+peering, acting [153,162,5]
> 34 ops are blocked > 268435 sec on osd.153
> 13 ops are blocked > 134218 sec on osd.153
> 1 osds have slow requests
> mds0: Behind on trimming (76/30)(max_segments: 30, num_segments: 76)
> 
> 
> [brady@mon0 ~]$ ceph pg dump_stuck
> ok
> pg_stat state   up      up_primary      acting  acting_primary
> 1.efa   remapped+peering        [153,10,162]    153     [153,162,5]     153
> 
> [brady@mon0 ~]$ ceph pg 1.efa query
> http://pastebin.com/Rz0ZRfSb


