Re: best way to resolve 'stale+active+clean' after disk failure

Thanks for the suggestions. The cause turned out to be an old test pool with a replication size of 1. Removing that pool resolved the problem.


On 04/06/2017 07:34 PM, Brad Hubbard wrote:
What are size and min_size for pool '7'... and why?
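
A rough sketch of one way to answer this: `ceph osd dump` prints one line per pool that includes its `size` and `min_size`. The helper below (the name `pool_replication` is made up for illustration) filters that output for a given pool id. It assumes pool lines of the form `pool 7 'name' replicated size 1 min_size 1 ...`, which may differ between releases.

```shell
#!/bin/sh
# Hypothetical helper: read `ceph osd dump` output on stdin and print
# "size=<n> min_size=<n>" for the pool id given as the first argument.
# Assumption: pool lines look like
#   pool 7 'name' replicated size 1 min_size 1 ...
pool_replication() {
    awk -v id="$1" '
        $1 == "pool" && $2 == id {
            # Walk the fields and take the value following each keyword.
            for (i = 3; i < NF; i++) {
                if ($i == "size")     size = $(i + 1)
                if ($i == "min_size") min  = $(i + 1)
            }
            print "size=" size " min_size=" min
        }'
}

# Usage on a live cluster (not run here):
#   ceph osd dump | pool_replication 7
```

With a size of 1 each PG has a single copy, so once its only OSD is out there is no replica to recover from and the PG stays stale. `ceph osd pool get <pool-name> size` reports the same value for a single pool.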

On Fri, Apr 7, 2017 at 4:20 AM, David Welch <dwelch@xxxxxxxxxxxx> wrote:
Hi,
We had a disk in the cluster that was not responding properly and was
causing 'slow requests'. The OSD on that disk was stopped, then marked down
and out. Rebalancing succeeded, but (some?) PGs from that OSD are now stuck
in the stale+active+clean state and are not recovering (see below for query
results).

My question: is it better to mark this osd as "lost" (i.e. 'ceph osd lost
14') or to remove the osd as detailed here:
https://www.sebastien-han.fr/blog/2015/12/11/ceph-properly-remove-an-osd/

Thanks,
David
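
For comparison with `ceph osd lost 14`, here is a hedged sketch of the manual removal route. The helper name `print_osd_removal` is hypothetical, and it only prints the commands rather than running them. The sequence is a commonly documented one for releases of that era and is assumed, not confirmed, to match the linked post. The `systemctl` unit name assumes a systemd-based deployment.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (do not execute) a manual removal
# sequence for the OSD id given as the first argument.
# Assumptions: systemd unit naming (ceph-osd@<id>), pre-Luminous workflow.
print_osd_removal() {
    id="$1"
    cat <<EOF
ceph osd crush reweight osd.$id 0
ceph osd out $id
systemctl stop ceph-osd@$id
ceph osd crush remove osd.$id
ceph auth del osd.$id
ceph osd rm $id
EOF
}

# Usage (prints six commands for review, changes nothing):
#   print_osd_removal 14
```

Printing first makes it easy to review each step and to wait for rebalancing to finish after the reweight before continuing.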


$ ceph health detail
HEALTH_ERR 17 pgs are stuck inactive for more than 300 seconds; 17 pgs stale; 17 pgs stuck stale
pg 7.f3 is stuck stale for 6138.330316, current state stale+active+clean, last acting [14]
pg 7.bd is stuck stale for 6138.330365, current state stale+active+clean, last acting [14]
pg 7.b6 is stuck stale for 6138.330374, current state stale+active+clean, last acting [14]
pg 7.c5 is stuck stale for 6138.330363, current state stale+active+clean, last acting [14]
pg 7.ac is stuck stale for 6138.330385, current state stale+active+clean, last acting [14]
pg 7.5b is stuck stale for 6138.330678, current state stale+active+clean, last acting [14]
pg 7.1b4 is stuck stale for 6138.330409, current state stale+active+clean, last acting [14]
pg 7.182 is stuck stale for 6138.330445, current state stale+active+clean, last acting [14]
pg 7.1f8 is stuck stale for 6138.330720, current state stale+active+clean, last acting [14]
pg 7.53 is stuck stale for 6138.330697, current state stale+active+clean, last acting [14]
pg 7.1d2 is stuck stale for 6138.330663, current state stale+active+clean, last acting [14]
pg 7.70 is stuck stale for 6138.330742, current state stale+active+clean, last acting [14]
pg 7.14f is stuck stale for 6138.330585, current state stale+active+clean, last acting [14]
pg 7.23 is stuck stale for 6138.330610, current state stale+active+clean, last acting [14]
pg 7.153 is stuck stale for 6138.330600, current state stale+active+clean, last acting [14]
pg 7.cc is stuck stale for 6138.330409, current state stale+active+clean, last acting [14]
pg 7.16b is stuck stale for 6138.330509, current state stale+active+clean, last acting [14]
$ ceph pg dump_stuck stale
ok
pg_stat    state    up    up_primary    acting    acting_primary
7.f3    stale+active+clean    [14]    14    [14]    14
7.bd    stale+active+clean    [14]    14    [14]    14
7.b6    stale+active+clean    [14]    14    [14]    14
7.c5    stale+active+clean    [14]    14    [14]    14
7.ac    stale+active+clean    [14]    14    [14]    14
7.5b    stale+active+clean    [14]    14    [14]    14
7.1b4    stale+active+clean    [14]    14    [14]    14
7.182    stale+active+clean    [14]    14    [14]    14
7.1f8    stale+active+clean    [14]    14    [14]    14
7.53    stale+active+clean    [14]    14    [14]    14
7.1d2    stale+active+clean    [14]    14    [14]    14
7.70    stale+active+clean    [14]    14    [14]    14
7.14f    stale+active+clean    [14]    14    [14]    14
7.23    stale+active+clean    [14]    14    [14]    14
7.153    stale+active+clean    [14]    14    [14]    14
7.cc    stale+active+clean    [14]    14    [14]    14
7.16b    stale+active+clean    [14]    14    [14]    14
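
A listing like the one above can be condensed to show which pool and acting set the stale PGs share. The sketch below is illustrative only: `stale_by_pool` is a made-up name, and it assumes the column order shown in the header (`pg_stat state up up_primary acting acting_primary`).

```shell
#!/bin/sh
# Hypothetical helper: read `ceph pg dump_stuck stale` output on stdin and
# count stale PGs per (pool id, acting set).
# Assumption: columns are pg_stat, state, up, up_primary, acting, acting_primary.
stale_by_pool() {
    awk '
        $2 ~ /stale/ {
            split($1, pg, ".")      # pg_stat is "<pool id>.<pg id>"
            count[pg[1], $5]++      # key on pool id and acting set
        }
        END {
            for (k in count) {
                split(k, p, SUBSEP)
                printf "pool %s acting %s: %d stale pgs\n", p[1], p[2], count[k]
            }
        }' | sort
}

# Usage on a live cluster (not run here):
#   ceph pg dump_stuck stale | stale_by_pool
```

When every stale PG falls under one pool id with a single-OSD acting set, as in the listing above, the pool's replication settings are worth checking before deciding how to handle the OSD.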



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
~~~~~~
David Welch
DevOps
ARS
http://thinkars.com
