On 19 Oct 2016 13:00, Ronny Aasen wrote:
On 06 Oct 2016 13:41, Ronny Aasen wrote:
Hello,
I have a few OSDs in my cluster that are regularly crashing.
[snip]
Of course, having 3 OSDs dying regularly is not good for my health, so I
have set noout to avoid heavy recoveries.
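For reference, that flag is just the standard cluster-wide toggle,
something like:
# keep the down OSDs from being marked out and triggering heavy recovery
ceph osd set noout
# and later, once the OSDs are stable again
ceph osd unset noout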
Googling this error message gives exactly one hit:
https://github.com/ceph/ceph/pull/6946
where it says: "the shard must be removed so it can be reconstructed".
But with my 3 OSDs failing, I am not certain which of them contains the
broken shard (or perhaps all 3 of them?).
I am a bit reluctant to delete on all 3. I have 4+2 erasure coding
(erasure size 6, min_size 4), so finding out which one is bad would be
nice.
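For context, which OSDs are supposed to hold the six shards of a PG (the
up/acting set) can be listed with something like this, where <pgid> is a
placeholder for the affected placement group:
# which OSDs hold (or should hold) the shards of this PG
ceph pg map <pgid>
# per-shard detail for the PG
ceph pg <pgid> query
# overview of inconsistent/problematic PGs
ceph health detail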
I hope someone has an idea how to proceed.
kind regards
Ronny Aasen
I again have this problem with crashing OSDs. A more detailed log is at
the tail of this mail.
Does anyone have any suggestions on how I can identify which shard needs
to be removed to allow the EC pool to recover?
And, more importantly, how I can stop the OSDs from crashing?
kind regards
Ronny Aasen
Answering my own question for googleability, using this one-liner:
# list every on-disk shard of the problematic object across the local OSDs
for dir in $(find /var/lib/ceph/osd/ceph-* -maxdepth 2 -type d -name '5.26*' | sort -u); do
    find "$dir" -name '*3a3938238e1f29.00000000002d80ca*' -type f -ls
done
I got a list of all shards of the problematic object.
One of the objects had size 0 but was otherwise readable without any I/O
errors. I guess this explains the inconsistent size, but it does not
explain why Ceph decides it's better to crash 3 OSDs rather than move a
0-byte file into a "LOST+FOUND"-style directory structure, or just
delete it, since it will not contain any useful data anyway.
Removing this file (moving it to /tmp) allowed the 3 broken OSDs to
start, and they have been running for >24h now, whereas usually they
would crash within 10 minutes. Yay!
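Roughly what that looked like per affected OSD host (a sketch: the OSD
id and shard path below are placeholders taken from the find output, and
the unit name assumes a systemd-managed install):
# stop the affected OSD first (hypothetical OSD id)
systemctl stop ceph-osd@12
# set the zero-byte shard aside rather than deleting it outright
mv '/var/lib/ceph/osd/ceph-12/current/5.26s1_head/<shard file from the find above>' /tmp/
# bring the OSD back up
systemctl start ceph-osd@12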
Generally, you need to check _all_ shards on the given PG, not just the
3 crashing OSDs. This was what confused me, since I only focused on the
crashing OSDs.
I used the one-liner that checks the OSDs for the PG, since due to
backfilling the PG was spread all over the place, and I could run it
from Ansible to reduce the tedious work.
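A minimal sketch of that ad-hoc run, assuming an inventory group named
"osds" for the OSD hosts (the group name is my own):
# run the shard search on every OSD host in one go
ansible osds -m shell -a "for dir in \$(find /var/lib/ceph/osd/ceph-* -maxdepth 2 -type d -name '5.26*' | sort -u); do find \$dir -name '*3a3938238e1f29.00000000002d80ca*' -type f -ls; done"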
Also, it would be convenient to be able to mark a broken/inconsistent PG
manually "inactive", instead of crashing 3 OSDs and taking lots of other
PGs down with them. One could set the PG inactive while troubleshooting
and unset it when done, without the OSD crashes and all the subsequent
high-load rebalancing.
Also, I ran a find for 0-size files on that PG and there are multiple
other such files. Is a 0-byte rbd_data file on a PG a normal occurrence,
or could I have more similar problems in the future due to the other
0-size files?
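(For reference, that check was roughly this, reusing the PG directory
pattern from the earlier one-liner:)
# list zero-byte objects in the PG's directories on this OSD host
for dir in $(find /var/lib/ceph/osd/ceph-* -maxdepth 2 -type d -name '5.26*' | sort -u); do
    find "$dir" -type f -size 0 -ls
done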
kind regards
Ronny Aasen
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com