Re: two osd stuck on peering after starting osd for recovery

Hi, sorry for the late response.

We hit the same problem again today. Today's logs are attached and also on
Google Drive:

https://docs.google.com/file/d/0B9xDdJXMieKEdHFRYnBfT3lCYm8/view

https://docs.google.com/file/d/0B9xDdJXMieKEQzVNVHJ1RXFXZlU/view

What is strange is that the problematic osd.71 uses about 10-15% more
space than the other OSDs in the cluster.
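(I compared this simply with df on the OSD data directories -- the path
below is just the standard default location, adjust it if yours differs:

df -h /var/lib/ceph/osd/ceph-71
)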

Today, within one hour, osd.71 was reported as failed 3 times in the mon
log; after the third failure recovery got stuck, and many 500 errors
appeared at the HTTP layer on top of RGW. When it gets stuck, restarting
osd.71, osd.23 and osd.108 (all from the stuck PG) helps, but I also ran
a repair on this OSD, just in case.
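
To be exact, what I did was more or less this (commands from memory, so
they may not be letter-perfect):

ceph health detail            # to find the pg stuck in peering
service ceph restart osd.71   # and the same for osd.23 and osd.108 on their hosts
ceph osd repair 71            # repair on the suspect osd, just in case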

My theory is that this PG holds the RGW object index, or that one of the
OSDs in this PG has a problem with the local filesystem or the drive
below it (the RAID controller reports nothing), but I do not see any
problem in the system.

How can we find which PG/OSD holds the index of objects for an RGW bucket?
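
I guess it can be checked more or less like this -- the bucket name is
just an example, and I am not sure whether on our version the index
object lives in .rgw.buckets or in a separate index pool, so please
correct me if I am wrong:

radosgw-admin bucket stats --bucket=mybucket   # note the "id" field in the output
ceph osd map .rgw.buckets .dir.<bucket_id>     # <bucket_id> = the id from above; prints the pg and acting osds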

On Thu, Jun 6, 2013 at 8:21 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> We don't have your logs (vger doesn't forward them). Can you describe
> the situation more completely in terms of what failures occurred and
> what steps you took?
> (Also, this should go on ceph-users. Adding that to the recipients list.)
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Wed, Jun 5, 2013 at 4:20 PM, Dominik Mostowiec
> <dominikmostowiec@xxxxxxxxx> wrote:
>> hi,
>> I have the same problem again.
>> Have you got any idea?
>>
>> --
>> Regards
>> Dominik
>>
>> 2013/5/23 Dominik Mostowiec <dominikmostowiec@xxxxxxxxx>:
>>> Hi,
>>> I changed the disk after a failure (osd.155).
>>> When osd.155 started on the empty disk and began recovery, two osds got
>>> stuck on peering (108 and 71).
>>> Logs in attachment.
>>> Restart (osd 108,71) helps.
>>> ceph -v
>>> ceph version 0.56.6 (95a0bda7f007a33b0dc7adf4b330778fa1e5d70c)
>>>
>>> setup:
>>>
>>> 6 servers x 26 osd
>>> 6 x mons
>>> journal and data on the same disk
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
-----
Regards

Sławek "sZiBis" Skowron

<<attachment: logs.zip>>

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
