Re: Cluster crash - FAILED assert(interval.last > last)

Hi,
can anyone suggest what to do about this? I have identified the
underlying crashing code in src/osd/osd_types.cc [assert(interval.last >
last);], committed by Sage Weil, but I have not figured out exactly how
the function works or why it crashes. It is also unclear how this bug
spread and crashed so many healthy OSDs, which are now unable to start.
This seems to be a pretty serious issue, as it can take down a large
number of OSDs without breaking a sweat. What can we do here now?
Thanks
Zdenek Janda


On 11.1.2018 10:48, Josef Zelenka wrote:
> I have posted logs and strace output from our OSDs, with details, to a
> ticket in the Ceph bug tracker: http://tracker.ceph.com/issues/21142.
> It shows exactly where the OSDs crash, which may help if someone
> decides to debug it.
> 
> JZ
> 
> 
> On 10/01/18 22:05, Josef Zelenka wrote:
>>
>> Hi, today we had a disastrous crash. We are running a 3-node cluster
>> with 24 OSDs in total (8 per node), with SSDs for the block DB and
>> HDDs for BlueStore data. The cluster is used as a radosgw backend,
>> storing a large number of thumbnails for a file hosting site, around
>> 110 million files in total. We were adding an interface to the nodes,
>> which required a restart, but after restarting one of the nodes, many
>> of the OSDs were kicked out of the cluster and rgw stopped working. We
>> have a lot of PGs down and unfound at the moment. Most OSDs can't be
>> started (a few can, which is a mystery) and fail with this error:
>> FAILED assert(interval.last > last). They just restart periodically.
>> So far, the cluster is broken and we can't seem to bring it back up.
>> We tried running fsck on the OSDs via ceph-objectstore-tool, but it
>> did not help. The root of all this seems to be the
>> FAILED assert(interval.last > last) error, but I can't find any
>> information about it or how to fix it. Has anyone here encountered it
>> as well? We're running Luminous on Ubuntu 16.04.
>>
>> Thanks
>>
>> Josef Zelenka
>>
>> Cloudevelops
>>
>>
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 




