Re: cosd dying after start

Christian Brunner <chb@xxxxxx> · Wed, 11 Aug 2010 22:07:57 +0200

2010/8/10 Sage Weil <sage@xxxxxxxxxxxx>:
> On Tue, 10 Aug 2010, Christian Brunner wrote:
>
>> After a bit more debugging I've found out that there seems to be a file
>> missing from the filestore:
>>
>> 10.08.10_18:14:07.862190 7f568d3e5710 filestore(/ceph/osd/osd02)
>> getattr /ceph/osd/osd02/current/3.f2_head/rb.0.1d6.00000000000e_head
>> '_' = -2
>
> There was a bug last week in the kernel client rbd branch that was
> improperly encoding osd write operation payloads.  Can you check that your
> rbd client is running 79c49720, which fixes it?

I'm not using the kernel client. The problem started after a crash
of the cosd. If you want me to, I can try to analyse the core dump.

> That error above was probably just because that object hadn't been written
> yet, and isn't a fatal error.  There is a 'scrub' function that verifies
> that most of the osd metadata is in order and replication is accurate:
> 'ceph osd scrub <osdnum>' and watch 'ceph -w' to see the success or error
> messages go by for each pg (or tail $mon_data/log on any monitor)
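
For reference, the "-2" in the getattr log line above looks like a negated
errno value, and on Linux errno 2 is ENOENT ("No such file or directory"),
which fits the explanation that the object simply had not been written yet.
The following is only a small sketch for decoding such return codes;
describe_errno is a hypothetical helper name and not part of Ceph.

```python
import errno
import os


def describe_errno(ret):
    """Translate a negated-errno return code (e.g. -2) into a readable string.

    Hypothetical helper for reading log lines; not a Ceph function.
    """
    if ret >= 0:
        return "success"
    code = -ret
    # errno.errorcode maps numeric codes to symbolic names such as 'ENOENT'.
    name = errno.errorcode.get(code, "UNKNOWN")
    return "%s: %s" % (name, os.strerror(code))


if __name__ == "__main__":
    # The value reported by the filestore getattr line in this thread.
    print(describe_errno(-2))
```

Any other negative return code seen in the osd log can be passed in the same
way; the exact message text comes from the local C library.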

I was not able to start the osd, so the scrub didn't run either. Every
time I tried to start it, it died after 3 seconds with the message
"terminate called after throwing an instance of
'ceph::buffer::end_of_buffer*'".

The only thing that worked was to set up a whole new filesystem.

Christian