Re: Segmentation faults in ceph-osd

Hi,

Unfortunately we keep getting these segmentation faults even when the
cluster contains nothing but objects from the rados benchmark tool.

I've opened an issue for it here:
http://tracker.ceph.com/issues/5239
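
For context, the write path discussed in the thread below is nothing
exotic: a plain librados write plus 2-4 small xattrs set once per
object. A minimal sketch of that pattern (pool, object and attribute
names are made up for illustration; link with -lrados):

  /* Sketch of the per-object write path: one full write, then a
   * few small (10-50 byte) xattrs that are never touched again. */
  #include <string.h>
  #include <rados/librados.h>

  int main(void)
  {
      rados_t cluster;
      rados_ioctx_t io;
      const char *data = "object payload";

      if (rados_create(&cluster, NULL) < 0)
          return 1;
      rados_conf_read_file(cluster, NULL);   /* default ceph.conf search */
      if (rados_connect(cluster) < 0)
          return 1;
      if (rados_ioctx_create(cluster, "somepool", &io) < 0)
          return 1;

      rados_write_full(io, "obj-1", data, strlen(data));
      rados_setxattr(io, "obj-1", "mimetype", "text/plain", 10);
      rados_setxattr(io, "obj-1", "origin", "app-1", 5);

      rados_ioctx_destroy(io);
      rados_shutdown(cluster);
      return 0;
  }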

On 21 May 2013 23:27, Anders Saaby <anders@xxxxxxxxx> wrote:
> On 21/05/2013, at 21.00, Samuel Just <sam.just@xxxxxxxxxxx> wrote:
>> How large are the xattrs?
>> -Sam
>
> Sam,
>
> They are quite small; as far as I can see, 10-50 chars each.
>
> However, as you just explained on IRC, our current unbounded 1:1 file-to-object approach, which in rare cases leads to ~25GB objects, is not a good idea. Could those large objects be what triggers the large leveldb allocation (and eventually the segfault)?
>
>
> --
> Anders
>
>> On Tue, May 21, 2013 at 10:13 AM, Emil Renner Berthing <ceph@xxxxxxxx> wrote:
>>> On 21 May 2013 19:05, Samuel Just <sam.just@xxxxxxxxxxx> wrote:
>>>> Do you use xattrs at all?
>>>
>>> Yes, on each object we set between 2 and 4 attributes at write time,
>>> which are then left unchanged.
>>> /Emil
>>>
>>>> -Sam
>>>>
>>>> On Tue, May 21, 2013 at 9:34 AM, Anders Saaby <anders@xxxxxxxxx> wrote:
>>>>> On 21/05/2013, at 18.19, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>>>>> On Tue, May 21, 2013 at 9:01 AM, Emil Renner Berthing <ceph@xxxxxxxx> wrote:
>>>>>>> On 21 May 2013 17:55, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>>>>>>> On Tue, May 21, 2013 at 8:44 AM, Emil Renner Berthing <ceph@xxxxxxxx> wrote:
>>>>>>>>> Hi Greg,
>>>>>>>>>
>>>>>>>>> Here are some more stats on our servers:
>>>>>>>>> - each server has 64GB RAM,
>>>>>>>>> - there are 12 OSDs per server,
>>>>>>>>> - each OSD uses around 1.5 GB of memory,
>>>>>>>>> - we have 18432 PGs,
>>>>>>>>> - each OSD sees around 5 to 10 MB/s of writes and almost no reads (yet).
>>>>>>>>
>>>>>>>> What interface are you writing with? How many OSD servers are there?
>>>>>>>
>>>>>>> We're using librados and there are 132 OSDs so far.
>>>>>>
>>>>>> Okay, so the allocation is happening in the depths of LevelDB — maybe
>>>>>> the issue is there somewhere. Are you doing anything weird with omap,
>>>>>> snapshots, or xattrs?
>>>>>
>>>>> I can help: no, we are not using omap, snapshots, or anything weird with xattrs.
>>>>>
>>>>> We are storing objects ranging in size from a few KB to several GB. Also, we currently have a quirk in the application design: we store an object, write it again under a new name, and delete the original object, in case that is of any value in tracking this down.
>>>>>
>>>>>
>>>>> --
>>>>> Anders
>
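
As an aside, the store/rewrite/delete quirk Anders describes above
amounts to roughly the following in plain librados. Object names are
placeholders, and a real version would copy a multi-GB object in
chunks rather than in a single read:

  /* Illustrative helper for the rewrite-under-a-new-name pattern:
   * copy old_oid to new_oid, then remove the original. Assumes the
   * object fits in one buffer; large objects need a chunked loop. */
  #include <stdlib.h>
  #include <rados/librados.h>

  static int rewrite_object(rados_ioctx_t io, const char *old_oid,
                            const char *new_oid)
  {
      size_t max = 4 << 20;              /* 4MB buffer, demo only */
      char *buf = malloc(max);
      int ret = -1;
      int n;

      if (!buf)
          return -1;
      n = rados_read(io, old_oid, buf, max, 0);  /* bytes read */
      if (n >= 0 &&
          rados_write_full(io, new_oid, buf, (size_t)n) == 0 &&
          rados_remove(io, old_oid) == 0)
          ret = 0;
      free(buf);
      return ret;
  }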