On Tue, Nov 18, 2014 at 10:04 PM, Craig Lewis <clewis@xxxxxxxxxxxxxxxxxx> wrote:
> That would probably have helped. The XFS deadlocks would only occur when
> there was relatively little free memory. Kernel 3.18 is supposed to have a
> fix for that, but I haven't tried it yet.
>
> Looking at my actual usage, I don't even need 64k inodes. 64k inodes should
> make things a bit faster when you have a large number of files in a
> directory, but Ceph will automatically split directories with too many files
> into multiple sub-directories, so it's kinda pointless.
>
> I may try the experiment again, but probably not. It took several weeks to
> reformat all of the OSDs. Even on a single node, it takes 4-5 days to
> drain, format, and backfill. That was months ago, and I'm still dealing
> with the side effects. I'm not eager to try again.
>
>
> On Mon, Nov 17, 2014 at 2:04 PM, Andrey Korolyov <andrey@xxxxxxx> wrote:
>>
>> On Tue, Nov 18, 2014 at 12:54 AM, Craig Lewis <clewis@xxxxxxxxxxxxxxxxxx>
>> wrote:
>> > I did have a problem in my secondary cluster that sounds similar to
>> > yours. I was using XFS, and traced my problem back to 64 kB inodes
>> > (osd mkfs options xfs = -i size=64k). This showed up as a lot of "XFS:
>> > possible memory allocation deadlock in kmem_alloc" messages in the
>> > kernel logs. I was able to keep things limping along by flushing the
>> > cache frequently, but I eventually re-formatted every OSD to get rid
>> > of the 64k inodes.
>> >
>> > After I finished the reformat, I had problems because of deep-scrubbing.
>> > While reformatting, I had disabled deep-scrubbing. Once I re-enabled it,
>> > Ceph wanted to deep-scrub the whole cluster, and sometimes 90% of my
>> > OSDs would be doing a deep-scrub. I'm manually deep-scrubbing now,
>> > trying to spread out the schedule a bit. Once this finishes in a few
>> > days, I should be able to re-enable deep-scrubbing and keep my
>> > HEALTH_OK.
>> >
>> >
>>
>> Would you mind trying 64k again after checking the suggestions, either my
>> hints or the hints from the URLs mentioned in that thread:
>> http://marc.info/?l=linux-mm&m=141607712831090&w=2? As for me, I am no
>> longer observing the lock loop after setting min_free_kbytes to half a
>> gigabyte per OSD. Even if your locks have a different cause, it may be
>> worth trying anyway.
>

Thanks, I completely understand. But if you have a low enough OSD/node ratio,
it may be possible to test for the problem at the scale of a single node. By
the way, I do not see a real reason to use a lower allocsize on a cluster that
is not designed for object storage.
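
For reference, a minimal sketch of the min_free_kbytes tuning described above.
The half-gigabyte-per-OSD figure comes from the message; the node size (8 OSDs
here) and the sysctl.d path are assumptions for illustration:

    # Reserve roughly 512 MB of free memory per OSD on this node.
    # Example assumes 8 OSDs per node: 8 * 512 MB = 4 GB = 4194304 kB.
    sysctl -w vm.min_free_kbytes=$((8 * 512 * 1024))

    # Persist across reboots (one common location):
    echo "vm.min_free_kbytes = $((8 * 512 * 1024))" > /etc/sysctl.d/90-ceph-min-free.conf

The intent is simply to keep more free memory available on the node so that the
larger allocations associated with 64k inodes are less likely to stall.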
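
And a rough sketch of the deep-scrub handling described earlier in the thread:
disable deep-scrubbing cluster-wide during the reformat, then walk PGs in small
batches to spread the schedule before re-enabling it. The flag and per-PG
commands are standard Ceph CLI; the pg-dump parsing and batch size are
assumptions, so adjust for your release:

    # Stop new automatic deep-scrubs while OSDs are being drained/reformatted.
    ceph osd set nodeep-scrub

    # Manually deep-scrub a small batch of PGs at a time to spread the load.
    ceph pg dump pgs_brief 2>/dev/null | awk '/^[0-9]+\./ {print $1}' | head -20 | \
        while read pg; do ceph pg deep-scrub "$pg"; done

    # Once the backlog is spread out, let automatic deep-scrubbing resume.
    ceph osd unset nodeep-scrub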