On Thu, Feb 18, 2016 at 3:46 PM, Jens Rosenboom <j.rosenboom@xxxxxxxx> wrote:
> 2016-02-18 15:10 GMT+01:00 Dan van der Ster <dan@xxxxxxxxxxxxxx>:
>> Hi,
>>
>> Thanks for linking to a current update on this problem [1] [2]. I
>> really hope that new Ceph installations aren't still following that
>> old advice... it's been known to be a problem for around a year and
>> a half [3]. That said, the "-n size=64k" wisdom was really prevalent
>> a few years ago, and I wonder how many old clusters are at risk
>> today.
>
> Thanks for listing some more references; it's strange that I couldn't
> find these when I was looking into this issue a couple of weeks ago.
>
> The wisdom also seems to be spreading only slowly, even inside the
> Ceph developer community: cbt, for example, still used this setting
> until recently [5], which, together with some other references I
> found, led me to use it as a default until last week.

Whoa, good catch!

>> I manage a sufficiently large number of affected OSDs that I'll be
>> willing to try all other possibilities before reformatting them [4].
>> Today they're rock-solid stable on EL6 (with hammer), but the jewel
>> release is getting closer, and that's when we'll need to upgrade to
>> EL7. (I've already upgraded one host to 7 and haven't seen any
>> problems yet, but that one sample doesn't offer much comfort for the
>> rest.) Anyway, it's great to hear that there's a patch in the
>> works... Dave deserves infinite thanks if this gets resolved.
>
> Since I'm not a subscriber, [4] wasn't that useful to me, but from
> your comments I take it that the recommended solution is also to
> reformat.

Subscriber or not, everything I've read up to now suggests that
reformatting is the only solution. There was a partial fix mentioned
in [1], which I've confirmed is present in the EL7 kernels. But since
I'm not able to reproduce the problem, I wasn't sure whether that
patch fixed it, or... Anyway, your thread shows we're all still at
risk. Thanks!

> According to my tests so far, even when the new patch eventually
> makes it into the kernel, it will only reduce the impact of the
> issue, not resolve it completely. Memory may still become fragmented
> enough for the needed allocations to fail for some time, so although
> operations will not stall completely, there will likely still be a
> performance impact. In the end, reformatting may still be the safest
> solution.

BTW, we run our OSD servers with vm.min_free_kbytes=1048576 -- this is
some other old wisdom, intended to make memory fragmentation less
likely. I have no idea whether it's still good advice, but maybe...
try it?

-- Dan

>> [1] http://tracker.ceph.com/issues/6301
>> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1278992
>> [3] https://github.com/redhat-cip/puppet-ceph/commit/b9407efd4a8a25d452e493fb48ea048e4d36e070
>> [4] https://access.redhat.com/solutions/1597523
>
> [5] https://github.com/ceph/cbt/pull/85
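
For anyone auditing an existing cluster for the "-n size=64k" problem
discussed above: the directory block size is visible in the "naming"
line of xfs_info output, so affected OSDs can be identified without
reformatting anything. A minimal sketch -- the device and mount point
below are placeholders, your OSD paths will differ:

    # The problematic format from the old advice: 64 KiB directory blocks
    mkfs.xfs -n size=64k /dev/sdb1

    # Check an existing OSD: bsize=65536 in the "naming" line means the
    # filesystem is affected; the XFS default of bsize=4096 means it is not
    xfs_info /var/lib/ceph/osd/ceph-0 | grep naming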
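
Likewise, the vm.min_free_kbytes tweak Dan mentions can be tried at
runtime before being made persistent. A sketch, using the same 1 GiB
value from his mail:

    # Inspect the current free-memory reserve
    sysctl vm.min_free_kbytes

    # Raise the kernel's free-memory watermark to ~1 GiB at runtime
    sysctl -w vm.min_free_kbytes=1048576

    # Persist the setting across reboots
    echo 'vm.min_free_kbytes = 1048576' >> /etc/sysctl.conf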