Hi,

Thanks for linking to a current update on this problem [1] [2]. I really
hope that new Ceph installations aren't still following that old advice...
it's been known to be a problem for around a year and a half [3].

That said, the "-n size=64k" wisdom was really prevalent a few years ago,
and I wonder how many old clusters are at risk today (a quick way to check
with xfs_info is at the bottom of this mail, below the quoted text). I
manage a large enough number of affected OSDs that I'm willing to try all
other possibilities before reformatting them [4]. Today they're rock solid
on EL6 (with hammer), but the jewel release is getting closer, and that's
when we'll need to upgrade to EL7. (I've already upgraded one host to 7 and
haven't seen any problems yet, but that one sample doesn't offer much
comfort for the rest.)

Anyway, it's great to hear that there's a patch in the works... Dave
deserves infinite thanks if this gets resolved.

Cheers,
Dan

[1] http://tracker.ceph.com/issues/6301
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1278992
[3] https://github.com/redhat-cip/puppet-ceph/commit/b9407efd4a8a25d452e493fb48ea048e4d36e070
[4] https://access.redhat.com/solutions/1597523

On Thu, Feb 18, 2016 at 1:14 PM, Jens Rosenboom <j.rosenboom@xxxxxxxx> wrote:
> Various people have noticed performance problems and sporadic kernel
> log messages like
>
>   kernel: XFS: possible memory allocation deadlock in kmem_alloc (mode:0x8250)
>
> with their Ceph clusters. We have seen this in one of our clusters
> ourselves, but had not been able to reproduce it in a lab environment
> until recently. While trying to set up a benchmark for comparing the
> effect of varying the bucket shard count, I suddenly started seeing the
> same issues again, and they seemed to be reproducible even with the
> latest upstream kernel.
>
> The test setup comprised 8 nodes with 2 SSDs as OSDs each. The messages
> started to appear after writing 16 KB objects with cosbench using 32
> workers for about 2 hours, and soon after that the OSDs started dying
> because of suicide timeouts.
>
> So we went ahead and tried running a kernel patched with [1], but this
> had only partial success, so I posted these results to the XFS mailing
> list. The response by Dave Chinner led to an important result: creating
> the file system with the option "-n size=64k" was the culprit. Repeating
> the tests with sizes <= 16k did not show any issues, and for this
> particular test the performance even turned out to be better when the
> directory block size was simply left at the default value of 4k.
>
> In case you are seeing similar issues, you may want to check the
> directory block size of your file systems; you can use xfs_info for
> that. The bad news is that this parameter cannot be changed on an
> existing file system, so you will need to reformat everything.
>
> And the moral is: do not blindly trust configuration settings to be
> helpful, even if their use seems to be widespread and looks reasonable
> at first.
>
> [1] http://oss.sgi.com/pipermail/xfs/2016-January/046308.html
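
P.S. For anyone who wants to check whether their OSDs were formatted with
the old "-n size=64k" advice, here's a rough sketch. The mount point below
is just an example; point xfs_info at wherever your OSD data file systems
are actually mounted. The directory block size shows up in the "naming"
line of the output:

  # Directory block size of one OSD's file system; anything larger than
  # 4096 here means it was created with the old 64k directory blocks.
  xfs_info /var/lib/ceph/osd/ceph-0 | grep naming
  # e.g.:  naming   =version 2   bsize=65536   ascii-ci=0

  # And the symptom itself, if you want to grep the kernel log for it:
  dmesg | grep "possible memory allocation deadlock"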
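
If you do find bsize=65536 (or anything above 4096), the only cure Jens
describes is a reformat. Glossing over the Ceph side of it (take the OSD
out, let the cluster recover, recreate the OSD afterwards), the mkfs step
is just the usual one with the "-n size=64k" option dropped; the device
below is a placeholder:

  # Keep whatever other mkfs options you normally use, just drop
  # "-n size=64k"; the directory block size then stays at the 4k
  # default and shows up as bsize=4096 in xfs_info.
  mkfs.xfs -f /dev/sdX1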