Re: isolate_freepages_block and excessive CPU usage by OSD process

Andrey Korolyov <andrey@xxxxxxx> · Wed, 19 Nov 2014 22:03:44 +0400

On Wed, Nov 19, 2014 at 4:21 AM, Christian Marie <christian@xxxxxxxxx> wrote:
>> Hello,
>>
>> I recently found that the OSD daemons can, under certain conditions
>> (moderate vm pressure, moderate I/O, slightly altered vm settings), go
>> into a loop involving isolate_freepages and noticeably hurt Ceph
>> cluster performance.
>
> Hi! I'm the creator of the server fault issue you reference:
>
> http://serverfault.com/questions/642883/cause-of-page-fragmentation-on-large-server-with-xfs-20-disks-and-ceph
>
> I'd very much like to get to the bottom of this. I'm seeing a very similar
> pattern on 3.10.0-123.9.3.el7.x86_64. If this is fixed in later versions,
> perhaps we could backport something.
>
> Here is some perf output:
>
> http://ponies.io/raw/compaction.png
>
> Looks pretty similar. I also have hundreds of MB of logs and traces, should
> we need some specific question answered.
>
> I've managed to reproduce many failed compactions with this:
>
> https://gist.github.com/christian-marie/cde7e80c5edb889da541
>
> I took some compaction stress test code and bolted on a little loop to mmap a
> large sparse file and read every PAGE_SIZEth byte.
>
> Run it once and compactions seem to do okay; run it again and they're really
> slow. This seems to be because my little trick to fill up cache memory only
> works exactly half the time. Note that transparent huge pages are only used
> to introduce fragmentation/pressure here; turning them off doesn't seem to
> make the slightest difference to the spinning-in-reclaim issue.
>
> We are using Mellanox ipoib drivers which do not do scatter-gather, so I'm
> currently working on adding support for that (the hardware supports it). Are
> you also using ipoib, or do you have something else doing high-order
> allocations? It's a bit concerning for me if you don't, as it would suggest
> that cutting down on those allocations won't help.

Yes, I am also using ipoib. On a test environment with regular
10-gigabit cards I was unable to reproduce the issue. Honestly, I
thought that almost every contemporary driver for high-speed cards used
scatter-gather, so I did not have mlx in mind as a potential cause of
this problem from the very beginning. There are a couple of reports on
the ceph lists complaining about OSD flapping/unresponsiveness without a
clear reason under certain (not always clear, though) conditions, which
may have the same root cause. I wonder whether a numad-like mechanism
would help there, but its usage is generally an anti-performance pattern
in my experience.
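
For anyone who does not want to open the gist, the cache-filling trick
described in the quoted message (mmap a large sparse file and read every
PAGE_SIZEth byte) could look roughly like the sketch below. This is not
the code from the gist, only an illustration of the idea: the function
name touch_every_page, the file name and the sizes are invented here,
and it assumes a filesystem that supports sparse files.

```python
import mmap
import os
import tempfile

PAGE_SIZE = mmap.PAGESIZE  # page size as reported by the OS


def touch_every_page(path, size):
    """Create a sparse file of `size` bytes, map it read-only and read
    one byte from each page. Returns the number of pages touched.

    Hypothetical helper for illustration; not taken from the gist.
    """
    with open(path, "wb") as f:
        f.truncate(size)  # extends the file without writing data blocks

    touched = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ) as m:
            for offset in range(0, size, PAGE_SIZE):
                _ = m[offset]  # reading one byte faults the page in
                touched += 1
    return touched


if __name__ == "__main__":
    # Small size so the sketch is harmless to run; a real attempt to fill
    # the page cache would use a size comparable to the machine's RAM.
    with tempfile.TemporaryDirectory() as d:
        pages = touch_every_page(os.path.join(d, "sparse.bin"), 64 * PAGE_SIZE)
        print(pages, "pages touched")
```

Reading a single byte per page is enough to make the kernel populate
the page cache for the whole file, which is why the stride is PAGE_SIZE
rather than 1.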

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .