Re: isolate_freepages_block and excessive CPU usage by OSD process

On 12/02/2014 05:53 AM, Joonsoo Kim wrote:
On Tue, Dec 02, 2014 at 12:47:24PM +1100, Christian Marie wrote:
On 28.11.2014 9:03, Joonsoo Kim wrote:
Hello,

I haven't followed this discussion closely, but at a glance, this excessive
CPU usage by compaction looks related to the following fixes.

Could you test the following two patches?

If these fix your problem, I will resubmit the patches with a proper commit
description.

-------- 8< ---------


Thanks for looking into this. Running 3.18-rc5 kernel with your patches has
produced some interesting results.

Load average still spikes to around 2000-3000 with the processors spinning 100%
doing compaction related things when min_free_kbytes is left at the default.

However, unlike before, the system is now completely stable. Pre-patch it would
be almost completely unresponsive (having to wait 30 seconds to establish an
SSH connection and several seconds to send a character).

Is it reasonable to guess that ipoib is giving compaction a hard time and
fixing this bug has allowed the system to at least not lock up?

I will try back-porting this to 3.10 and seeing if it is stable under these
strange conditions also.

Hello,

Good to hear!

Indeed, although I somehow doubt your first patch could have made such a difference. It only matters when you have a whole pageblock free. Without the patch, the particular compaction attempt that managed to free the block might not be terminated ASAP, but the free pageblock is still allocatable by the following allocation attempts, so it shouldn't result in a stream of complete compactions.

So I would expect it's either a fluke, or the second patch made the difference, to either SLUB or something else making such fallback-able allocations.

But hmm, I've never considered the implications of compact_finished() migratetype handling on unmovable allocations. Regardless of cc->order, it often has to free a whole pageblock to succeed, as it's unlikely to succeed by compacting within a pageblock already marked as UNMOVABLE. I guess that's to prevent further fragmentation, and that makes sense, but it does make high-order unmovable allocations problematic. At least the watermark checks for allowing compaction in the first place are then wrong: we decide based on cc->order, but in fact we need at least a pageblock's worth of free space to actually succeed.
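
To make the point about compact_finished() and unmovable allocations concrete, here is a toy user-space model of the completion check as described above. It is not the kernel's code. The names `compaction_finished` and `free_lists` are hypothetical, and the values of `PAGEBLOCK_ORDER` and `MAX_ORDER` are assumptions (typical for x86-64 with 4K pages), not something stated in this thread.

```python
from enum import Enum

# Assumed values, typical for x86-64 with 4K pages; not taken from the thread.
PAGEBLOCK_ORDER = 9
MAX_ORDER = 11


class MigrateType(Enum):
    UNMOVABLE = 0
    MOVABLE = 1


def compaction_finished(free_lists, migratetype, request_order):
    """Toy model: decide whether compaction may stop for this request.

    free_lists maps order -> {MigrateType: number of free pages of that order}.
    """
    for order in range(request_order, MAX_ORDER):
        area = free_lists.get(order, {})
        # Done if a large enough free page of the requested migratetype exists.
        if area.get(migratetype, 0) > 0:
            return True
        # Done if any whole pageblock is free, since the allocation could
        # then take over the block regardless of its current type.
        if order >= PAGEBLOCK_ORDER and sum(area.values()) > 0:
            return True
    return False


# An order-3 UNMOVABLE request is not satisfied by free order-3 MOVABLE pages...
only_movable = {3: {MigrateType.MOVABLE: 5}}
print(compaction_finished(only_movable, MigrateType.UNMOVABLE, 3))

# ...but it is satisfied once a whole pageblock has been freed.
whole_block = {9: {MigrateType.MOVABLE: 1}}
print(compaction_finished(whole_block, MigrateType.UNMOVABLE, 3))
```

Under these assumed orders, an order-3 request is 8 pages while a pageblock is 512 pages, which is the gap between what the watermark check accounts for and what the compaction run may actually need to free.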

The load average spike may be related to skip bit management. Currently, there
is no way to keep a skip bit set permanently. So, after one iteration of
compaction is finished and the skip bits are reset, all pageblocks have to be
re-scanned.

It shouldn't be "after one iteration of compaction": the bits are cleared only when compaction is restarting after being deferred, or when kswapd goes to sleep.

Your system has the Mellanox driver, and although I don't know exactly what it
does, I heard that it allocates a huge number of pages and calls
get_user_pages() to pin them in memory. This memory isn't available to
compaction, but compaction always scans it.

This is just my assumption, so if possible, please check it with the
compaction tracepoints. If that is the case, we can work out a solution for
this problem.
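
The skip-bit hypothesis can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyZone`, `compact_once`, and `reset_skip_bits` are hypothetical names, each pageblock is reduced to a single "pinned or not" flag, and real compaction does far more than this. It shows why pageblocks full of pinned pages are cheap to pass over while their skip bits are set, and why every reset makes the next pass scan all of them again.

```python
class ToyZone:
    """Toy model of per-pageblock skip bits; not kernel code."""

    def __init__(self, pinned):
        # pinned[i] is True when pageblock i holds only pinned pages
        # that compaction cannot migrate (an assumption of this model).
        self.pinned = list(pinned)
        self.skip = [False] * len(self.pinned)

    def compact_once(self):
        """Scan every pageblock whose skip bit is clear; return the scan count."""
        scanned = 0
        for i, is_pinned in enumerate(self.pinned):
            if self.skip[i]:
                continue
            scanned += 1
            if is_pinned:
                # Nothing could be isolated here, so mark the block as skippable.
                self.skip[i] = True
        return scanned

    def reset_skip_bits(self):
        """Stand-in for the reset events discussed above."""
        self.skip = [False] * len(self.pinned)


zone = ToyZone([True] * 6 + [False] * 2)
first = zone.compact_once()    # every block is scanned
second = zone.compact_once()   # the pinned blocks are now skipped
zone.reset_skip_bits()
third = zone.compact_once()    # the pinned blocks are scanned again
print(first, second, third)
```

How often the reset happens decides how much scanning is wasted, which is why it matters whether the bits are cleared after every compaction run or only on the rarer events mentioned above.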

Anyway, could you test one more time without the second patch?
IMO, the first patch is reasonable to backport, because it fixes a real bug.
But I'm not sure whether the second patch needs to be backported.
One more test will help us understand the effect of each patch.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>

