Re: why the kmalloc return fail when there is free physical address but return success after dropping page caches

陈宗志 <baotiao@xxxxxxxxx> · Wed, 25 May 2016 17:25:05 +0800

Hi Dave

> >> The machine's status is described below:
> >>
> >> The machine has 96G of physical memory. Real memory use is about
> >> 64G, and the page cache uses about 32G. We also use swap; at
> >> that time about 10G was in use (we set the max swap size to 32G). At that
> >> moment, we found XFS reporting:
> >>
> >> |Apr 29 21:54:31 w-openstack86 kernel: XFS: possible memory allocation
> >> deadlock in kmem_alloc (mode:0x250) |

Pretty sure that's a GFP_NOFS allocation context.

You are right, it is a GFP_NOFS allocation from XFS. XFS uses the GFP_NOFS flag to avoid recursing into the filesystem.
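
For reference, the mode:0x250 value in the log line can be decoded by hand. The sketch below assumes the flag values of a 3.10-era include/linux/gfp.h (they changed in later kernels, so treat the table as an assumption to check against your own tree); decode_gfp is a made-up helper name, not a kernel or library function.

```python
# Assumed flag values, as in a 3.10-era include/linux/gfp.h.
# Newer kernels renumbered these bits, so verify before relying on them.
GFP_BITS = {
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
    0x100: "__GFP_COLD",
    0x200: "__GFP_NOWARN",
}


def decode_gfp(mode):
    """Return the names of the known flag bits set in a gfp mask."""
    names = [name for bit, name in sorted(GFP_BITS.items()) if mode & bit]
    known = sum(bit for bit in GFP_BITS if mode & bit)
    leftover = mode & ~known
    if leftover:
        # Bits not in the table above are reported as raw hex.
        names.append(hex(leftover))
    return names


print(decode_gfp(0x250))
```

Under those assumptions 0x250 is __GFP_WAIT | __GFP_IO | __GFP_NOWARN, that is GFP_NOFS plus __GFP_NOWARN, with __GFP_FS cleared.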

> > Just once, or many times?
>
> The message appeared many times.
> From the code, I know that XFS will try the kmalloc() call 100 times.

The current upstream kernels report much more information - process,
size of allocation, etc.

In general, the cause of such problems is memory fragmentation
preventing a large contiguous allocation from taking place (e.g.
when you try to read a file with millions of extents).
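
To see why the size of the request matters: the page allocator hands out physically contiguous blocks of 2^order pages, so a large allocation needs a free block of a high enough order. A rough sketch of that rounding, assuming 4 KiB pages (x86_64) and ignoring slab caches; order_for is a hypothetical helper modelled on what the kernel's get_order() computes.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages, as on x86_64


def order_for(size_bytes):
    """Smallest order such that 2**order pages cover size_bytes."""
    pages = -(-size_bytes // PAGE_SIZE)  # ceiling division
    order = 0
    while (1 << order) < pages:
        order += 1
    return order


for size in (4096, 64 * 1024, 1024 * 1024):
    print(size, "bytes -> order", order_for(size))
```

A request of order N can only be satisfied from a free block of order N or higher, which is why plenty of free order-0 pages does not help a large kmalloc().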

> >> in the system. But there is still 32G page cache.
> >>
> >> So I ran
> >>
> >> |echo 3 > /proc/sys/vm/drop_caches |
> >>
> >> to drop the page cache.
> >>
> >> Then the system was fine.
> >
> > Are you saying that the error message was repeated infinitely until you did the drop_caches?
>
> No. The error message didn't appear after I ran drop_caches.

Yes, you are right. Before I ran echo 3 > /proc/sys/vm/drop_caches, /proc/buddyinfo looked like this:
Node 0, zone      DMA      0      0      0      1      2      1      1      0      1      1      3
Node 0, zone    DMA32   2983   2230   1037    290    121     63     47     61     16      0      0
Node 0, zone   Normal  13707   1126    285    268    291    160     64     21     11      0      0
Node 1, zone   Normal  10678   5041   1167    705    316    158     61     22      0      0      0

After the operation, /proc/buddyinfo looked like this:
Node 0, zone      DMA      0      0      0      1      2      1      1      0      1      1      3
Node 0, zone    DMA32  61091  22791   3659    348    169     81     89     63     16      0      0
Node 0, zone   Normal 781723 532596 246195  57076   9853   4061   1922    799    217     19      0
Node 1, zone   Normal 334903 138984  49608   6929   2770   1603    843    447    232      2      0

We can see that after the operation there are many more large free blocks.
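
The comparison can be made explicit with a small script. This is only a sketch: it assumes the usual /proc/buddyinfo layout (one column per order, starting at order 0) and reuses the two dumps above; parse_buddyinfo, blocks_at_or_above and free_pages are names I made up for illustration.

```python
BEFORE = """\
Node 0, zone      DMA      0      0      0      1      2      1      1      0      1      1      3
Node 0, zone    DMA32   2983   2230   1037    290    121     63     47     61     16      0      0
Node 0, zone   Normal  13707   1126    285    268    291    160     64     21     11      0      0
Node 1, zone   Normal  10678   5041   1167    705    316    158     61     22      0      0      0
"""

AFTER = """\
Node 0, zone      DMA      0      0      0      1      2      1      1      0      1      1      3
Node 0, zone    DMA32  61091  22791   3659    348    169     81     89     63     16      0      0
Node 0, zone   Normal 781723 532596 246195  57076   9853   4061   1922    799    217     19      0
Node 1, zone   Normal 334903 138984  49608   6929   2770   1603    843    447    232      2      0
"""


def parse_buddyinfo(text):
    """Map (node, zone) -> list of free block counts, index = order."""
    zones = {}
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) < 5 or tokens[0] != "Node":
            continue
        node = int(tokens[1].rstrip(","))
        zones[(node, tokens[3])] = [int(t) for t in tokens[4:]]
    return zones


def blocks_at_or_above(counts, min_order):
    """Number of free blocks whose order is >= min_order."""
    return sum(counts[min_order:])


def free_pages(counts):
    """Total free base pages: each order-N block holds 2**N pages."""
    return sum(n << order for order, n in enumerate(counts))


before, after = parse_buddyinfo(BEFORE), parse_buddyinfo(AFTER)
for key in before:
    node, zone = key
    print(f"node {node} {zone:>6}: order>=4 blocks "
          f"{blocks_at_or_above(before[key], 4)} -> "
          f"{blocks_at_or_above(after[key], 4)}")
```

For Node 0 Normal this gives 547 free blocks of order 4 or higher before the drop and 16871 after.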

Besides /proc/buddyinfo, is there any other command to get memory fragmentation info?

And besides the drop_caches operation, is there any other command that can avoid memory fragmentation?

IIRC, the reason the system can't recover itself is that memory
compaction is not triggered from GFP_NOFS allocation context, which
means memory reclaim won't try to create contiguous regions by
moving things around and hence the allocation will not succeed until
a significant amount of memory is freed by some other trigger.

So a GFP_NOFS allocation will not trigger memory compaction. Where can I find this logic in the kernel source code?

Thank you

On Wed, May 18, 2016 at 10:41 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
On Wed, May 18, 2016 at 04:58:31PM +0800, baotiao wrote:

> Thanks for your reply
>
> >> Hello everyone, I have run into an interesting kernel memory problem. Can anyone
> >> help me explain what happens inside the kernel?
> >
> > Which kernel version is that?
>
> The kernel version is 3.10.0-327.4.5.el7.x86_64

RHEL7 kernel. Best you report the problem to your RH support
contact - the RHEL7 kernels are far different to upstream kernels.
