Hi Dave
> >> The machine's status is describe as blow:
> >>
> >> the machine has 96 physical memory. And the real use memory is about
> >> 64G, and the page cache use about 32G. we also use the swap area, at
> >> that time we have about 10G(we set the swap max size to 32G). At that
> >> moment, we find xfs report
> >>
> >> |Apr 29 21:54:31 w-openstack86 kernel: XFS: possible memory allocation
> >> deadlock in kmem_alloc (mode:0x250) |
Pretty sure that's a GFP_NOFS allocation context.

You are right, it is a GFP_NOFS allocation from XFS. XFS uses the GFP_NOFS flag to avoid recursive calls back into the filesystem.
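To double-check the mode value, here is a small sketch that decodes 0x250 into flag names. The bit values in the table are an assumption on my side, written from memory of include/linux/gfp.h in 3.10-era kernels; they differ between kernel versions, so they should be verified against the header of the kernel in use. decode_gfp is only a helper name for this example.

```python
# Flag bits as recalled from include/linux/gfp.h in 3.10-era kernels.
# ASSUMPTION: these values are version specific; check the real header.
GFP_FLAGS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x04: "__GFP_DMA32",
    0x08: "__GFP_MOVABLE",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
    0x100: "__GFP_COLD",
    0x200: "__GFP_NOWARN",
    0x400: "__GFP_REPEAT",
    0x800: "__GFP_NOFAIL",
}


def decode_gfp(mode):
    """Return the names of the known flag bits set in mode, lowest bit first."""
    names = [name for bit, name in sorted(GFP_FLAGS.items()) if mode & bit]
    # Any bits not covered by the table are reported as a single hex value.
    unknown = mode & ~sum(GFP_FLAGS)
    if unknown:
        names.append(hex(unknown))
    return names


if __name__ == "__main__":
    # Mode taken from the XFS message: "kmem_alloc (mode:0x250)"
    print(decode_gfp(0x250))
```

With these values, __GFP_WAIT and __GFP_IO are set in 0x250 and __GFP_FS (0x80) is not, which is the GFP_NOFS combination, plus __GFP_NOWARN.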
> > Just once, or many times?
>
> the message appear many times
> from the code, I know that xfs will try 100 time of kmalloc() function
The current upstream kernels report much more information - process,
size of allocation, etc.
In general, the cause of such problems is memory fragmentation
preventing a large contiguous allocation from taking place (e.g.
when you try to read a file with millions of extents).
> >> in the system. But there is still 32G page cache.
> >>
> >> So I run
> >>
> >> |echo 3 > /proc/sys/vm/drop_caches |
> >>
> >> to drop the page cache.
> >>
> >> Then the system is fine.
> >
> > Are you saying that the error message was repeated infinitely until you did the drop_caches?
>
>
> No. the error message don't appear after I drop_cache.
Yes, you are right. Before I ran echo 3 > /proc/sys/vm/drop_caches, /proc/buddyinfo looked like this:

Node 0, zone      DMA      0      0      0      1      2      1      1      0      1      1      3
Node 0, zone    DMA32   2983   2230   1037    290    121     63     47     61     16      0      0
Node 0, zone   Normal  13707   1126    285    268    291    160     64     21     11      0      0
Node 1, zone   Normal  10678   5041   1167    705    316    158     61     22      0      0      0

After the drop_caches operation, /proc/buddyinfo looked like this:

Node 0, zone      DMA      0      0      0      1      2      1      1      0      1      1      3
Node 0, zone    DMA32  61091  22791   3659    348    169     81     89     63     16      0      0
Node 0, zone   Normal 781723 532596 246195  57076   9853   4061   1922    799    217     19      0
Node 1, zone   Normal 334903 138984  49608   6929   2770   1603    843    447    232      2      0

We can see that after dropping the caches there are far more free blocks in the DMA32 and Normal zones, including the larger (higher-order) ones.

Besides /proc/buddyinfo, is there any other command to get memory fragmentation information?
And besides drop_caches, is there any other command that can avoid the memory fragmentation?
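
For reference, /proc/buddyinfo itself can be summarised a bit further. Below is a rough sketch, assuming 4 KiB pages and the usual layout where column i counts free blocks of 2^i pages. The function names and the choice of order 4 as the threshold are only for this example; the data is the output pasted above.

```python
PAGE_KIB = 4  # ASSUMPTION: 4 KiB pages (the x86_64 default)

BEFORE = """
Node 0, zone DMA 0 0 0 1 2 1 1 0 1 1 3
Node 0, zone DMA32 2983 2230 1037 290 121 63 47 61 16 0 0
Node 0, zone Normal 13707 1126 285 268 291 160 64 21 11 0 0
Node 1, zone Normal 10678 5041 1167 705 316 158 61 22 0 0 0
"""

AFTER = """
Node 0, zone DMA 0 0 0 1 2 1 1 0 1 1 3
Node 0, zone DMA32 61091 22791 3659 348 169 81 89 63 16 0 0
Node 0, zone Normal 781723 532596 246195 57076 9853 4061 1922 799 217 19 0
Node 1, zone Normal 334903 138984 49608 6929 2770 1603 843 447 232 2 0
"""


def parse_buddyinfo(text):
    """Return {(node, zone): [free block count for order 0, 1, 2, ...]}."""
    zones = {}
    for line in text.strip().splitlines():
        # Expected fields: Node <n> zone <name> <count> <count> ...
        fields = line.replace(",", " ").split()
        if len(fields) < 5 or fields[0] != "Node":
            continue
        zones[(int(fields[1]), fields[3])] = [int(c) for c in fields[4:]]
    return zones


def free_pages_at_or_above(counts, min_order):
    """Free pages held in blocks of order >= min_order (block = 2**order pages)."""
    return sum(n << order for order, n in enumerate(counts) if order >= min_order)


def report(label, text, min_order=4):
    """Print, per zone, the free memory that sits in blocks of order >= min_order."""
    for (node, zone), counts in parse_buddyinfo(text).items():
        pages = free_pages_at_or_above(counts, min_order)
        mib = pages * PAGE_KIB // 1024
        print(f"{label}: node {node} {zone:7s} order>={min_order}: {mib} MiB free")


if __name__ == "__main__":
    report("before", BEFORE)
    report("after", AFTER)
```

On a live machine the same functions can be fed open("/proc/buddyinfo").read() instead of the pasted text.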
IIRC, the reason the system can't recover itself is that memory
compaction is not triggered from GFP_NOFS allocation context, which
means memory reclaim won't try to create contiguous regions by
moving things around and hence the allocation will not succeed until
a significant amount of memory is freed by some other trigger....

If a GFP_NOFS allocation does not trigger memory compaction, where can I find that logic in the kernel source code?

Thank you
On Wed, May 18, 2016 at 10:41 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
On Wed, May 18, 2016 at 04:58:31PM +0800, baotiao wrote:
> Thanks for your reply
>
> >> Hello every, I meet an interesting kernel memory problem. Can anyone
> >> help me explain what happen under the kernel
> >
> > Which kernel version is that?
>
> The kernel version is 3.10.0-327.4.5.el7.x86_64
RHEL7 kernel. Best you report the problem to your RH support
contact - the RHEL7 kernels are far different to upstream kernels..
> >> The machine's status is describe as blow:
> >>
> >> the machine has 96 physical memory. And the real use memory is about
> >> 64G, and the page cache use about 32G. we also use the swap area, at
> >> that time we have about 10G(we set the swap max size to 32G). At that
> >> moment, we find xfs report
> >>
> >> |Apr 29 21:54:31 w-openstack86 kernel: XFS: possible memory allocation
> >> deadlock in kmem_alloc (mode:0x250) |
Pretty sure that's a GFP_NOFS allocation context.
> > Just once, or many times?
>
> the message appear many times
> from the code, I know that xfs will try 100 time of kmalloc() function
The current upstream kernels report much more information - process,
size of allocation, etc.
In general, the cause of such problems is memory fragmentation
preventing a large contiguous allocation from taking place (e.g.
when you try to read a file with millions of extents).
> >> in the system. But there is still 32G page cache.
> >>
> >> So I run
> >>
> >> |echo 3 > /proc/sys/vm/drop_caches |
> >>
> >> to drop the page cache.
> >>
> >> Then the system is fine.
> >
> > Are you saying that the error message was repeated infinitely until you did the drop_caches?
>
>
> No. the error message don't appear after I drop_cache.
Of course - freeing memory will cause contiguous free space to
reform. Then the allocation will succeed.
IIRC, the reason the system can't recover itself is that memory
compaction is not triggered from GFP_NOFS allocation context, which
means memory reclaim won't try to create contiguous regions by
moving things around and hence the allocation will not succeed until
a significant amount of memory is freed by some other trigger....
Cheers,
Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
--
---
Blog: http://www.chenzongzhi.info
Twitter: https://twitter.com/baotiao
Git: https://github.com/baotiao