Re: BUG: Bad page state in process with linux 3.4.77

Khalid Aziz <khalid.aziz@xxxxxxxxxx> · Thu, 23 Jan 2014 12:58:31 -0700

On 01/23/2014 12:38 PM, Greg KH wrote:
On Thu, Jan 23, 2014 at 07:55:14PM +0100, Guillaume Morin wrote:
I originally reported this on 3.4.76 but I see the same issue on 3.4.77.
I've also tested with a 3.13 rc and could not reproduce the issue.  All
of these tests are done on a 12-core amd64 machine.

I wrote this simple program (attached) to play around with kernel AIO.
It simply does kernel AIO with O_DIRECT on a small temp file stored on
an ext4 filesystem.

When I run it with "LD_PRELOAD=libhugetlbfs.so", it triggers a "Bad page
state" BUG on exit every time.

Removing LD_PRELOAD from the command line fixes the problem.  Note that
my kernel does not use THP; it is NOT compiled with
CONFIG_TRANSPARENT_HUGEPAGE.

kernel: BUG: Bad page state in process aio_test  pfn:1b7201
kernel: page:ffffea0006dc8040 count:0 mapcount:1 mapping:          (null) index:0x91e
kernel: page flags: 0x20000000008000(tail)
kernel: Modules linked in: nfsd exportfs nfs nfs_acl auth_rpcgss fscache lockd sunrpc rdma_ucm rdma_cm ib_addr iw_cm ib_uverbs ib_cm ib_sa ib_mad ib_core ipmi_si ipmi_devintf ioatdma coretemp microcode i2c_i801 serio_raw pcspkr i2c_core dca dm_mod sg sr_mod cdrom crc32c_intel ahci libahci [last unloaded: scsi_wait_scan]
kernel: Pid: 5170, comm: aio_test Tainted: G           O 3.4.77bug #1
kernel: Call Trace:
kernel: [<ffffffff810f3300>] ?  is_free_buddy_page+0xa0/0xd0
kernel: [<ffffffff814c0861>] bad_page+0xe6/0xfc
kernel: [<ffffffff810f3dbc>] free_pages_prepare+0xfc/0x110
kernel: [<ffffffff811afe20>] ?  noalloc_get_block_write+0x30/0x30
kernel: [<ffffffff810f3dff>] __free_pages_ok+0x2f/0xd0
kernel: [<ffffffff810f4080>] __free_pages+0x20/0x40
kernel: [<ffffffff81124737>] update_and_free_page+0x77/0x80
kernel: [<ffffffff8112633e>] free_huge_page+0x16e/0x180
kernel: [<ffffffff810f8030>] __put_compound_page+0x20/0x50
kernel: [<ffffffff810f8108>] put_compound_page+0x78/0x140
kernel: [<ffffffff810f8546>] put_page+0x36/0x40
kernel: [<ffffffff81126ede>] __unmap_hugepage_range+0x1ce/0x230
kernel: [<ffffffff81127331>] unmap_hugepage_range+0x51/0x90
kernel: [<ffffffff8110e880>] unmap_single_vma+0x730/0x740
kernel: [<ffffffff8110f05f>] unmap_vmas+0x5f/0x80
kernel: [<ffffffff8111672c>] exit_mmap+0xbc/0x130
kernel: [<ffffffff8112e223>] ?  kmem_cache_free+0xd3/0xe0
kernel: [<ffffffff81035155>] mmput+0x35/0xf0
kernel: [<ffffffff8103a58d>] exit_mm+0xfd/0x120
kernel: [<ffffffff8103bb6c>] do_exit+0x16c/0x8b0
kernel: [<ffffffff811540c4>] ?  mntput+0x24/0x40
kernel: [<ffffffff81138962>] ?  fput+0x192/0x250
kernel: [<ffffffff8103c5ff>] do_group_exit+0x3f/0xa0
kernel: [<ffffffff8103c677>] sys_exit_group+0x17/0x20
kernel: [<ffffffff814d0492>] system_call_fastpath+0x16/0x1b
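The first two lines of the report can be cross-checked against each other. The sketch below is not part of the original report. It assumes an x86_64 kernel using the sparsemem vmemmap layout, where the struct page array starts at 0xffffea0000000000 and each struct page is 64 bytes; neither value is stated in the thread. The helper names page_to_pfn and parse_bad_page are hypothetical. Under those assumptions the page address and the pfn in the log refer to the same page, and the parsed counters show the condition the free path complains about: a page being freed with a nonzero mapcount.

```python
import re

# Assumptions (not stated in the thread): x86_64 with sparsemem-vmemmap,
# struct page array based at 0xffffea0000000000, sizeof(struct page) == 64.
VMEMMAP_BASE = 0xFFFFEA0000000000
STRUCT_PAGE_SIZE = 64

# The two header lines of the report, with the "kernel: " prefix dropped.
LOG = (
    "BUG: Bad page state in process aio_test  pfn:1b7201\n"
    "page:ffffea0006dc8040 count:0 mapcount:1 mapping:          (null) index:0x91e\n"
)


def page_to_pfn(page_addr):
    """Convert a struct page address to a page frame number (assumed layout)."""
    offset = page_addr - VMEMMAP_BASE
    if offset < 0 or offset % STRUCT_PAGE_SIZE != 0:
        raise ValueError("address is not a struct page slot in the assumed vmemmap")
    return offset // STRUCT_PAGE_SIZE


def parse_bad_page(log):
    """Extract pfn, page address, count and mapcount from a 'Bad page state' report."""
    return {
        "pfn": int(re.search(r"pfn:([0-9a-f]+)", log).group(1), 16),
        "page": int(re.search(r"page:([0-9a-f]+)", log).group(1), 16),
        # \b keeps this from matching the "count" inside "mapcount"
        "count": int(re.search(r"\bcount:(-?\d+)", log).group(1)),
        "mapcount": int(re.search(r"mapcount:(-?\d+)", log).group(1)),
    }


info = parse_bad_page(LOG)
consistent = page_to_pfn(info["page"]) == info["pfn"]
print("pfn from page address:", hex(page_to_pfn(info["page"])))
print("matches reported pfn:", consistent)
print("count:", info["count"], "mapcount:", info["mapcount"])
```

If the derived pfn did not match the reported one, the assumed base or struct page size would be wrong for that kernel build, so the check is only as good as those two constants.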

When I revert the following patch, I cannot reproduce the problem:
commit b07ef016454ff46f98e633b5a6247ca7e343fb67
Author: Khalid Aziz <khalid.aziz@xxxxxxxxxx>
Date:   Wed Sep 11 14:22:20 2013 -0700

This patch was added to the 3.4 branch for 3.4.69.

Please at least email the people involved in the patch, as the only real
"subscribers" to the stable mailing list are the people who make the
stable releases, not the developers who write the patches that are
backported.

Khalid, any thoughts about this?  You tested this out on 3.4, can you
duplicate this with the userspace program below?

Hello Guillaume,

This does look like the same issue that is fixed with commit 
27c73ae759774e63313c1fbfeb17ba076cea64c5. I will build a 3.4.69 kernel 
and try to reproduce this problem.

--
Khalid
