After taking a look at the code, I'm guessing that this is caused by the
cachefiles module in cachefiles_allocate_pages(). It's the only place
where the pages get marked with private_2 in that path. My guess is that
we mark all the pages in the list with private_2 but we don't consume
the whole list, and when readahead does the page list cleanup it finds
these (see the sketch appended at the end of this mail). Any insight on
whether I'm on the right path?

- Milosz

On Thu, Aug 8, 2013 at 12:44 PM, Milosz Tanski <milosz@xxxxxxxxx> wrote:
> David,
>
> I retried with your fixes and my newer Ceph implementation. I still
> see the same issue with a page being marked as private_2 in the
> readahead cleanup code. I understand what happens, but not why it
> happens.
>
> On the plus side I haven't seen any hard crashes yet, but I'm putting
> it through its paces. I'm not sure whether it was my reworking of the
> fscache code in Ceph or your wait_on_atomic fix, but I'm fine sharing
> the blame / success here.
>
> [48532035.686695] BUG: Bad page state in process petabucket pfn:3b5ffb
> [48532035.686715] page:ffffea000ed7fec0 count:0 mapcount:0 mapping: (null) index:0x2c
> [48532035.686720] page flags: 0x200000000001000(private_2)
> [48532035.686724] Modules linked in: ceph libceph cachefiles
> auth_rpcgss oid_registry nfsv4 microcode nfs fscache lockd sunrpc
> raid10 raid456 async_pq async_xor async_memcpy async_raid6_recov
> async_tx raid1 raid0 multipath linear btrfs raid6_pq lzo_compress xor
> zlib_deflate libcrc32c
> [48532035.686735] CPU: 1 PID: 32420 Comm: petabucket Tainted: G B 3.10.0-virtual #45
> [48532035.686736] 0000000000000001 ffff88042bf57a48 ffffffff815523f2 ffff88042bf57a68
> [48532035.686738] ffffffff8111def7 ffff880400000001 ffffea000ed7fec0 ffff88042bf57aa8
> [48532035.686740] ffffffff8111e49e 0000000000000000 ffffea000ed7fec0 0200000000001000
> [48532035.686742] Call Trace:
> [48532035.686745] [<ffffffff815523f2>] dump_stack+0x19/0x1b
> [48532035.686747] [<ffffffff8111def7>] bad_page+0xc7/0x120
> [48532035.686749] [<ffffffff8111e49e>] free_pages_prepare+0x10e/0x120
> [48532035.686751] [<ffffffff8111fc80>] free_hot_cold_page+0x40/0x170
> [48532035.686753] [<ffffffff81123507>] __put_single_page+0x27/0x30
> [48532035.686755] [<ffffffff81123df5>] put_page+0x25/0x40
> [48532035.686757] [<ffffffff81123e66>] put_pages_list+0x56/0x70
> [48532035.686759] [<ffffffff81122a98>] __do_page_cache_readahead+0x1b8/0x260
> [48532035.686762] [<ffffffff81122ea1>] ra_submit+0x21/0x30
> [48532035.686835] [<ffffffff81118f64>] filemap_fault+0x254/0x490
> [48532035.686838] [<ffffffff8113a74f>] __do_fault+0x6f/0x4e0
> [48532035.686840] [<ffffffff81008c33>] ? pte_mfn_to_pfn+0x93/0x110
> [48532035.686842] [<ffffffff8113d856>] handle_pte_fault+0xf6/0x930
> [48532035.686845] [<ffffffff81008c33>] ? pte_mfn_to_pfn+0x93/0x110
> [48532035.686847] [<ffffffff81008cce>] ? xen_pmd_val+0xe/0x10
> [48532035.686849] [<ffffffff81005469>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
> [48532035.686851] [<ffffffff8113f361>] handle_mm_fault+0x251/0x370
> [48532035.686853] [<ffffffff812b0ac4>] ? call_rwsem_down_read_failed+0x14/0x30
> [48532035.686870] [<ffffffff8155bffa>] __do_page_fault+0x1aa/0x550
> [48532035.686872] [<ffffffff81003e03>] ? xen_write_msr_safe+0xa3/0xc0
> [48532035.686874] [<ffffffff81004ec2>] ? xen_mc_flush+0xb2/0x1c0
> [48532035.686876] [<ffffffff8100483d>] ? xen_clts+0x8d/0x190
> [48532035.686878] [<ffffffff81556ad6>] ? __schedule+0x3a6/0x820
> [48532035.686880] [<ffffffff8155c3ae>] do_page_fault+0xe/0x10
> [48532035.686882] [<ffffffff81558818>] page_fault+0x28/0x30
>
> - Milosz
>
> On Thu, Jul 25, 2013 at 11:20 AM, David Howells <dhowells@xxxxxxxxxx> wrote:
>> Milosz Tanski <milosz@xxxxxxxxx> wrote:
>>
>>> In my case I'm seeing this in cases where all user space has these
>>> opened R/O. Like, I wrote this out weeks ago, rebooted... so nobody is
>>> using R/W.
>>
>> I gave Linus a patch to fix wait_on_atomic_t() which he has committed. Can
>> you see if that fixed the problem? I'm not sure it will, but it's worth
>> checking.
>>
>> David

--
Linux-cachefs mailing list
Linux-cachefs@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cachefs
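
For reference, here is a condensed sketch of the sequence I suspect,
paraphrased from the 3.10-era readahead, fscache and cachefiles paths
named in the trace above. The function body below is a simplified
rewrite, not verbatim kernel source, so treat it as an illustration of
the idea rather than the exact code:

/*
 * 1. __do_page_cache_readahead() builds a list of pages and hands it to
 *    the netfs via ->readpages() (ceph_readpages here), which asks
 *    fscache/cachefiles to retrieve the data.
 *
 * 2. When the cache has space but no data yet, cachefiles_allocate_pages()
 *    marks every page on that list as "cached" and returns -ENODATA,
 *    roughly:
 */
int cachefiles_allocate_pages(struct fscache_retrieval *op,
			      struct list_head *pages,
			      unsigned *nr_pages, gfp_t gfp)
{
	struct pagevec pagevec;
	struct page *page;

	pagevec_init(&pagevec, 0);
	list_for_each_entry(page, pages, lru)
		if (pagevec_add(&pagevec, page) == 0)
			fscache_mark_pages_cached(op, &pagevec);
	if (pagevec_count(&pagevec) > 0)
		fscache_mark_pages_cached(op, &pagevec);

	return -ENODATA;	/* netfs has to read the pages itself */
}
/*
 * 3. fscache_mark_pages_cached() ends up doing SetPageFsCache() on each
 *    page, which is SetPagePrivate2() -- the private_2 bit in the oops.
 *
 * 4. If ->readpages() then leaves some of those pages on the list (the
 *    whole list isn't consumed), the readahead code frees the leftovers
 *    via put_pages_list(), and since PG_private_2 is part of
 *    PAGE_FLAGS_CHECK_AT_FREE, free_pages_prepare() trips bad_page():
 *    "BUG: Bad page state ... page flags: ...(private_2)".
 */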