David, I retried with your fixes and my newer Ceph implementation. I
still see the same issue with a page being marked as private_2 in the
readahead cleanup code. I understand what happens, but not why it
happens. On the plus side I haven't seen any hard crashes yet, but I'm
still putting it through its paces. I'm not sure whether it was my
reworking of the fscache code in Ceph or your wait_on_atomic_t() fix
that helped, but I'm fine sharing the blame / success here.

[48532035.686695] BUG: Bad page state in process petabucket  pfn:3b5ffb
[48532035.686715] page:ffffea000ed7fec0 count:0 mapcount:0 mapping:          (null) index:0x2c
[48532035.686720] page flags: 0x200000000001000(private_2)
[48532035.686724] Modules linked in: ceph libceph cachefiles auth_rpcgss oid_registry nfsv4 microcode nfs fscache lockd sunrpc raid10 raid456 async_pq async_xor async_memcpy async_raid6_recov async_tx raid1 raid0 multipath linear btrfs raid6_pq lzo_compress xor zlib_deflate libcrc32c
[48532035.686735] CPU: 1 PID: 32420 Comm: petabucket Tainted: G    B        3.10.0-virtual #45
[48532035.686736]  0000000000000001 ffff88042bf57a48 ffffffff815523f2 ffff88042bf57a68
[48532035.686738]  ffffffff8111def7 ffff880400000001 ffffea000ed7fec0 ffff88042bf57aa8
[48532035.686740]  ffffffff8111e49e 0000000000000000 ffffea000ed7fec0 0200000000001000
[48532035.686742] Call Trace:
[48532035.686745]  [<ffffffff815523f2>] dump_stack+0x19/0x1b
[48532035.686747]  [<ffffffff8111def7>] bad_page+0xc7/0x120
[48532035.686749]  [<ffffffff8111e49e>] free_pages_prepare+0x10e/0x120
[48532035.686751]  [<ffffffff8111fc80>] free_hot_cold_page+0x40/0x170
[48532035.686753]  [<ffffffff81123507>] __put_single_page+0x27/0x30
[48532035.686755]  [<ffffffff81123df5>] put_page+0x25/0x40
[48532035.686757]  [<ffffffff81123e66>] put_pages_list+0x56/0x70
[48532035.686759]  [<ffffffff81122a98>] __do_page_cache_readahead+0x1b8/0x260
[48532035.686762]  [<ffffffff81122ea1>] ra_submit+0x21/0x30
[48532035.686835]  [<ffffffff81118f64>] filemap_fault+0x254/0x490
[48532035.686838]  [<ffffffff8113a74f>] __do_fault+0x6f/0x4e0
[48532035.686840]  [<ffffffff81008c33>] ? pte_mfn_to_pfn+0x93/0x110
[48532035.686842]  [<ffffffff8113d856>] handle_pte_fault+0xf6/0x930
[48532035.686845]  [<ffffffff81008c33>] ? pte_mfn_to_pfn+0x93/0x110
[48532035.686847]  [<ffffffff81008cce>] ? xen_pmd_val+0xe/0x10
[48532035.686849]  [<ffffffff81005469>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
[48532035.686851]  [<ffffffff8113f361>] handle_mm_fault+0x251/0x370
[48532035.686853]  [<ffffffff812b0ac4>] ? call_rwsem_down_read_failed+0x14/0x30
[48532035.686870]  [<ffffffff8155bffa>] __do_page_fault+0x1aa/0x550
[48532035.686872]  [<ffffffff81003e03>] ? xen_write_msr_safe+0xa3/0xc0
[48532035.686874]  [<ffffffff81004ec2>] ? xen_mc_flush+0xb2/0x1c0
[48532035.686876]  [<ffffffff8100483d>] ? xen_clts+0x8d/0x190
[48532035.686878]  [<ffffffff81556ad6>] ? __schedule+0x3a6/0x820
[48532035.686880]  [<ffffffff8155c3ae>] do_page_fault+0xe/0x10
[48532035.686882]  [<ffffffff81558818>] page_fault+0x28/0x30

- Milosz

On Thu, Jul 25, 2013 at 11:20 AM, David Howells <dhowells@xxxxxxxxxx> wrote:
> Milosz Tanski <milosz@xxxxxxxxx> wrote:
>
>> In my case I'm seeing this in cases when all user space have these
>> opened R/O. Like I wrote this out weeks ago, rebooted... so nobody is
>> using R/W.
>
> I gave Linus a patch to fix wait_on_atomic_t() which he has committed. Can
> you see if that fixed the problem? I'm not sure it will, but it's worth
> checking.
>
> David

--
Linux-cachefs mailing list
Linux-cachefs@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cachefs
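
For what it's worth, here is a minimal userspace sketch of the free-time
check the trace above is tripping. This is not the kernel code: the flag
bit positions and the check mask are illustrative, and free_page_check()
only stands in for the free_pages_prepare()/bad_page() pair shown in the
backtrace, which complain when put_pages_list() in the readahead cleanup
path drops the last reference on a page that still has PG_private_2
(the bit fscache uses) set.

/* Sketch of the "Bad page state" check; values are illustrative only. */
#include <stdio.h>

#define PG_locked     (1UL << 0)
#define PG_private    (1UL << 11)
#define PG_private_2  (1UL << 12)   /* used by fscache as PG_fscache */
#define PG_writeback  (1UL << 13)

/* Flags that must already be clear by the time a page is freed. */
#define PAGE_FLAGS_CHECK_AT_FREE \
        (PG_locked | PG_private | PG_private_2 | PG_writeback)

struct page {
        unsigned long flags;
        int count;
};

/* Stand-in for the flag check done before a page goes back to the
 * allocator; a set bit here is reported as a bad page state. */
static int free_page_check(struct page *page)
{
        if (page->flags & PAGE_FLAGS_CHECK_AT_FREE) {
                printf("BUG: Bad page state, flags: 0x%lx\n", page->flags);
                return -1;      /* the page is complained about, not freed */
        }
        return 0;
}

int main(void)
{
        /* A readahead page that was marked PG_private_2 but never had
         * the bit cleared before the cleanup path dropped it. */
        struct page p = { .flags = PG_private_2, .count = 0 };

        free_page_check(&p);
        return 0;
}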