I forgot to mention this.... David Howells attempt to fix a similar issue with NFS and fscache on ext4 last year: http://www.redhat.com/archives/linux-cachefs/2013-May/msg00003.html The problem is that ext4 it's wisdom tries to allocate a page without using GPF_NOFS In the code: ext4/inode.c:2678 so the fix that David added not going to do anything for us. /* * grab_cache_page_write_begin() can take a long time if the * system is thrashing due to memory pressure, or if the page * is being written back. So grab it first before we start * the transaction handle. This also allows us to allocate * the page (if needed) without using GFP_NOFS. */ retry_grab: page = grab_cache_page_write_begin(mapping, index, flags); if (!page) On Sat, Jul 19, 2014 at 4:20 PM, Milosz Tanski <milosz@xxxxxxxxx> wrote: > Neil, > > I saw your recent patcheset for improving the wait_on_bit interface > (particular: SCHED: allow wait_on_bit_action functions to support a > timeout.) I'm looking on some guidance on leveraging that work to > solve other recursive lock hang in fscache. > > I've ran into similar issues you're trying to solve with loopback NFS > but in the fscache code. This happens under heavy vma preasure when > the kernel is aggressively trying to trim the page cache. > > The hang is caused by this serious of events > 1. cachefiles_write_page - cachefiles (the fscache backend, sitting on > ext4) tries to write page to disk > 2. ext4 tries to allocate a page in writeback (without GPF_NOFS and > with wait flag) > 3. due to vma preasure the kernel tries to free-up pages > 4. this causes release pages in ceph to be called > 5. the selected page is cached page in process of write out (from step #1) > 6. fscache_wait_on_page_write hangs forever > > Is there a solution that you have to NFS as another patch that > implements the timeout that I can use a template? I'm not familiar > with that piece of the code base. > > Best, > - Milosz > > INFO: task kworker/u30:7:28375 blocked for more than 120 seconds. > Not tainted 3.15.0-virtual #74 > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > kworker/u30:7 D 0000000000000000 0 28375 2 0x00000000 > Workqueue: fscache_operation fscache_op_work_func [fscache] > ffff88000b147148 0000000000000046 0000000000000000 ffff88000b1471c8 > ffff8807aa031820 0000000000014040 ffff88000b147fd8 0000000000014040 > ffff880f0c50c860 ffff8807aa031820 ffff88000b147158 ffff88007be59cd0 > Call Trace: > [<ffffffff815930e9>] schedule+0x29/0x70 > [<ffffffffa018bed5>] __fscache_wait_on_page_write+0x55/0x90 [fscache] > [<ffffffff810a4350>] ? __wake_up_sync+0x20/0x20 > [<ffffffffa018c135>] __fscache_maybe_release_page+0x65/0x1e0 [fscache] > [<ffffffffa02ad813>] ceph_releasepage+0x83/0x100 [ceph] > [<ffffffff811635b0>] ? anon_vma_fork+0x130/0x130 > [<ffffffff8112cdd2>] try_to_release_page+0x32/0x50 > [<ffffffff81140096>] shrink_page_list+0x7e6/0x9d0 > [<ffffffff8113f278>] ? isolate_lru_pages.isra.73+0x78/0x1e0 > [<ffffffff81140932>] shrink_inactive_list+0x252/0x4c0 > [<ffffffff811412b1>] shrink_lruvec+0x3e1/0x670 > [<ffffffff8114157f>] shrink_zone+0x3f/0x110 > [<ffffffff81141b06>] do_try_to_free_pages+0x1d6/0x450 > [<ffffffff8114a939>] ? zone_statistics+0x99/0xc0 > [<ffffffff81141e44>] try_to_free_pages+0xc4/0x180 > [<ffffffff81136982>] __alloc_pages_nodemask+0x6b2/0xa60 > [<ffffffff811c1d4e>] ? __find_get_block+0xbe/0x250 > [<ffffffff810a405e>] ? wake_up_bit+0x2e/0x40 > [<ffffffff811740c3>] alloc_pages_current+0xb3/0x180 > [<ffffffff8112cf07>] __page_cache_alloc+0xb7/0xd0 > [<ffffffff8112da6c>] grab_cache_page_write_begin+0x7c/0xe0 > [<ffffffff81214072>] ? ext4_mark_inode_dirty+0x82/0x220 > [<ffffffff81214a89>] ext4_da_write_begin+0x89/0x2d0 > [<ffffffff8112c6ee>] generic_perform_write+0xbe/0x1d0 > [<ffffffff811a96b1>] ? update_time+0x81/0xc0 > [<ffffffff811ad4c2>] ? mnt_clone_write+0x12/0x30 > [<ffffffff8112e80e>] __generic_file_aio_write+0x1ce/0x3f0 > [<ffffffff8112ea8e>] generic_file_aio_write+0x5e/0xe0 > [<ffffffff8120b94f>] ext4_file_write+0x9f/0x410 > [<ffffffff8120af56>] ? ext4_file_open+0x66/0x180 > [<ffffffff8118f0da>] do_sync_write+0x5a/0x90 > [<ffffffffa025c6c9>] cachefiles_write_page+0x149/0x430 [cachefiles] > [<ffffffff812cf439>] ? radix_tree_gang_lookup_tag+0x89/0xd0 > [<ffffffffa018c512>] fscache_write_op+0x222/0x3b0 [fscache] > [<ffffffffa018b35a>] fscache_op_work_func+0x3a/0x100 [fscache] > [<ffffffff8107bfe9>] process_one_work+0x179/0x4a0 > [<ffffffff8107d47b>] worker_thread+0x11b/0x370 > [<ffffffff8107d360>] ? manage_workers.isra.21+0x2e0/0x2e0 > [<ffffffff81083d69>] kthread+0xc9/0xe0 > [<ffffffff81010000>] ? ftrace_raw_event_xen_mmu_release_ptpage+0x70/0x90 > [<ffffffff81083ca0>] ? flush_kthread_worker+0xb0/0xb0 > [<ffffffff8159eefc>] ret_from_fork+0x7c/0xb0 > [<ffffffff81083ca0>] ? flush_kthread_worker+0xb0/0xb0 > > -- > Milosz Tanski > CTO > 16 East 34th Street, 15th floor > New York, NY 10016 > > p: 646-253-9055 > e: milosz@xxxxxxxxx -- Milosz Tanski CTO 16 East 34th Street, 15th floor New York, NY 10016 p: 646-253-9055 e: milosz@xxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html