(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Mon, 7 Mar 2011 19:12:23 GMT bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=30702
>
>            Summary: vmalloc(GFP_NOFS) can callback file system
>                     evict_inode, inducing deadlock.

Yeah.  Ricardo has been working on this.  See the thread at
http://marc.info/?l=linux-mm&m=128942194520631&w=4

It's tough, and we've been bad, and progress is slow :(

>            Product: Memory Management
>            Version: 2.5
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Page Allocator
>         AssignedTo: akpm@xxxxxxxxxxxxxxxxxxxx
>         ReportedBy: prasadjoshi124@xxxxxxxxx
>         Regression: No
>
>
> I am working on development of a proprietary file system. The problem I am
> facing is with calling __vmalloc() while holding a lock. Though I am working
> on changing my own code, I thought it would be good to at least report the
> vmalloc problem.
>
> The code looks something like this:
>
> const struct file_operations lzfs_file_operations = {
> 	.write		= lzfs_vnop_write,
> };
>
> ssize_t
> lzfs_vnop_write()
> {
> 	mutex_lock(some global mutex);
> 	ptr = __vmalloc(size, GFP_NOFS | __GFP_HIGHMEM, PAGE_KERNEL);
> 	mutex_unlock(some global mutex);
> }
>
> static const struct super_operations lzfs_super_ops = {
> 	.evict_inode	= lzfs_evict_vnode,
> };
>
> static void
> lzfs_evict_vnode(struct inode *inode)
> {
> 	mutex_lock(some global mutex);
>
> 	some code for eviction;
>
> 	mutex_unlock(some global mutex);
> }
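For illustration only, here is a minimal sketch of the rework the reporter says
they are already doing: perform the allocation before taking the mutex, so that
any reclaim triggered from inside __vmalloc() can never run lzfs_evict_vnode()
while the lock is held.  The mutex name, the write() arguments and the use of
len as the allocation size are placeholders standing in for the "some global
mutex" pseudocode above, not identifiers from the real driver:

	#include <linux/fs.h>
	#include <linux/mm.h>
	#include <linux/mutex.h>
	#include <linux/vmalloc.h>

	static DEFINE_MUTEX(lzfs_mutex);	/* placeholder for "some global mutex" */

	static ssize_t
	lzfs_vnop_write(struct file *file, const char __user *buf,
			size_t len, loff_t *ppos)
	{
		void *ptr;

		/* Allocate before taking the mutex ... */
		ptr = __vmalloc(len, GFP_NOFS | __GFP_HIGHMEM, PAGE_KERNEL);
		if (!ptr)
			return -ENOMEM;

		mutex_lock(&lzfs_mutex);
		/* ... so only the work that really needs the lock runs under it. */
		mutex_unlock(&lzfs_mutex);

		vfree(ptr);
		return len;
	}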
> As __vmalloc() is called with GFP_NOFS, I was expecting that evict_inode (or
> clear_inode) would not be called when the page cache is pruned. But I noticed
> the following oops message during testing.
>
> [ 5058.193312] [<ffffffffa092a534>] lzfs_clear_vnode+0x104/0x160 [lzfs]
> [ 5058.193318] [<ffffffff8116abc5>] clear_inode+0x75/0xf0
> [ 5058.193323] [<ffffffff8116ac80>] dispose_list+0x40/0x150
> [ 5058.193328] [<ffffffff8116af23>] prune_icache+0x193/0x2a0
> [ 5058.193332] [<ffffffff811665e3>] ? prune_dcache+0x183/0x1d0
> [ 5058.193338] [<ffffffff8116b081>] shrink_icache_memory+0x51/0x60
> [ 5058.193345] [<ffffffff8110e6d4>] shrink_slab+0x124/0x180
> [ 5058.193349] [<ffffffff8110ff0f>] do_try_to_free_pages+0x1cf/0x360
> [ 5058.193354] [<ffffffff8111024b>] try_to_free_pages+0x6b/0x70
> [ 5058.193359] [<ffffffff8110740a>] __alloc_pages_slowpath+0x27a/0x590
> [ 5058.193365] [<ffffffff81107884>] __alloc_pages_nodemask+0x164/0x1d0
> [ 5058.193370] [<ffffffff811397ba>] alloc_pages_current+0x9a/0x100
> [ 5058.193375] [<ffffffff811066ce>] __get_free_pages+0xe/0x50
> [ 5058.193380] [<ffffffff81042435>] pte_alloc_one_kernel+0x15/0x20
> [ 5058.193385] [<ffffffff8111c86b>] __pte_alloc_kernel+0x1b/0xc0
> [ 5058.193391] [<ffffffff8112ad63>] vmap_pte_range+0x183/0x1a0
> [ 5058.193395] [<ffffffff8112aec6>] vmap_pud_range+0x146/0x1c0
> [ 5058.193400] [<ffffffff8112afda>] vmap_page_range_noflush+0x9a/0xc0
> [ 5058.193405] [<ffffffff8112b032>] map_vm_area+0x32/0x50
> [ 5058.193410] [<ffffffff8112c4a8>] __vmalloc_area_node+0x108/0x190
> [ 5058.193426] [<ffffffffa06591a0>] ? kv_alloc+0x90/0x130 [spl]
> [ 5058.193431] [<ffffffff8112c392>] __vmalloc_node+0xa2/0xb0
> [ 5058.193443] [<ffffffffa06591a0>] ? kv_alloc+0x90/0x130 [spl]
> [ 5058.193453] [<ffffffff8112c712>] __vmalloc+0x22/0x30
> [ 5058.193464] [<ffffffffa06591a0>] kv_alloc+0x90/0x130 [spl]
> [ 5058.194007] [<ffffffffa0858136>] zfs_grow_blocksize+0x46/0xe0 [zfs]
> [ 5058.194063] [<ffffffffa08547e8>] zfs_write+0xbb8/0x1100 [zfs]
> [ 5058.194075] [<ffffffff8114e740>] ? mem_cgroup_charge_common+0x70/0x90
> [ 5058.194082] [<ffffffffa092ced7>] lzfs_vnop_write+0xc7/0x3b0 [lzfs]
> [ 5058.194087] [<ffffffff8111bacc>] ? do_anonymous_page+0x11c/0x350
> [ 5058.194096] [<ffffffff81152ec8>] vfs_write+0xb8/0x1a0
> [ 5058.194100] [<ffffffff81153711>] sys_write+0x51/0x80
> [ 5058.194105] [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b
>
> The problem is that __vmalloc() (via map_vm_area()) discards the allocation
> flags while mapping the scattered physical pages into a contiguous range of
> the vmalloc virtual address space:
>
> 1482 static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> 1483                                  pgprot_t prot, int node, void *caller)
> 1484 {
> 1525         if (map_vm_area(area, prot, &pages))
> 1526                 goto fail;
> 1527         return area->addr;
> 1532 }
>
> The function map_vm_area() can result in calls to pud_alloc(), pmd_alloc()
> and pte_alloc_kernel(), which allocate memory with GFP_KERNEL. For example:
>
> pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
> {
> 	pte_t *pte;
>
> 	pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO);
> 	return pte;
> }
>
> This page allocation may trigger clear_inode() (or evict_inode()) if the
> system is running short of memory, causing the oops above.
>
> Though none of the in-kernel file systems appears to call vmalloc while
> holding a lock like this, it would be good to fix the problem anyway.
>
> As far as I can tell, the solution is to pass the gfp_mask down the call
> hierarchy. I wanted to send a patch with these changes, but soon realized
> that changes are needed in too many places, so I thought I would report the
> problem first.
>
> Thanks and Regards.
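To make the suggested direction a little more concrete, here is a purely
illustrative sketch, under the assumption of a hypothetical _gfp-suffixed
page-table helper, of what "passing the gfp_mask down the call hierarchy"
could look like at the very bottom of the chain.  No such helper exists in
mainline; pte_alloc_one_kernel() takes no mask and hard-codes GFP_KERNEL,
which is exactly the problem described above:

	/*
	 * Hypothetical sketch, not mainline code: __vmalloc_area_node() would
	 * thread its gfp_mask through map_vm_area(), vmap_pte_range() and the
	 * page-table allocators so that a GFP_NOFS caller never re-enters
	 * file-system reclaim while building the vmalloc mapping.
	 */
	static pte_t *pte_alloc_one_kernel_gfp(struct mm_struct *mm,
					       unsigned long address,
					       gfp_t gfp_mask)
	{
		/* honour the caller's reclaim restrictions, e.g. GFP_NOFS */
		return (pte_t *)__get_free_page(gfp_mask | __GFP_REPEAT | __GFP_ZERO);
	}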