Theodore Tso wrote: > On Fri, Jun 19, 2009 at 08:46:12PM -0500, Eric Sandeen wrote: >> if you got within 372 bytes on 32-bit (with 8k stacks) then that's >> indeed pretty worrisome. > > Fortunately this was with a 4k stack, but it's still not a good thing; > the 8k stack also has to support interrupts / soft irq's, whereas the > CONFIG_4KSTACK has a separate interrupt stack.... Hm I thought we had irq stacks for both but I guess not. FWIW F11 has gone to 8k stacks ( \o/ ) > Anyone have statistics on what the worst, most evil proprietary > SCSI/FC/10gigE driver might use in terms of stack space, combined with > say, the most evil proprietary multipath product at interrupt/softirq > time, by any chance? well I'm inclined to ignore the proprietary stuff TBH, we can't control it. > In any case, here are two stack dumps that I captured, the first using > a 1k blocksize, and the second using a 4k blocksize (not that the > blocksize should make a huge amount of difference). This time, I got > to within 200 bytes of disaster on the second stack dump. Worse yet, > the stack usage bloat isn't in any one place, it seems to be finally > peanut-buttered across the call stack. > > I can see some things we can do to optimize stack usage; for example, > struct ext4_allocation_request is allocated on the stack, and the > structure was laid out without any regard to space wastage caused by > alignment requirements. That won't help on x86 at all, but it will > help substantially on x86_64 (since x86_64 requires that 8 byte > variables must be 8-byte aligned, where as x86_64 only requires 4 byte > alignment, even for unsigned long long's). But it's going have to be > a whole series of incremental improvements; I don't see any magic > bullet solution to our stack usage. XFS forces gcc to not inline any static function; it's extreme, but maybe it'd help here too. > - Ted > > > Depth Size Location (38 entries) > ----- ---- -------- > 0) 3064 48 kvm_mmu_write+0x5f/0x67 > 1) 3016 16 kvm_set_pte+0x21/0x27 > 2) 3000 208 __change_page_attr_set_clr+0x272/0x73b This looks like a victim of inlining > 3) 2792 76 kernel_map_pages+0xd4/0x102 > 4) 2716 80 get_page_from_freelist+0x2dd/0x3b5 > 5) 2636 108 __alloc_pages_nodemask+0xf6/0x435 > 6) 2528 16 alloc_slab_page+0x20/0x26 > 7) 2512 60 __slab_alloc+0x171/0x470 > 8) 2452 4 kmem_cache_alloc+0x8f/0x127 > 9) 2448 68 radix_tree_preload+0x27/0x66 > 10) 2380 56 cfq_set_request+0xf1/0x2b4 > 11) 2324 16 elv_set_request+0x1c/0x2b > 12) 2308 44 get_request+0x1b0/0x25f > 13) 2264 60 get_request_wait+0x1d/0x135 > 14) 2204 52 __make_request+0x24d/0x34e > 15) 2152 96 generic_make_request+0x28f/0x2d2 > 16) 2056 56 submit_bio+0xb2/0xba > 17) 2000 20 submit_bh+0xe4/0x101 > 18) 1980 196 ext4_mb_init_cache+0x221/0x8ad nothing obvious here, maybe inlining > 19) 1784 232 ext4_mb_regular_allocator+0x443/0xbda ditto > 20) 1552 72 ext4_mb_new_blocks+0x1f6/0x46d > 21) 1480 220 ext4_ext_get_blocks+0xad9/0xc68 ext4_allocation_request is largeish & holey as you said: struct ext4_allocation_request { struct inode * inode; /* 0 8 */ ext4_lblk_t logical; /* 8 4 */ /* XXX 4 bytes hole, try to pack */ ext4_fsblk_t goal; /* 16 8 */ ext4_lblk_t lleft; /* 24 4 */ /* XXX 4 bytes hole, try to pack */ ext4_fsblk_t pleft; /* 32 8 */ ext4_lblk_t lright; /* 40 4 */ /* XXX 4 bytes hole, try to pack */ ext4_fsblk_t pright; /* 48 8 */ unsigned int len; /* 56 4 */ unsigned int flags; /* 60 4 */ /* --- cacheline 1 boundary (64 bytes) --- */ /* size: 64, cachelines: 1, members: 9 */ /* sum members: 52, holes: 3, sum holes: 12 */ }; > 22) 1260 68 ext4_get_blocks+0x10e/0x27e > 23) 1192 244 mpage_da_map_blocks+0xa7/0x720 struct buffer_head new on the stack hurts here, it's 104 bytes. I really dislike the whole abuse of buffer heads as handy containers for mapping, we don't need all these fields, I think, but that's a battle for another day. > 24) 948 108 ext4_da_writepages+0x27b/0x3d3 > 25) 840 16 do_writepages+0x28/0x39 > 26) 824 72 __writeback_single_inode+0x162/0x333 > 27) 752 68 generic_sync_sb_inodes+0x2b6/0x426 > 28) 684 20 writeback_inodes+0x8a/0xd1 > 29) 664 96 balance_dirty_pages_ratelimited_nr+0x12d/0x237 > 30) 568 92 generic_file_buffered_write+0x173/0x23e > 31) 476 124 __generic_file_aio_write_nolock+0x258/0x280 > 32) 352 52 generic_file_aio_write+0x6e/0xc2 > 33) 300 52 ext4_file_write+0xa8/0x12c > 34) 248 36 aio_rw_vect_retry+0x72/0x135 > 35) 212 24 aio_run_iocb+0x69/0xfd > 36) 188 108 sys_io_submit+0x418/0x4dc > 37) 80 80 syscall_call+0x7/0xb <snip> -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html