Soft-lockup on vfsmount_lock with large numbers of mount namespaces in the cloud

An OpenStack Neutron gateway uses network namespaces to partition
machines within a cloud. To do so it creates many network namespaces
and, as a result, many mount namespaces. This is accomplished
through many calls to

$ ip netns add/delete/exec
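
A minimal sketch of a reproduction loop built on those ip netns calls
is given here for illustration. The namespace prefix, the use of "true"
as the exec payload, and the reporting interval are all assumptions
made for this sketch, not details from the affected deployment. It has
to run as root.

```python
import subprocess
import time


def netns_commands(name):
    """Return the add/exec/delete command lines for one namespace."""
    return {
        "add": ["ip", "netns", "add", name],
        "exec": ["ip", "netns", "exec", name, "true"],
        "delete": ["ip", "netns", "delete", name],
    }


def timed_run(argv):
    """Run one command and return its wall-clock duration in seconds."""
    start = time.monotonic()
    subprocess.run(argv, check=True)
    return time.monotonic() - start


def stress(count, prefix="lockup-test-"):
    """Create `count` namespaces, timing each add, then delete them.

    `prefix` is a hypothetical naming scheme chosen for this sketch.
    """
    created = []
    try:
        for i in range(count):
            name = f"{prefix}{i}"
            cmds = netns_commands(name)
            add_seconds = timed_run(cmds["add"])
            created.append(name)
            timed_run(cmds["exec"])
            if i % 100 == 0:
                print(f"{i:5d} namespaces: last add took {add_seconds:.3f}s")
    finally:
        # Best-effort cleanup; failures are ignored on purpose.
        for name in created:
            subprocess.run(netns_commands(name)["delete"], check=False)
```

Calling stress(4000) would exercise the range described below. On an
affected kernel this may hang the machine, so it should only be tried
on a disposable test system.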

After roughly 3k-4k namespaces these ip calls become very slow, taking
many seconds each.  After a few more, the machine starts to report
soft-lockup "BUG"s on the stuck ip processes (BUG output below).

We think the problem is contention for the vfsmount_lock, which is held
by do_umount while it walks the mounts in the following call chain:

do_umount
 -> umount_tree
    -> propagate_umount
       -> __propagate_umount
          -> __lookup_mnt

Here __lookup_mnt spends significant time walking the
mount_hashtable.
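
To illustrate why such a walk gets slower as mounts accumulate, here is
a toy model of a chained hash table with a fixed number of buckets. It
is not the kernel's code. The bucket count of 256 and the integer keys
are assumptions made purely for this sketch. The point is only that,
with a fixed table size, the length of each chain (and so the work done
per lookup while the lock is held) grows linearly with the number of
entries.

```python
class ChainedTable:
    """Fixed-size hash table with per-bucket chains (toy model only)."""

    def __init__(self, num_buckets):
        self.buckets = [[] for _ in range(num_buckets)]

    def insert(self, key):
        # Integer keys are spread evenly by taking them modulo the size.
        self.buckets[key % len(self.buckets)].append(key)

    def lookup_cost(self, key):
        """Return how many chain entries a lookup for `key` examines."""
        chain = self.buckets[key % len(self.buckets)]
        for steps, entry in enumerate(chain, start=1):
            if entry == key:
                return steps
        return len(chain)  # miss: the whole chain was walked


def miss_cost(num_entries, num_buckets):
    """Entries examined when looking up a key that is not in the table."""
    table = ChainedTable(num_buckets)
    for key in range(num_entries):
        table.insert(key)
    return table.lookup_cost(num_entries)  # this key was never inserted


if __name__ == "__main__":
    # 256 buckets is an assumed, illustrative table size.
    for mounts in (1_000, 10_000, 100_000):
        print(mounts, "entries ->", miss_cost(mounts, 256), "steps per miss")
```

In this model a failed lookup costs about (entries / buckets) steps, so
a tenfold increase in entries means roughly ten times the work on every
lookup performed during a single umount.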

How can we mitigate or fix this expensive operation while holding the
lock?  If this has already been fixed, please feel free to point me at
the requisite git hashes.

Perhaps I'm looking in the wrong area of code, and I really just need
aa7a574d0c54cc5a0aceb7357b5097342c0844ee.  Are there any others that
immediately stand out or is this a new problem?

We've also tried reproducing with 3.5, 3.8, and 3.11, which yielded
similar results. 3.13 behaves similarly but has different issues related
to the RCU locking.  When I have a better idea of what's going on
with 3.13 I will report back about that.

Thanks,
Dave Chiluk.


[15645.196718] BUG: soft lockup - CPU#23 stuck for 22s! [ip:5898]
[15645.203279] Modules linked in: xt_conntrack nfnetlink xt_CT
iptable_raw ipt_REDIRECT veth ipmi_devintf ipmi_si ipmi_msghandler
iptable_nat nf_nat xt_recent xt_multiport netlord(O) bridge bonding
ipt_REJECT xt_LOG xt_limit xt_tcpudp xt_addrtype nf_conntrack_ipv4
nf_defrag_ipv4 xt_state nf_conntrack ip6table_filter ip6_tables
iptable_filter ip_tables x_tables coretemp kvm_intel kvm
ghash_clmulni_intel 8021q aesni_intel cryptd hid_generic garp stp
gpio_ich igb aes_x86_64 usbhid i7core_edac llc hid serio_raw edac_core
mac_hid dca lpc_ich microcode ahci libahci lp parport shpchp hpsa
[15645.203323] CPU 23
[15645.203324] Modules linked in:
[15645.203326]  xt_conntrack nfnetlink xt_CT iptable_raw ipt_REDIRECT
veth ipmi_devintf ipmi_si ipmi_msghandler iptable_nat nf_nat xt_recent
xt_multiport netlord(O) bridge bonding ipt_REJECT xt_LOG xt_limit
xt_tcpudp xt_addrtype nf_conntrack_ipv4 nf_defrag_ipv4 xt_state
nf_conntrack ip6table_filter ip6_tables iptable_filter ip_tables
x_tables coretemp kvm_intel kvm ghash_clmulni_intel 8021q aesni_intel
cryptd hid_generic garp stp gpio_ich igb aes_x86_64 usbhid i7core_edac
llc hid serio_raw edac_core mac_hid dca lpc_ich microcode ahci libahci
lp parport shpchp hpsa
[15645.203357]
[15645.203359] Pid: 5898, comm: ip Tainted: G           O
3.5.0-44-generic #67~precise1hf
[15645.203363] RIP: 0010:[<ffffffff8169ef29>]  [<ffffffff8169ef29>]
_raw_spin_unlock_irqrestore+0x19/0x30
[15645.203373] RSP: 0018:ffff88183fd63dd8  EFLAGS: 00000282
[15645.203375] RAX: 0000000000000282 RBX: 0000000000000000 RCX:
0000000000000400
[15645.203377] RDX: 0000000000000002 RSI: 0000000000000282 RDI:
0000000000000282
[15645.203378] RBP: ffff88183fd63de0 R08: 0000000000000000 R09:
0000000000000000
[15645.203380] R10: 0000000000000000 R11: 0000000000000000 R12:
ffff88183fd63d48
[15645.203381] R13: ffffffff816a820a R14: ffff88183fd63de0 R15:
ffff88183fd739c0
[15645.203384] FS:  00007fdf0de76700(0000) GS:ffff88183fd60000(0000)
knlGS:0000000000000000
[15645.203385] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[15645.203387] CR2: 00000000040cfdb8 CR3: 0000001288f87000 CR4:
00000000000007e0
[15645.203389] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[15645.203391] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[15645.203393] Process ip (pid: 5898, threadinfo ffff880621126000, task
ffff88072f609700)
[15645.203394] Stack:
[15645.203396]  ffff8812b6d36500 ffff88183fd63e50 ffffffff8108f6d7
0000000000000017
[15645.203401]  0000000000000282 00000000000139c0 0000000000000017
0000000000000000
[15645.203405]  ffff8812b6d36570 ffff880979ef5380 00000000000139c0
0000000000000017
[15645.203410] Call Trace:
[15645.203411]  <IRQ>
[15645.203414]
[15645.203419]  [<ffffffff8108f6d7>] update_shares+0xc7/0x100
[15645.203423]  [<ffffffff81091daf>] rebalance_domains+0x4f/0x180
[15645.203426]  [<ffffffff81092078>] run_rebalance_domains+0x48/0x60
[15645.203433]  [<ffffffff8105bc78>] __do_softirq+0xa8/0x210
[15645.203439]  [<ffffffff810ab6c4>] ? tick_program_event+0x24/0x30
[15645.203443]  [<ffffffff816a8b5c>] call_softirq+0x1c/0x30
[15645.203449]  [<ffffffff81016235>] do_softirq+0x65/0xa0
[15645.203453]  [<ffffffff8105c05e>] irq_exit+0x8e/0xb0
[15645.203456]  [<ffffffff816a94be>] smp_apic_timer_interrupt+0x6e/0x99
[15645.203462]  [<ffffffff816a820a>] apic_timer_interrupt+0x6a/0x70
[15645.203463]  <EOI>
[15645.203465]
[15645.203468]  [<ffffffff8169ef29>] ? _raw_spin_unlock_irqrestore+0x19/0x30
[15645.203474]  [<ffffffff81148b66>] free_percpu+0xa6/0x140
[15645.203478]  [<ffffffff811a71fe>] free_vfsmnt+0x2e/0x50
[15645.203482]  [<ffffffff811a7b8b>] mntput_no_expire+0xfb/0x160
[15645.203484]  [<ffffffff811a7c14>] mntput+0x24/0x40
[15645.203488]  [<ffffffff811a885b>] release_mounts+0x8b/0xa0
[15645.203491]  [<ffffffff811a8e4f>] do_umount+0x15f/0x250
[15645.203494]  [<ffffffff811a900a>] sys_umount+0xca/0xe0
[15645.203498]  [<ffffffff816a7769>] system_call_fastpath+0x16/0x1b
[15645.203499] Code: 66 90 5d c3 66 66 66 66 66 2e 0f 1f 84 00 00 00 00
00 55 48 89 e5 53 66 66 66 66 90 48 89 f3 e8 6e 1b 9a ff 66 90 48 89 df
57 9d <66> 66 90 66 90 5b 5d c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00
[15645.203532] Kernel panic - not syncing: softlockup: hung tasks
[15645.210089] Pid: 5898, comm: ip Tainted: G           O
3.5.0-44-generic #67~precise1hf1267535v20140117b1-Ubuntu
[15645.221437] Call Trace:
[15645.224200]  <IRQ>  [<ffffffff816862b2>] panic+0xc1/0x1d7
[15645.230290]  [<ffffffff810e1b57>] watchdog_timer_fn+0x177/0x180
[15645.236949]  [<ffffffff8107c3d8>] __run_hrtimer+0x78/0x1f0
[15645.243120]  [<ffffffff810e19e0>] ? __touch_watchdog+0x30/0x30
[15645.249680]  [<ffffffff8107cc67>] hrtimer_interrupt+0xf7/0x240
[15645.256240]  [<ffffffff816a94b9>] smp_apic_timer_interrupt+0x69/0x99
[15645.263382]  [<ffffffff816a820a>] apic_timer_interrupt+0x6a/0x70
[15645.270137]  [<ffffffff8169ef29>] ? _raw_spin_unlock_irqrestore+0x19/0x30
[15645.277767]  [<ffffffff8108f6d7>] update_shares+0xc7/0x100
[15645.283937]  [<ffffffff81091daf>] rebalance_domains+0x4f/0x180
[15645.290496]  [<ffffffff81092078>] run_rebalance_domains+0x48/0x60
[15645.297347]  [<ffffffff8105bc78>] __do_softirq+0xa8/0x210
[15645.303420]  [<ffffffff810ab6c4>] ? tick_program_event+0x24/0x30
[15645.310172]  [<ffffffff816a8b5c>] call_softirq+0x1c/0x30
[15645.316148]  [<ffffffff81016235>] do_softirq+0x65/0xa0
[15645.321930]  [<ffffffff8105c05e>] irq_exit+0x8e/0xb0
[15645.327518]  [<ffffffff816a94be>] smp_apic_timer_interrupt+0x6e/0x99
[15645.334661]  [<ffffffff816a820a>] apic_timer_interrupt+0x6a/0x70
[15645.341411]  <EOI>  [<ffffffff8169ef29>] ?
_raw_spin_unlock_irqrestore+0x19/0x30
[15645.349763]  [<ffffffff81148b66>] free_percpu+0xa6/0x140
[15645.355738]  [<ffffffff811a71fe>] free_vfsmnt+0x2e/0x50
[15645.361617]  [<ffffffff811a7b8b>] mntput_no_expire+0xfb/0x160
[15645.368079]  [<ffffffff811a7c14>] mntput+0x24/0x40
[15645.373471]  [<ffffffff811a885b>] release_mounts+0x8b/0xa0
[15645.379642]  [<ffffffff811a8e4f>] do_umount+0x15f/0x250
[15645.385521]  [<ffffffff811a900a>] sys_umount+0xca/0xe0
[15645.391302]  [<ffffffff816a7769>] system_call_fastpath+0x16/0x1b
