once again about crash in fins_appropriate_src()/same_src()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Dear all,

I would return to problem fixed by Michal Kubeček
 "[nf] netfilter: use RCU safe kfree for conntrack extensions"
https://bugzilla.netfilter.org/show_bug.cgi?id=714

I'm not sure that problem was caused by RCU race and I afraid it can be still not fixed.

Parallels got similar crash on RHEL6.u4-based kernel.
I've checked vmcore dump and found exactly the same situation:
we have zeroed struct nf_conn_nat in nat_bysource hlist, and crashed becasue nat->ct = NULL.

Our kernel did not included patch from Michal,
however I've found several strange facts:
- I've found old alive conntrack that have reference to extension with nf_conn_nat,
 conntrack have strange status 0xe. I.e. have NOT set any NAT bits, but have CONFIRMED bit set
- our entry seems was not removed by RCU, all references in nat_bysource hlist looks correct
- last unloaded module was nf_nat -- but it is loaded back

I.e. it does not look like problem was caused by RCU race.

So I would like to ask, are you sure that problem in your cases was really caused by RCU race?
Did you tried to find conntrack used affected nf_conn_nat?
Did you observed probably conntrack destroy on neighbor CPUs?

I believe that nf_conn_nat could be zeroyed during nf_nat module unloading, 
but I do not understand how this entry could return back into nat_byource list after module reloading.
Do you probably have some ideas?

You can find some more details about my case below.

<1>[6972396.347616] BUG: unable to handle kernel NULL pointer dereference at 000000000000003e
<1>[6972396.347733] IP: [<ffffffffa0243fa0>] nf_nat_setup_info+0x460/0x590 [nf_nat]
<4>[6972396.347802] PGD 0
<4>[6972396.347859] Oops: 0000 [#1] SMP
<4>[6972396.347920] last sysfs file: /sys/devices/pci0000:00/0000:00:1c.5/0000:06:00.0/net/eth1/statistics/tx_bytes
<4>[6972396.348041] CPU 0
<4>[6972396.348048] Modules linked in: ipt_REDIRECT nf_nat_ftp nf_conntrack_ftp ipt_MASQUERADE iptable_nat nf_nat iptable_mangle ipt_LOG xt_DSCP 8021q garp stp llc cls_fw sch_cbq ip_vzredir(U) vzredir(U) vzcompat(U) nf_conntrack_ipv4
nf_defrag_ipv4 vznetdev ip6_vzprivnet(U) ip6_vzredir(U) ip6_vznetstat(U) ip_vzprivnet(U) vziolimit(U) vzfs(U) vzdquota vzcpt nfs lockd fscache auth_rpcgss nfs_acl sunrpc vzlinkdev(U) vzethdev vzevent vzlist(U) vzstat(U) vzmon ip_vznet
stat(U) vznetstat(U) vzdev xt_length xt_hl xt_tcpmss xt_TCPMSS iptable_filter xt_multiport xt_limit xt_dscp ipt_REJECT ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 sg mic
rocode serio_raw i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support e1000e i5100_edac edac_core ioatdma dca shpchp ext3 jbd mbcache sr_mod cdrom pata_it8213 sd_mod crc_t10dif pata_acpi ata_generic ata_piix aacraid dm_mirror dm_region_hash
 dm_log dm_mod [last unloaded: nf_nat]
<4>[6972396.348586]
<4>[6972396.348586] Pid: 0, comm: swapper veid: 0 Not tainted 2.6.32-042stab083.2 #1 042stab083_2 Supermicro X7DCL/X7DCL
<4>[6972396.348586] RIP: 0010:[<ffffffffa0243fa0>]  [<ffffffffa0243fa0>] nf_nat_setup_info+0x460/0x590 [nf_nat]
<4>[6972396.348586] RSP: 0018:ffff880028203700  EFLAGS: 00010286
<4>[6972396.348586] RAX: 0000000000000000 RBX: ffff8802d89c9b58 RCX: 000000005ef7b2de
<4>[6972396.348586] RDX: 0000000000000001 RSI: ffff8800282037e0 RDI: ffff880028203740
<4>[6972396.348586] RBP: ffff8800282037d0 R08: 00000000c0deecc2 R09: 0000000000000002
<4>[6972396.348586] R10: ffff8801345e5508 R11: ffff880028203918 R12: 0000000000000000
<4>[6972396.348586] R13: ffff8800282037e0 R14: ffff880028203740 R15: ffff8800c350a5f0
<4>[6972396.348586] FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
<4>[6972396.348586] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>[6972396.348586] CR2: 000000000000003e CR3: 00000002b6ada000 CR4: 00000000000407f0
<4>[6972396.348586] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[6972396.348586] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[6972396.348586] Process swapper (pid: 0, veid: 0, threadinfo ffffffff81a00000, task ffffffff81a8d020)
<4>[6972396.348586] Stack:
<4>[6972396.348586]  ffff880028203740 ffffffff81b1a520 000000b200b20000 ffff88031a6606e0
<4>[6972396.348586] <d> 0000000028203740 ffff880000000008 ffffffff81299af0 ffff8800851e61c0
<4>[6972396.348586] <d> 00000000f7e6a8c0 0000000000000000 16e6a8c00002873a 0000000000000000
<4>[6972396.348586] Call Trace:
<4>[6972396.348586]  <IRQ>
<4>[6972396.348586]  [<ffffffff81299af0>] ? swiotlb_map_page+0x0/0xf0
<4>[6972396.348586]  [<ffffffffa025e152>] masquerade_tg+0xd2/0xf0 [ipt_MASQUERADE]
<4>[6972396.348586]  [<ffffffffa025232b>] ipt_do_table+0x3fb/0x6b0 [ip_tables]
<4>[6972396.348586]  [<ffffffff8145c419>] ? dev_queue_xmit+0x169/0x560
<4>[6972396.348586]  [<ffffffff8144c8a6>] ? skb_checksum+0x56/0x2d0
<4>[6972396.348586]  [<ffffffffa0255284>] nf_nat_rule_find+0x24/0x80 [iptable_nat]
<4>[6972396.348586]  [<ffffffffa0255521>] nf_nat_fn+0x111/0x260 [iptable_nat]
<4>[6972396.348586]  [<ffffffffa0255798>] nf_nat_out+0x48/0xd0 [iptable_nat]
<4>[6972396.348586]  [<ffffffff814841b9>] nf_iterate+0x69/0xb0
<4>[6972396.348586]  [<ffffffff81494e70>] ? ip_finish_output+0x0/0x2f0
<4>[6972396.348586]  [<ffffffff81484374>] nf_hook_slow+0x74/0x110
<4>[6972396.348586]  [<ffffffff81494e70>] ? ip_finish_output+0x0/0x2f0
<4>[6972396.348586]  [<ffffffff81490590>] ? ip_forward_finish+0x0/0x60
<4>[6972396.348586]  [<ffffffff81495204>] ip_output+0xa4/0xc0
<4>[6972396.348586]  [<ffffffff814905df>] ip_forward_finish+0x4f/0x60
<4>[6972396.348586]  [<ffffffff81490788>] ip_forward+0x198/0x430
<4>[6972396.348586]  [<ffffffff8148e8ed>] ip_rcv_finish+0x12d/0x440
<4>[6972396.348586]  [<ffffffff8148ee84>] ip_rcv+0x284/0x360
<4>[6972396.348586]  [<ffffffff8145651b>] __netif_receive_skb+0x34b/0x7b0
<4>[6972396.348586]  [<ffffffff81458af8>] netif_receive_skb+0x58/0x60
<4>[6972396.348586]  [<ffffffff81458c00>] napi_skb_finish+0x50/0x70
<4>[6972396.348586]  [<ffffffff8145ae69>] napi_gro_receive+0x39/0x50
<4>[6972396.348586]  [<ffffffffa013679f>] e1000_receive_skb+0x4f/0x90 [e1000e]
<4>[6972396.348586]  [<ffffffffa0138d30>] e1000_clean_rx_irq+0x380/0x580 [e1000e]
<4>[6972396.348586]  [<ffffffffa01412ed>] e1000e_poll+0x8d/0x37c [e1000e]
<4>[6972396.348586]  [<ffffffff8145af90>] net_rx_action+0x110/0x300
<4>[6972396.348586]  [<ffffffff810a2265>] ? __hrtimer_start_range_ns+0x1a5/0x470
<4>[6972396.348586]  [<ffffffff8107ac8d>] __do_softirq+0x10d/0x250
<4>[6972396.348586]  [<ffffffff810a84f9>] ? do_timer+0x329/0x590
<4>[6972396.348586]  [<ffffffff8100c2ac>] call_softirq+0x1c/0x30
<4>[6972396.348586]  [<ffffffff8100df05>] do_softirq+0x65/0xa0
<4>[6972396.348586]  [<ffffffff8107aaad>] irq_exit+0xcd/0xd0
<4>[6972396.348586]  [<ffffffff81520465>] do_IRQ+0x75/0xf0
<4>[6972396.348586]  [<ffffffff8100ba93>] ret_from_intr+0x0/0x11
<4>[6972396.348586]  <EOI>
<4>[6972396.348586]  [<ffffffff81014967>] ? mwait_idle+0x77/0xd0
<4>[6972396.348586]  [<ffffffff8151c9fa>] ? atomic_notifier_call_chain+0x1a/0x20
<4>[6972396.348586]  [<ffffffff8100a026>] cpu_idle+0xb6/0x110
<4>[6972396.348586]  [<ffffffff814fc5e5>] rest_init+0x85/0x90
<4>[6972396.348586]  [<ffffffff81c32f6e>] start_kernel+0x412/0x41e
<4>[6972396.348586]  [<ffffffff81c3233a>] x86_64_start_reservations+0x125/0x129
<4>[6972396.348586]  [<ffffffff81c32438>] x86_64_start_kernel+0xfa/0x109
<4>[6972396.348586] Code: a0 02 00 00 4c 8b 3e 4d 85 ff 75 1b e9 10 01 00 00 66 2e 0f 1f 84 00 00 00 00 00 4d 8b 3f 4d 85 ff 0f 84 84 00 00 00 49 8b 47 20 <38> 50 3e 75 eb 8b 48 18 3b 4d a0 75 e3 0f b7 48 28 66 3b 4d b0
<1>[6972396.348586] RIP  [<ffffffffa0243fa0>] nf_nat_setup_info+0x460/0x590 [nf_nat]
<4>[6972396.348586]  RSP <ffff880028203700>
<4>[6972396.348586] CR2: 000000000000003e

find_appropriate_src()
{       
        hlist_for_each_entry_rcu(nat, n, &net->ipv4.nat_bysource[h], bysource) {
                ct = nat->ct;            ct = NULL
                if (same_src(ct, tuple))  <<<< crashed on access to ct

nat = %r15 = ffff8800c350a5f0

crash> struct nf_conn_nat ffff8800c350a5f0
struct nf_conn_nat {
  bysource = {
    next = 0x0,
    pprev = 0xffff88001a35cb70
  },
  seq = {{
      correction_pos = 0,
      offset_before = 0,
      offset_after = 0
    }, {
      correction_pos = 0,
      offset_before = 0,
      offset_after = 0
    }},
  ct = 0x0,
  help = {
    nat_pptp_info = {
      pns_call_id = 0,
      pac_call_id = 0
    }
  },
  masq_index = 0
}

according nf_ct_ext is used by ct ffff880212e81d50

crash> kmem ffff8800c350a5f0
CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
ffff88031fc00140 size-128                 128     183495    506280  16876     4k
FREE / [ALLOCATED]
  [ffff8800c350a5c0]   <<<<< nf_conn_nat is part of nf_ct_ext sreucture

crash> search ffff8800c350a5c0
ffff880212e81e70: ffff8800c350a5c0  <<< someone have reference to our nf_ct_ext

crash> kmem ffff880212e81e70
CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
ffff88031dc00380 nf_conntrack_ffffffff81b1a520 312   799      1944    162     4k
FREE / [ALLOCATED]
  [ffff880212e81d50] <<< it is alive conntrack
VvS: name of conntrack slab is correct, it is OpenVz specific, each network namespace have own slab for conntracks.

crash> nf_conn ffff880212e81d50
struct nf_conn
      counter = 0x1
  src: x.x.x.x, port 59116
  dst: y.y.y.y, port 24000
  status = 0xe,  no NAT and NAT_DONE bits (?!?), set only IPS_SEEN_REPLY, IPS_ASSURED, IPS_CONFIRMED.
  master = 0
  timer -- will be expired in future
  proto tcp -- seems there was some activity in this connection in past
      tcp = {
      seen = {{
          td_end = 0x221b933, 
          td_maxend = 0x221f23b, 
          td_maxwin = 0xf7d3, 
          td_maxack = 0xdc9812bb, 
          td_scale = 0x6, 
          flags = 0x23
        }, {
          td_end = 0xdc9812bb, 
          td_maxend = 0xdc986dfb, 
          td_maxwin = 0x3908, 
          td_maxack = 0x221b933, 
          td_scale = 0x7, 
          flags = 0x23
        }}, 
      state = 0x3,  <<< TCP_CONNTRACK_ESTABLISHED
      last_dir = 0x1, 
      retrans = 0x0, 
      last_index = 0x1, 
      last_seq = 0xdc9812ba, 
      last_ack = 0xdc9812bb, 
      last_end = 0xdc9812bb, 
      last_win = 0x3908
    },
  ext = 0xffff8800c350a5c0 -- reference to extension


crash> nf_ct_ext 0xffff8800c350a5c0
struct nf_ct_ext {
  rcu = {
    next = 0x0,
    func = 0x0
  },
  offset = "\000\060\000\030",  <<<<< \060 =0x30 --> nat extension was used
  len = 0x60,
  data = 0xffff8800c350a5d5 ""
}

hlist looks normal
crash> nf_conn_nat.bysource 0xffff8800c350a5f0
  bysource = {
    next = 0x0,
    pprev = 0xffff88001a35cb70
  }
crash> nf_conn_nat.bysource 0xffff88001a35cb70
  bysource = {
    next = 0xffff8800c350a5f0,
    pprev = 0xffff88031b2c8a08
  }
crash> hlist_head 0xffff88031b2c8a08  <<<<< hlist_head from init_net.ipv4.nat_bysource
struct hlist_head {
  first = 0xffff88001a35cb70
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Netfitler Users]     [LARTC]     [Bugtraq]     [Yosemite Forum]

  Powered by Linux