Re: [PATCH 01/10] fcoe: avoid getting into dev_queue_xmit with bottom halves disabled

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Mar 11, 2012 at 08:25:30PM -0700, Bhanu Prakash Gollapudi wrote:
> On 3/11/2012 10:43 AM, Neil Horman wrote:
> >On Fri, Mar 09, 2012 at 03:50:44PM -0800, Bhanu Prakash Gollapudi wrote:
> >>On 3/9/2012 2:49 PM, Robert Love wrote:
> >>>From: Neil Horman<nhorman@xxxxxxxxxxxxx>
> >>>
> >>>Recieved this warning recently:
> >>>
> >>>WARNING: at kernel/softirq.c:143 local_bh_enable+0x7d/0xb0() (Not tainted)
> >>>Hardware name: C6100
> >>>Modules linked in: autofs4 bnx2fc cnic uio fcoe libfcoe libfc scsi_transport_fc
> >>>scsi_tgt 8021q garp stp llc sunrpc cpufreq_ondemand acpi_cpufreq freq_table
> >>>mperf ipv6 ixgbe mdio igb dcdbas microcode i2c_i801 i2c_core sg iTCO_wdt
> >>>iTCO_vendor_support ioatdma dca i7core_edac edac_core shpchp ext4 mbcache jbd2
> >>>sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
> >>>scsi_wait_scan]
> >>>Pid: 4926, comm: fc_rport_eq Not tainted 2.6.32 #2
> >>>Call Trace:
> >>>  [<ffffffff81069c67>] ? warn_slowpath_common+0x87/0xc0
> >>>  [<ffffffff81069cba>] ? warn_slowpath_null+0x1a/0x20
> >>>  [<ffffffff8107216d>] ? local_bh_enable+0x7d/0xb0
> >>>  [<ffffffff81433567>] ? dev_queue_xmit+0x177/0x6c0
> >>>  [<ffffffffa0293a34>] ? vlan_dev_hwaccel_hard_start_xmit+0x84/0xb0 [8021q]
> >>>  [<ffffffff8142f46c>] ? dev_hard_start_xmit+0x2bc/0x3f0
> >>>  [<ffffffff8143392e>] ? dev_queue_xmit+0x53e/0x6c0
> >>>  [<ffffffff8142348e>] ? __skb_clone+0x2e/0x120
> >>>  [<ffffffffa02ea83d>] ? fcoe_clean_pending_queue+0xcd/0x100 [libfcoe]
> >>>  [<ffffffff810559ca>] ? find_busiest_group+0x9aa/0xb20
> >>>  [<ffffffffa02f7624>] ? fcoe_port_send+0x24/0x50 [fcoe]
> >>>  [<ffffffffa02f9568>] ? fcoe_xmit+0x228/0x3e0 [fcoe]
> >>>  [<ffffffffa02c1896>] ? fc_seq_send+0xb6/0x160 [libfc]
> >>>  [<ffffffffa02c1a76>] ? fc_exch_abort_locked+0x136/0x1c0 [libfc]
> >>>  [<ffffffffa02c1ca7>] ? fc_exch_mgr_reset+0x147/0x2a0 [libfc]
> >>>  [<ffffffff814f2c7e>] ? _spin_unlock_irq+0xe/0x20
> >>>  [<ffffffffa02c91d3>] ? fc_rport_work+0x123/0x5f0 [libfc]
> >>>  [<ffffffffa02c90b0>] ? fc_rport_work+0x0/0x5f0 [libfc]
> >>>  [<ffffffff8108b559>] ? worker_thread+0x179/0x2a0
> >>>  [<ffffffff81090ea0>] ? autoremove_wake_function+0x0/0x40
> >>>  [<ffffffff8108b3e0>] ? worker_thread+0x0/0x2a0
> >>>  [<ffffffff81090b56>] ? kthread+0x96/0xa0
> >>>  [<ffffffff8100c10a>] ? child_rip+0xa/0x20
> >>>  [<ffffffff81090ac0>] ? kthread+0x0/0xa0
> >>>  [<ffffffff8100c100>] ? child_rip+0x0/0x20
> >>>
> >>>fc_exch_abort_locked attempts to send an abort frame, but it does so with a
> >>>spinlock held and irqs/bh's disabled.  Because of this dev_queue_xmit issues a
> >>>warning. This patch allows the fc_exch_abort_locked to drop and re-acquire the
> >>>lock so that the warning is suppressed.
> >>>
> >>>Note this isn't a great fix, given that we need to free and re-acquire the
> >>>ex_lock during the fc_exch_abort_locked operation, but any other solution is
> >>>going to be a fairly major undertaking I think.  This should work until a proper
> >>>fix can be found.
> >>>
> >>>Signed-off-by: Neil Horman<nhorman@xxxxxxxxxxxxx>
> >>>Signed-off-by: Robert Love<robert.w.love@xxxxxxxxx>
> >>
> >>Robert,  This patch is not required based on what we discussed in
> >>this thread..
> >>http://lists.open-fcoe.org/pipermail/devel/2012-February/011946.html
> >>
> >>Neil provided "fcoe: Ensure fcoe_recv_frame is always called in
> >>process context" instead of the two patches he submitted initially.
> >>http://lists.open-fcoe.org/pipermail/devel/2012-February/011962.html
> >>
> >>Neil, correct me if I'm wrong.
> >>
> >Bhanu, This is not quite right (although that was a very confusing thread).  The
> >ema_list fix that I was proposing is no longer necessecary, because after that
> >conversation we realized the whole issue cold be avoided by making sure that we
> >never allowed fcoe_recv_frame to be called in process context.
> >
> >This patch is still necessecary.
> 
> 
> Neil, The reason you saw this (WARNING) stack trace (in softirq.c)
> was because of using spin_lock_irqsave() instead of spin_lock_bh()
> in fc_exch_reset() as part of  "[PATCH 1/2] fcoe: provide exclusion
> to concurrent ema_list access", which was not eventually submitted.
> 
> Because of using spin_lock_irqsave(), the following WARN_ON is hit
> in _local_bh_enable_ip().
> 
> static inline void _local_bh_enable_ip(unsigned long ip)
> {
>          WARN_ON_ONCE(in_irq() || irqs_disabled());
> 
> I don't think you will see this WARNING with the existing code.
> 
> 
>  Its needed beacuse fc_exch_abort_locked is (by
> >definition) called with the ex_lock held.  fc_exch_abort_locked however calls
> >fc_seq_send, which can lock the same ex_lock, causing deadlock.
> 
> 
> This is NOT an issue. Although fc_seq_send takes ex_lock,
> fc_exch_abort_locked() code path does not come here, as we return
> from this function if fh_type is BLS.
> 
>         if (fh_type == FC_TYPE_BLS)
>                 return error;
> 
>         /*
>          * Update the exchange and sequence flags,
>          * assuming all frames for the sequence have been sent.
>          * We can only be called to send once for each sequence.
>          */
>         spin_lock_bh(&ep->ex_lock);
> 	...
> 
You're right, I apologize.  The warning was erroneus, but I was thinking this
was actually a fix for a deadlock, but I'd forgotten about the fh_type check
that prevented the double lock aquisition.

Rob, this can in fact just be dropped.  Thanks!
Neil

> 
> Thanks,
> Bhanu
> >
> >So yes, we need a fix for this.  That said, I'd just as soon you drop this
> >patch, as I was looking at this code and realized that I can make a much more
> >efficient fix for this problem.  So let me know how you want to handle this: If
> >you want to drop this patch, I'll have an improved fix for you early this week.
> >If you want to keep this patch in place, thats fine with me too.  My new fix
> >will just revert these bits.  Let me know how you want to play it.
> >
> >Thank you for bringing this up though, I'd been meaning to review this and see
> >if I could fix it a bit better.
> >
> >Best
> >Neil
> >
> >
> >
> 
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]
  Powered by Linux