Re: [Open-FCoE] transport_generic_handle_data - BUG: scheduling while atomic

On Thu, 2010-11-11 at 14:57 -0800, Patil, Kiran wrote:
> Yes, transport_generic_handle_data, which is called from ft_recv_write_data, can call msleep_interruptible, but only while the transport is active.
> 
> FYI, this msleep was not introduced by my patch; it was already there.
> 
> I agree with both of Joe's suggestions (fcoe_rcv should always hand frames off to the processing thread, and TCM should not block the per-CPU receive thread). I'll let Nick comment on that.
> 

Hey guys,

So the plan for interrupt-context setup of individual se_cmd
descriptors for TCM_Loop (and other WIP HW FC target mode drivers) is to
use the optional target_core_fabric_ops->new_cmd_map() callback for the
pieces of se_cmd setup logic that cannot currently run in interrupt
context.  For TCM_Loop these are currently:

*) transport_generic_allocate_tasks(): accesses LUN, PR and ALUA
        specific locks, currently using plain spin_lock() + spin_unlock()
*) transport_generic_map_mem_to_cmd(): uses GFP_KERNEL allocations
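
As a rough sketch of what that looks like for a fabric driver (untested,
and the exact v4.0 signatures are written from memory, so treat the
argument lists and the tl_cmd fields below as assumptions, not the
actual TCM_Loop code):

static int tcm_loop_new_cmd_map(struct se_cmd *se_cmd)
{
        /*
         * Hypothetical fabric descriptor carrying the CDB and SGL
         * pointers saved off at interrupt time.
         */
        struct tl_cmd *tl_cmd = container_of(se_cmd, struct tl_cmd, se_cmd);
        int ret;

        /*
         * CDB parsing touches the LUN, PR and ALUA locks with plain
         * spin_lock(), so it has to run here in process context.
         */
        ret = transport_generic_allocate_tasks(se_cmd, tl_cmd->cdb);
        if (ret < 0)
                return ret;

        /*
         * Map the fabric's scatterlist memory into the se_cmd; this
         * path allocates with GFP_KERNEL.
         */
        return transport_generic_map_mem_to_cmd(se_cmd, tl_cmd->sgl,
                                                tl_cmd->sgl_count, NULL, 0);
}

This runs from the backend kernel thread before the descriptor hits the
execution path, so both calls are safe to sleep.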

However, for this specific transport_generic_handle_data() case:

        /*
         * Make sure that the transport has been disabled by
         * transport_write_pending() before readding this struct se_cmd to the
         * processing queue.  If it has not yet been reset to zero by the
         * processing thread in transport_add_cmd_to_queue(), let other
         * processes run.  If a signal was received, then we assume the
         * connection is being failed/shutdown, so we return a failure.
         */
        while (atomic_read(&T_TASK(cmd)->t_transport_active)) {
                msleep_interruptible(10);
                if (signal_pending(current))
                        return -1;
        }

is specific to the existing drivers/target/lio-target iSCSI code, which
needs it for the traditional kernel-sockets recv-side iSCSI WRITE case.

Since we already have the FCP write data ready for submission to
backend devices at this point, I think we want something in the
transport_generic_new_cmd() -> transport_generic_write_pending() code
that does the immediate SCSI WRITE submission and skips the
TFO->write_pending() callback / extra fabric API exchange/response.

Here is how TCM_Loop currently does that, with SCSI WRITE data mapped
from the incoming ->queuecommand() cmd->table.sgl memory:

int tcm_loop_write_pending(struct se_cmd *se_cmd)
{
        /*
         * Linux/SCSI has already sent down a struct scsi_cmnd with
         * sc->sc_data_direction of DMA_TO_DEVICE and struct scatterlist
         * array memory, and that memory has already been mapped into
         * struct se_cmd->t_mem_list format with
         * transport_generic_map_mem_to_cmd().
         *
         * We now tell TCM to add this WRITE CDB directly onto the TCM
         * storage object's execution queue.
         */
        transport_generic_process_write(se_cmd);
        return 0;
}

This will skip the transport_check_aborted_status() call in
transport_generic_handle_data() and immediately queue the
T_TASK(cmd)->t_task_list for se_task execution down to
se_subsystem_api->do_task() and out to the backend subsystem code.

So just to reiterate the point with the current v4.0 code: we cannot
safely call transport_generic_allocate_tasks() or
transport_generic_map_mem_to_cmd() from interrupt context, so you want
to make these calls from the TFO->new_cmd_map() callback in the backend
kernel thread's process context.

So I think this means you want to call transport_generic_process_write()
to immediately queue the WRITE from TFO->write_pending(), but I'm not
very certain after looking at ft_write_pending().
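
Something like this completely untested sketch is what I have in mind;
the write_data_received flag is a hypothetical placeholder for however
tcm_fc would track that the FCP write data already arrived, not an
existing ft_cmd member:

static int ft_write_pending_direct(struct se_cmd *se_cmd)
{
        struct ft_cmd *cmd = container_of(se_cmd, struct ft_cmd, se_cmd);

        /*
         * If the write data arrived with the command, skip the extra
         * XFER_RDY exchange and queue the WRITE directly, the same
         * way tcm_loop_write_pending() does above.
         */
        if (cmd->write_data_received) {
                transport_generic_process_write(se_cmd);
                return 0;
        }
        /* Otherwise fall back to the normal XFER_RDY path. */
        return ft_write_pending(se_cmd);
}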

Joe, any thoughts here?

Best,

--nab

> Thanks,
> -- Kiran P.
> 
> -----Original Message-----
> From: devel-bounces@xxxxxxxxxxxxx [mailto:devel-bounces@xxxxxxxxxxxxx] On Behalf Of Joe Eykholt
> Sent: Thursday, November 11, 2010 11:52 AM
> To: Jansen, Frank
> Cc: devel@xxxxxxxxxxxxx
> Subject: Re: [Open-FCoE] transport_generic_handle_data - BUG: scheduling while atomic
> 
> 
> 
> On 11/11/10 11:41 AM, Jansen, Frank wrote:
> > Greetings!
> > 
> > I'm running 2.6.36 with Kiran Patil's patches from 10/28/10.
> > 
> > I have 4 logical volumes configured over fcoe:
> > 
> > [root@dut ~]# tcm_node --listhbas
> > \------> iblock_0
> >        HBA Index: 1 plugin: iblock version: v4.0.0-rc5
> >        \-------> r0_lun3
> >        Status: ACTIVATED  Execute/Left/Max Queue Depth: 0/32/32
> > SectorSize: 512  MaxSectors: 1024
> >        iBlock device: dm-4  UDEV PATH: /dev/vg_R0_p1/lv_R0_p1_l3
> >        Major: 253 Minor: 4  CLAIMED: IBLOCK
> >        udev_path: /dev/vg_R0_p1/lv_R0_p1_l3
> >        \-------> r0_lun2
> >        Status: ACTIVATED  Execute/Left/Max Queue Depth: 0/32/32
> > SectorSize: 512  MaxSectors: 1024
> >        iBlock device: dm-3  UDEV PATH: /dev/vg_R0_p1/lv_R0_p1_l2
> >        Major: 253 Minor: 3  CLAIMED: IBLOCK
> >        udev_path: /dev/vg_R0_p1/lv_R0_p1_l2
> >        \-------> r0_lun1
> >        Status: ACTIVATED  Execute/Left/Max Queue Depth: 0/32/32
> > SectorSize: 512  MaxSectors: 1024
> >        iBlock device: dm-2  UDEV PATH: /dev/vg_R0_p1/lv_R0_p1_l1
> >        Major: 253 Minor: 2  CLAIMED: IBLOCK
> >        udev_path: /dev/vg_R0_p1/lv_R0_p1_l1
> >        \-------> r0_lun0
> >        Status: ACTIVATED  Execute/Left/Max Queue Depth: 0/32/32
> > SectorSize: 512  MaxSectors: 1024
> >        iBlock device: dm-1  UDEV PATH: /dev/vg_R0_p1/lv_R0_p1_l0
> >        Major: 253 Minor: 1  CLAIMED: IBLOCK
> >        udev_path: /dev/vg_R0_p1/lv_R0_p1_l0
> > 
> > When any significant I/O load is put on any of the devices, I receive
> > a flood of the following messages:
> > 
> >> Nov 11 13:46:09 dut kernel: BUG: scheduling while atomic:
> >> LIO_iblock/4439/0x00000101
> >> Nov 11 13:46:09 dut kernel: Modules linked in: fcoe libfcoe
> >> target_core_stgt target_core_pscsi target_core_file target_core_iblock
> >> ipt_MASQUERADE iptable_nat nf_nat bridge stp llc autofs4 tcm_fc libfc
> >> scsi_transport_fc scsi_tgt target_core_mod configfs sunrpc ipv6
> >> dm_mirror dm_region_hash dm_log kvm_intel kvm uinput ixgbe ioatdma
> >> iTCO_wdt ses enclosure i2c_i801 i2c_core iTCO_vendor_support mdio sg
> >> igb dca pcspkr evbug evdev ext4 mbcache jbd2 sd_mod crc_t10dif
> >> pata_acpi ata_generic mpt2sas scsi_transport_sas ata_piix raid_class
> >> dm_mod [last unloaded: speedstep_lib]
> >> Nov 11 13:46:09 dut kernel: Pid: 4439, comm: LIO_iblock Not tainted
> >> 2.6.36+ #1
> >> Nov 11 13:46:09 dut kernel: Call Trace:
> >> Nov 11 13:46:09 dut kernel: <IRQ>  [<ffffffff8104fb96>]
> >> __schedule_bug+0x66/0x70
> >> Nov 11 13:46:09 dut kernel: [<ffffffff8149779c>] schedule+0xa2c/0xa60
> >> Nov 11 13:46:09 dut kernel: [<ffffffff81497d73>]
> >> schedule_timeout+0x173/0x2e0
> >> Nov 11 13:46:09 dut kernel: [<ffffffff81071200>] ?
> >> process_timeout+0x0/0x10
> >> Nov 11 13:46:09 dut kernel: [<ffffffff81497f3e>]
> >> schedule_timeout_interruptible+0x1e/0x20
> >> Nov 11 13:46:09 dut kernel: [<ffffffff81072b39>]
> >> msleep_interruptible+0x39/0x50
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa033ebfa>]
> >> transport_generic_handle_data+0x2a/0x80 [target_core_mod]
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa03c33ee>]
> >> ft_recv_write_data+0x1fe/0x2b0 [tcm_fc]
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa03c13cb>] ft_recv_seq+0x8b/0xc0
> >> [tcm_fc]
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa03a0e1f>]
> >> fc_exch_recv+0x61f/0xe20 [libfc]
> >> Nov 11 13:46:09 dut kernel: [<ffffffff813c1123>] ?
> >> skb_copy_bits+0x63/0x2c0
> >> Nov 11 13:46:09 dut kernel: [<ffffffff813c15ea>] ?
> >> __pskb_pull_tail+0x26a/0x360
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa015b86d>]
> >> fcoe_recv_frame+0x18d/0x340 [fcoe]
> >> Nov 11 13:46:09 dut kernel: [<ffffffff813c13df>] ?
> >> __pskb_pull_tail+0x5f/0x360
> >> Nov 11 13:46:09 dut kernel: [<ffffffff813c0404>] ?
> >> __netdev_alloc_skb+0x24/0x50
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa015e52a>] fcoe_rcv+0x2aa/0x44c
> >> [fcoe]
> >> Nov 11 13:46:09 dut kernel: [<ffffffff8113c897>] ?
> >> __kmalloc_node_track_caller+0x67/0xe0
> >> Nov 11 13:46:09 dut kernel: [<ffffffff813c0404>] ?
> >> __netdev_alloc_skb+0x24/0x50
> >> Nov 11 13:46:09 dut kernel: [<ffffffff813cd39a>]
> >> __netif_receive_skb+0x41a/0x5d0
> >> Nov 11 13:46:09 dut kernel: [<ffffffff81012699>] ? read_tsc+0x9/0x20
> >> Nov 11 13:46:09 dut kernel: [<ffffffff813ceab8>]
> >> netif_receive_skb+0x58/0x80
> >> Nov 11 13:46:09 dut kernel: [<ffffffff813cec20>]
> >> napi_skb_finish+0x50/0x70
> >> Nov 11 13:46:09 dut kernel: [<ffffffff813cf1a5>]
> >> napi_gro_receive+0xc5/0xd0
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa0207a1f>]
> >> ixgbe_clean_rx_irq+0x31f/0x840 [ixgbe]
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa02083a6>]
> >> ixgbe_clean_rxtx_many+0x136/0x240 [ixgbe]
> >> Nov 11 13:46:09 dut kernel: [<ffffffff813cf382>]
> >> net_rx_action+0x102/0x250
> >> Nov 11 13:46:09 dut kernel: [<ffffffff81068af2>]
> >> __do_softirq+0xb2/0x240
> >> Nov 11 13:46:09 dut kernel: [<ffffffff8100c07c>] call_softirq+0x1c/0x30
> >> Nov 11 13:46:09 dut kernel: <EOI>  [<ffffffff8100db25>] ?
> >> do_softirq+0x65/0xa0
> >> Nov 11 13:46:09 dut kernel: [<ffffffff81068664>]
> >> local_bh_enable+0x94/0xa0
> >> Nov 11 13:46:09 dut kernel: [<ffffffff813cdfd3>]
> >> dev_queue_xmit+0x143/0x3b0
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa015d96e>] fcoe_xmit+0x30e/0x520
> >> [fcoe]
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa03a2a13>] ?
> >> _fc_frame_alloc+0x33/0x90 [libfc]
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa039f904>] fc_seq_send+0xb4/0x140
> >> [libfc]
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa03c1722>]
> >> ft_write_pending+0x112/0x160 [tcm_fc]
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa0347800>]
> >> transport_generic_new_cmd+0x280/0x2b0 [target_core_mod]
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa03479d4>]
> >> transport_processing_thread+0x1a4/0x7c0 [target_core_mod]
> >> Nov 11 13:46:09 dut kernel: [<ffffffff810835d0>] ?
> >> autoremove_wake_function+0x0/0x40
> >> Nov 11 13:46:09 dut kernel: [<ffffffffa0347830>] ?
> >> transport_processing_thread+0x0/0x7c0 [target_core_mod]
> >> Nov 11 13:46:09 dut kernel: [<ffffffff81082f36>] kthread+0x96/0xa0
> >> Nov 11 13:46:09 dut kernel: [<ffffffff8100bf84>]
> >> kernel_thread_helper+0x4/0x10
> >> Nov 11 13:46:09 dut kernel: [<ffffffff81082ea0>] ? kthread+0x0/0xa0
> >> Nov 11 13:46:09 dut kernel: [<ffffffff8100bf80>] ?
> >> kernel_thread_helper+0x0/0x10
> > 
> > I first started noticing these issues when I ran I/O with larger
> > file sizes (approx. 25 GB), but I'm thinking that might be a red herring.
> > I'll rebuild the kernel and tools to make sure nothing is out of sorts
> > and will report on any additional findings.
> > 
> > Thanks,
> > 
> > Frank
> 
> FCP data frames are coming in at the interrupt level, and TCM expects
> to be called in a thread or non-interrupt context, since
> transport_generic_handle_data() may sleep.
> 
> A quick workaround would be to change the fast path in fcoe_rcv() so that
> data always goes through the per-cpu receive threads.  That avoids part of
> the problem, but isn't anything like the right fix.  It doesn't seem good
> to let TCM block FCoE's per-cpu receive thread either.
> 
> Here's a quick change if you want to just work around the problem.
> I haven't tested it:
> 
> diff --git a/drivers/scsi/fcoe/fcoe.c b/drivers/scsi/fcoe/fcoe.c
> index feddb53..8f854cd 100644
> --- a/drivers/scsi/fcoe/fcoe.c
> +++ b/drivers/scsi/fcoe/fcoe.c
> @@ -1285,6 +1285,7 @@ int fcoe_rcv(struct sk_buff *skb, struct net_device *netdev,
>  	 * BLOCK softirq context.
>  	 */
>  	if (fh->fh_type == FC_TYPE_FCP &&
> +	    0 &&
>  	    cpu == smp_processor_id() &&
>  	    skb_queue_empty(&fps->fcoe_rx_list)) {
>  		spin_unlock_bh(&fps->fcoe_rx_list.lock);
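> 
> With the fast-path test short-circuited like that, every incoming FCP
> frame gets queued to fcoe_rx_list and handled by the per-cpu receive
> thread in process context, where sleeping in
> transport_generic_handle_data() is at least legal.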
> 
> ---
> 
> 	Cheers,
> 	Joe
> 
> 
> 
> 
> _______________________________________________
> devel mailing list
> devel@xxxxxxxxxxxxx
> http://www.open-fcoe.org/mailman/listinfo/devel


