Re: Endless wait in transport_clear_lun_from_sessions

On Fri, 2011-09-16 at 15:16 -0700, Nicholas A. Bellinger wrote:
> On Fri, 2011-09-16 at 21:39 +0200, Martin Svec wrote:
> > Hello,
> > 
> > I tried my 2.6.39 LIO stress tests with the latest mainline 3.1-rc6+ code and
> > with one of the tests, I sometimes hit an endless wait in transport_clear_lun_from_sessions()
> > coupled with a wait in transport_clear_lun_thread().
> > 
> > The affected test repeatedly adds+enables and disables+removes a TPG while an initiator
> > is connected to it and performs heavy sequential writes using fio. The problem occurs during
> > TPG removal in unlink("/sys/kernel/config/target/iscsi/<target-iqn>/<tpgt#>/lun/<lun#>/<dev-symlink>").
> > When entering the unlink(), TPG is already disabled by "echo 0 > .../enable" and all initiator
> > ACLs are removed. It does not happen every time but I can almost always reproduce it within
> > 10-20 minutes.
> > 
> > Tested on bleeding edge mainline 3.1-rc6+ with today's target patches for -rc7.
> > 
> 
> Hi Martin,
> 
> Thanks for the detailed bug-report.  It sounds like you are pretty
> certain this is a regression with the v3.1 target-core+iscsi-target
> mainline code, yes..?  Have you been able to verify this same test with
> the pre v3.1 (eg: v4.0.0-rc7) code in lio-core-2.6.git..?
> 
> Nothing immediately rings a bell here, but i'll have a deeper look and
> try reproducing over the weekend..  Also, it would be very helpful if
> you could enable the dynamic_printk right before the shutdown sequence
> with:
> 
>    echo 'module iscsi_target_mod +p' > /debug/dynamic_debug/control
> 
> and disable it before re-enabling the TPG to avoid all of the WRITE I/O
> output noise with:
> 
>    echo 'module iscsi_target_mod -p' > /debug/dynamic_debug/control
> 
> This will require CONFIG_DYNAMIC_DEBUG=y in your kernel config, but
> would be very helpful to diagnose the issue. 
> 

Hi again Martin,

I've been thinking a bit about the possibilities here, but am still
unsure what's actually going on based on the initial logs.  One thing
that is very strange is that explicitly disabling the TPG is supposed,
by default, to shut down all active iSCSI sessions and prevent new
sessions from logging in.

I am wondering if this particular issue is caused either by a left-over
se_cmd being processed in transport_lun_wait_for_tasks() after all
sessions have been stopped (meaning there is a bug in the iscsi-target
shutdown process for writes / removal of se_cmd from the lun_cmd_list),
or by a new bug where active session I/O is occurring after the
TPG endpoint has been explicitly disabled..

Along with the full set of dynamic_debug output as mentioned above, it
would be very helpful to have a separate capture of the hang with only
the pr_debug() calls converted into printk() within the
transport_lun_wait_for_tasks() and __transport_clear_lun_from_sessions()
logic.  Please do the same for transport_generic_wait_for_tasks(), which
should give better insight into what's going on minus the full set of
verbose active I/O output..
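
If rebuilding with printk() is inconvenient, one possible alternative is
to enable only the pr_debug() sites in those three functions through
dynamic_debug's `func` match.  The sketch below is untested against this
hang; the helper names dyndbg_queries and apply_dyndbg are hypothetical,
and the control file path depends on where debugfs is mounted
(/debug/... above, often /sys/kernel/debug/... elsewhere).

```shell
#!/bin/sh
# Hypothetical helpers: build and apply per-function dynamic_debug queries.

# Print one "func <name> <flag>" query per function name.
# $1 is the flag (+p to enable, -p to disable), the rest are function names.
dyndbg_queries() {
    flag=$1; shift
    for fn in "$@"; do
        printf 'func %s %s\n' "$fn" "$flag"
    done
}

# Write each query to the control file as a separate write.
# $1 is the control file path, $2 the flag, the rest are function names.
apply_dyndbg() {
    control=$1; shift
    dyndbg_queries "$@" | while IFS= read -r query; do
        echo "$query" > "$control"
    done
}
```

For example, running
`apply_dyndbg /debug/dynamic_debug/control +p transport_lun_wait_for_tasks __transport_clear_lun_from_sessions transport_generic_wait_for_tasks`
as root right before the shutdown sequence, and the same with `-p`
afterwards, should limit the output to those functions.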

Also, a backtrace for any active session iscsi_ttx and iscsi_trx thread
pairs (possibly in D state), captured with kdb when the hang occurs,
would also be very useful.
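
Where kdb is not set up, a rough substitute is to read the kernel stacks
of the blocked threads from procfs.  This is only a sketch: it assumes
/proc/<pid>/stack is available (CONFIG_STACKTRACE=y, read as root) and a
procps-style ps; the function names filter_blocked_iscsi and
dump_blocked_iscsi_stacks are hypothetical.

```shell
#!/bin/sh
# Sketch: list iscsi_ttx/iscsi_trx kernel threads in D state and dump
# their kernel stacks.

# Read "PID STAT COMM" lines on stdin and print "PID COMM" for iSCSI
# RX/TX threads whose state starts with D (uninterruptible sleep).
filter_blocked_iscsi() {
    awk '$2 ~ /^D/ && ($3 == "iscsi_ttx" || $3 == "iscsi_trx") { print $1, $3 }'
}

# Print the kernel stack of every blocked iSCSI thread found by ps.
dump_blocked_iscsi_stacks() {
    ps -eo pid=,stat=,comm= | filter_blocked_iscsi | while read -r pid comm; do
        echo "=== $comm (pid $pid) ==="
        cat "/proc/$pid/stack" 2>/dev/null ||
            echo "(cannot read /proc/$pid/stack)"
    done
}
```

Calling dump_blocked_iscsi_stacks as root while the unlink() is hung
prints one stack per blocked thread.  `echo w > /proc/sysrq-trigger`
dumps all blocked tasks to the kernel log and may be a simpler way to get
the same information.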

Thanks,

--nab
