On Tue, 2014-06-10 at 13:17 +0300, Charalampos Pournaris wrote:
> On Tue, Jun 3, 2014 at 10:37 PM, Nicholas A. Bellinger
> <nab@xxxxxxxxxxxxxxx> wrote:
> > On Sat, 2014-05-31 at 13:03 +0300, Charalampos Pournaris wrote:
> >> On Sun, May 25, 2014 at 9:42 AM, Thomas Glanzmann <thomas@xxxxxxxxxxxx> wrote:

<SNIP>

> > The configuration looks fine. Thanks for including the extra info..
> >
> >> If for some reason the formatting is not displayed properly, or for
> >> better readability, check this screenshot:
> >> http://postimg.org/image/e358ov40r/full/
> >>
> >> Thank you in advance for your help.
> >
> > Ok, so looking at these logs it's apparent that there are significantly
> > fewer occurrences of ABORT_TASK. In fact, AFAICT there is only a single
> > occurrence of ABORT_TASK in the entire log.
> >
> > This could be attributed to the reconfiguration to use a single LUN per
> > endpoint, to avoid the false-positive timeout issues that ESX is known
> > to generate with multiple LUNs per TargetName+TargetPortalGroupTag
> > endpoint..
> >
> > However, looking at the single instance of ABORT_TASK in the log,
> > something else appears to be happening with your backend:
> >
> > May 30 08:54:58 sof-24378-iscsi-vm kernel: [105260.032235] Got Task Management Request ITT: 0x0027a82c, CmdSN: 0x3da72700, Function: 0x01, RefTaskTag: 0x0027a814, RefCmdSN: 0x25a72700, CID: 0
> > May 30 08:54:58 sof-24378-iscsi-vm kernel: [105260.032266] ABORT_TASK: Found referenced iSCSI task_tag: 2598932
> > May 30 08:54:58 sof-24378-iscsi-vm kernel: [105260.032271] wait_for_tasks: Stopping ffff8800bac15810 ITT: 0x0027a814 i_state: 6, t_state: 5, CMD_T_STOP
> >
> > The most interesting line is the last one, wrt wait_for_tasks..
> >
> > Decoded, these i_state and t_state values mean:
> >
> > i_state: 6 (ISTATE_RECEIVED_LAST_DATAOUT)
> > t_state: 5 (TRANSPORT_PROCESSING)
> >
> > The significance of the 'TRANSPORT_PROCESSING' t_state is that an I/O
> > request was dispatched to the backend (iblock/24378_iscsi), but the
> > underlying storage never completes the outstanding I/O back to the
> > target layer. Or at least, this occurrence of ABORT_TASK is right near
> > the end of the logs, and there is no debug output to indicate that the
> > I/O completion ever occurs.
> >
> > This usually means some type of problem with the underlying driver for
> > the backend storage, as there is no legitimate reason why outstanding
> > I/Os would not (eventually) be completed back to IBLOCK, be it with
> > GOOD or some manner of exception status.
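> >
> > For reference, the wait that the log line above reflects lives in
> > transport_wait_for_tasks(). A simplified sketch of the shape of that
> > path (the exact code differs between kernel versions):
> >
> >     unsigned long flags;
> >
> >     /* Mark the command so the completion path knows a stop is
> >      * pending; this is the CMD_T_STOP bit shown in the log line
> >      * above.
> >      */
> >     spin_lock_irqsave(&cmd->t_state_lock, flags);
> >     cmd->transport_state |= CMD_T_STOP;
> >     spin_unlock_irqrestore(&cmd->t_state_lock, flags);
> >
> >     /* Block until the backend completes the I/O back to the target
> >      * core.  If the LLD never completes it, this waits forever.
> >      */
> >     wait_for_completion(&cmd->t_transport_stop_comp);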
> >
> > So that said, I would start investigating the underlying LLD driver for
> > iblock/24378_iscsi (/dev/sdb).. What type of storage + LLD is it
> > using..? Is the HBA using the latest available firmware..? Is there
> > anything else special about this backend..?
> >
> > --nab
>
> Hi Nicholas,
>
> First of all, thanks for the detailed explanation. It seems that this
> last problem we've hit is different from the one reported initially,
> since it caused a kernel panic, whereas the other issue makes the
> datastore(s) inactive, constantly throwing a login error. By the way,
> I'm using Linux VMs (Debian) to expose the iSCSI datastores (i.e.
> /dev/sdb is a local drive), so no special hardware/firmware is
> involved. The only possible issue I can think of is that the
> vmware-tools driver might be the culprit for the incomplete I/O, or
> perhaps some kernel driver?
>
> I've hit the initially reported issue again and have fresh logs to
> share. I'll send a separate mail with a link to the new logs.
>
> Additionally, when I attempted to stop the target service, it got stuck:

Ok, can you please apply the following three patches to your setup..?

target: Set CMD_T_ACTIVE bit for Task Management Requests
https://git.kernel.org/cgit/linux/kernel/git/nab/target-pending.git/commit/?h=for-next&id=f15e9cd910c4d9da7de43f2181f362082fc45f0f

target: Use complete_all for se_cmd->t_transport_stop_comp
https://git.kernel.org/cgit/linux/kernel/git/nab/target-pending.git/commit/?h=for-next&id=a95d6511303b848da45ee27b35018bb58087bdc6

iscsi-target: Fix ABORT_TASK + connection reset iscsi_queue_req memory leak
https://git.kernel.org/cgit/linux/kernel/git/nab/target-pending.git/commit/?h=for-next&id=bbc050488525e1ab1194c27355f63c66814385b8

These address the bug where a backend I/O takes a long time (say, over
120 seconds) to process, causing an ABORT_TASK + iSCSI session reset to
occur before the backend I/O completes. A more detailed explanation is
here:

http://permalink.gmane.org/gmane.linux.scsi.target.devel/6489

Note, however, that this addresses the case where the backend I/O takes
a long time to complete but still *needs* to complete at some point.
What I'm not sure about at this point is whether your backend is just
taking an extra long time to complete I/O, or whether there is a
separate bug in the LLD that causes I/O to never complete..

In any event, please try to reproduce with the above three patches in
place.

--nab
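P.S. For context on the second patch above: complete() wakes at most one
waiter on a struct completion, while complete_all() wakes all of them.
So if two contexts (say, an ABORT_TASK and a connection reset) can both
end up waiting on se_cmd->t_transport_stop_comp, a single complete() can
leave one of them stuck. A minimal sketch of the difference (illustrative
only, not the actual target code):

    #include <linux/completion.h>

    static DECLARE_COMPLETION(stop_comp);

    /* Waiter A: e.g. the ABORT_TASK path blocking in
     * transport_wait_for_tasks().
     */
    static void waiter_a(void)
    {
            wait_for_completion(&stop_comp);
    }

    /* Waiter B: e.g. a concurrent connection reset tearing down the
     * same command.
     */
    static void waiter_b(void)
    {
            wait_for_completion(&stop_comp);
    }

    /* Backend completion path: complete(&stop_comp) would wake only
     * one of A/B, leaving the other stuck; complete_all() wakes both.
     */
    static void io_done(void)
    {
            complete_all(&stop_comp);
    }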