Hi Nicolas,
W dniu 07.08.2017 o 00:28, Nicholas A. Bellinger pisze:
Hi Lukasz,
(Adding Pascal CC', as he's been seeing something similar with MD
backend storage)
Thanks alot for the follow up. Comments below.
On Mon, 2017-07-31 at 17:15 +0200, Łukasz Engel wrote:
Hi Nicolas,
W dniu 31.07.2017 o 02:00, Nicholas A. Bellinger pisze:
[...]
Jul 31 16:28:54 teststorage [ 142.431045] ABORT_TASK: Found
referenced qla2xxx task_tag: 1189328
Jul 31 16:28:54 teststorage [ 142.431059] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1189328
Jul 31 16:28:57 teststorage [ 145.632571] ABORT_TASK: Found referenced qla2xxx task_tag: 1177932
Jul 31 16:28:57 teststorage [ 145.897299] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 1177932
Jul 31 16:29:00 teststorage [ 148.388569] ABORT_TASK: Found referenced qla2xxx task_tag: 1164160
Jul 31 16:32:00 teststorage [ 328.848344] se_cmd waiting CDB: 0x89 tag: 1164160 t_state: 5 se_cmd_flags: 0x01c80109 transport_state: 0x00000033
Perfect. This is very helpful to understand state of the I/O triggering
ESX host ABORTs, and that blocks waiting for completion from backend
storage.
Here's what the above debug translates to:
CDB: 0x89 = COMPARE_AND_WRITE (eg: VAAI AtomicTestandSet)
t_state: 5 = TRANSPORT_PROCESSING
se_cmd_flags: 0x01c80109 =
SCF_SUPPORTED_SAM_OPCODE | SCF_SCSI_DATA_CDB | SCF_SE_LUN_CMD |
SCF_COMPARE_AND_WRITE | SCF_ACK_KREF | SCF_USE_CPUID | SCF_TASK_ATTR_SET
transport_state: 0x00000033 =
CMD_T_ABORTED | CMD_T_ACTIVE | CMD_T_SENT | CMD_T_STOP
Note COMPARE_AND_WRITE wrt to backend I/O consists of two phases. First
phase is a READ and comparison against the first part of the received
buffer. If the comparison matches, then the second phase sends a WRITE
with the second part of the received buffer to complete the update.
So according to the above output, the first phase of COMPARE_AND_WRITE
dispatched the READ into backend storage. The READ did not complete,
and subsequently ESX sent an ABORT_TASK for it (tag: 1164160). The
ABORT_TASK successfully found the I/O still outstanding to backend
storage, and blocks waiting for it to complete..
The debug printing 'se_cmd waiting ..' message shows every 180 seconds,
so it's clear I/O completion from backend storage is not happening back
to IBLOCK -> target-core for an extended amount of time.
Btw, I've been simulating this same scenario by delaying backend storage
I/O completion, and thus far once the first phase READ of
COMPARE_AND_WRITE eventually returns back to IBLOCK -> target-core, then
ABORT_TASK is able to complete and everything else continues to function
as expected.
That said, thus far the evidence is pointing to a backend device issue,
where that first READ phase of the COMPARE_AND_WRITE is (never..?)
completed back to IBLOCK -> target-core.
More below on how to verify this.
[...]
Btw, it looks like this system is running with:
CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE=1
which triggers a crash as soon as any hung task is detected.
I'd recommend to set to this '0' for now, to monitor what's going on
while hung tasks are occurring.
Also, this can be set at boot time via hung_task_panic=0.
Yes, it was planned. Of course for checking "after hung-task" situation
I've disabled this now...
Also, some more details on what your backend device storage behind
target-core looks like would be helpful.
Physical HBAs, device information, and any MD/RAID/LVM/etc on top..
I have adaptec 7805 HW RAID (6x SAS 600G 15k hdd in RAID0) - exporting 1
device to system (sdc),
LVM sits on this device and LV (vgtest/test) is used as LIO block
backstore and is exported to ESXs.
(for system I have separate 2 SATA HDDs in software raid1 + LVM)
[...]
OK, the vgtest-test LVM is backed by a single SCSI device (sdc), there
is a simple test to see if the SCSI device is holding onto I/Os.
Once the hung task triggers, check out:
chead /sys/block/sdc/inflight
This will show how many I/Os are outstanding to the underlying SCSI
device.
IIRC, md and lvm devices don't use 'inflight' counters, so this is only
useful for verifying the number of I/Os outstanding to the underlying
SCSI device. So if it's possible, I'd recommend verifying with the raw
SCSI device with BLOCK export as well, to isolate if it's something
specific to LVM that's causing I/Os not to be completed.
Thanks again for your detailed bug-report.
I re-run my test without LVM (sdc directly exported as target's backstore).
Result is the same as with LVM - kworker blocked waiting for i/o
completion, with identical "se_cmd waiting" report (excluding tag value
of course). (My personal feeling is it took a little longer to block
than with lvm)
I checked inflight counters for sdc - There were some non-zero values
during i/o activity (as excected) and in final state (when "se_cmd
waiting ..." or hung task is reported) there are zeros there.
Tomorrow I plan to retest using different sas controller which I have on
shelf (LSI).
We will see if it's replicable in that configuration, but as Pascal got
this for quite different setup than mine I expect to got similar results.
Personally for me (If I properly understand your description) it looks
like in some situation i/o completion notification for read part of TAS
operation "got lost" somewhere in block/target layer. (And then timeout
for operation kicks in esx and it sends ABORT_TASK for this operation)
(BTW - I saw some non-zero values also for dm-* inflight counters during
i/o activity, so it looks lvm currently (4.12) provide inflight stats )
I will let you know tomorrow abount my findings
--
Łukasz Engel
--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html