On Wed, May 21, 2014 at 1:01 PM, Charalampos Pournaris <charpour@xxxxxxxxx> wrote:
> Hi Thomas,
>
> On Tue, May 20, 2014 at 10:44 PM, Thomas Glanzmann <thomas@xxxxxxxxxxxx> wrote:
>> Hello Harry,
>>
>>> http://pastebin.com/AqqJaYVX
>>
>> I checked my log from 11th October last year when this happened to me,
>> and it looks like the same error we're hitting:
>>
>> ...
>> Oct 11 11:53:56 node-62 kernel: [219465.151250] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 5488
>> Oct 11 11:53:56 node-62 kernel: [219465.151261] ABORT_TASK: Found referenced iSCSI task_tag: 5494
>> Oct 11 11:53:56 node-62 kernel: [219465.151264] ABORT_TASK: ref_tag: 5494 already complete, skipping
>> Oct 11 11:53:56 node-62 kernel: [219465.151267] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 5494
>> Oct 11 11:53:56 node-62 kernel: [219465.151271] ABORT_TASK: Found referenced iSCSI task_tag: 5495
>> Oct 11 11:53:56 node-62 kernel: [219465.151273] ABORT_TASK: ref_tag: 5495 already complete, skipping
>> Oct 11 11:53:56 node-62 kernel: [219465.151275] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 5495
>> Oct 11 11:54:09 node-62 kernel: [219478.744212] TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x00000008
>> Oct 11 11:54:09 node-62 kernel: [219478.751738] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 5508
>> Oct 11 11:54:23 node-62 kernel: [219492.351282] TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x00000013
>> Oct 11 11:54:23 node-62 kernel: [219492.358819] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 5514
>> Oct 11 11:54:23 node-62 kernel: [219492.630489] TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x0000001d
>> ...
>>
>> Full log here: https://thomas.glanzmann.de/crash/extracted_crash_dmesg.log
>>
>> I'll try to reproduce it and let the list know when I find something.
>> Harry, can you let me know how many VMs you had running, whether they
>> were thin or thick provisioned, and whether they were idle or under a
>> heavy workload?
>>
>> In my case I did _not_ use jumbo frames, but 802.3ad bonding. My
>> switches are configured to be able to deliver jumbo frames.
>>
>> Cheers,
>> Thomas
>
> Indeed, it seems that we hit the same issue, as the log lines look
> pretty similar.
>
> Our setup comprised around 7-10 powered-on VMs with some activity (not
> too intense), and if I recall correctly some VM deployments (through
> OVF/OVA) had been made prior to the failure. Obviously, there are no
> discrete steps to reproduce the problem... I hope this helps you
> reproduce it in your environment, as it's difficult for me to make
> changes in our production one (e.g. to recompile the kernel with
> debugging enabled).
>
> Thanks!
>
> Regards,
> Harry

Forgot to mention here that the VMs were thin provisioned (most of them,
at least).

Regards,
Harry
--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
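[Editor's note: for anyone triaging a similar report, the ABORT_TASK /
NON_EXISTENT_LUN pattern quoted above can be pulled out of a saved kernel
log with a couple of greps. This is an illustrative sketch, not from the
thread; the inlined sample lines are copied from the excerpt, and in
practice you would point LOG at a real dmesg dump or /var/log/kern.log.]

```shell
# Count the two message types seen in the quoted logs. A sample log is
# inlined (via a temp file) so the snippet is self-contained; replace LOG
# with the path to your own dmesg capture.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
Oct 11 11:53:56 node-62 kernel: [219465.151250] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 5488
Oct 11 11:54:09 node-62 kernel: [219478.744212] TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x00000008
Oct 11 11:54:23 node-62 kernel: [219492.351282] TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x00000013
EOF
# Number of TMR_TASK_DOES_NOT_EXIST responses sent by the target:
grep -c 'ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST' "$LOG"
# Number of accesses to LUNs the target no longer knows about:
grep -c 'NON_EXISTENT_LUN' "$LOG"
rm -f "$LOG"
```

If both counts climb together around the time a session drops, the log is
likely showing the same failure mode discussed here.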