Re: ESXi snapshot I/O error after upgrade to 4.9.30

Hello Nic,

On 9.6.2017 at 7:21, Nicholas A. Bellinger wrote:
> Hi Martin,
>
> On Mon, 2017-06-05 at 18:05 +0200, Martin Svec wrote:
>> Hello Nic,
>>
>> Today, three of our vSphere VMs running on iSCSI LIO 4.9.30 failed to create a backup snapshot and
>> hung with errors like "Create virtual machine snapshot xxxxx. Unable to close the 
>> '/vmfs/volumes/.../xxxxx-000001-ctk.vmdk' file: 5 (Input/output error)." or other more general I/O
>> errors. It always happened during snapshot creation and there were multiple "Detected MISCOMPARE +
>> Target/iblock: Send MISCOMPARE check condition and sense" in target log at the same time.
>> Subsequently, virtual machines lost access to their virtual disks and required VM reset. The
>> failures seem to be independent of each other and VMs ran on different hosts.
>>
> So nothing else in the target logs of interest..?
>
> I assume the MISCOMPARE warnings occur at the normal rate..?

Yes, no other errors or anything suspicious in target logs.


>> The storage was upgraded to 4.9.30 only two days ago. However, we have an identical iSCSI LIO
>> storage that has been running 4.9.27 for more than three weeks without any issue in the same
>> vSphere cluster. So I'm wondering if this could be caused by a stable target patch between 4.9.27
>> and 4.9.30. A quick look into the changelog shows "target: Fix compare_and_write_callback handling
>> for non GOOD status" as the only fix related to CAW since 4.9.27. What do you think?
>>
>> We have ESXi 5.5.0 rev. 5230635 on all ESXi nodes.
> Note the 'target: Fix compare_and_write_callback handling for non GOOD
> status' change only affects COMPARE_AND_WRITE related I/Os that actually
> fail.
>
> That is, unless the underlying backend target device was actually
> generating hard I/O errors (eg: something like the following where 'sdc'
> is your target backend device):
>
>    Buffer I/O error on dev sdc, logical block 0, async page read
>    blk_update_request: I/O error, dev sdc, sector 2097144
>    blk_update_request: I/O error, dev sdc, sector 2097144
>    Buffer I/O error on dev sdc, logical block 262143, async page read
>    blk_update_request: I/O error, dev sdc, sector 0
>    Buffer I/O error on dev sdc, logical block 0, async page read
>    blk_update_request: I/O error, dev sdc, sector 0
>
> then the CAW change above in v4.9.30 won't have any effect.
>
> If the issue is reproducible, you can verify by re-enabling the debug
> message for a hard I/O error in compare_and_write_callback():
>
> diff --git a/drivers/target/target_core_sbc.c b/drivers/target/target_core_sbc.c
> index ca42fba..a0de5ab 100644
> --- a/drivers/target/target_core_sbc.c
> +++ b/drivers/target/target_core_sbc.c
> @@ -479,7 +479,7 @@ static sense_reason_t compare_and_write_callback(struct se_cmd *cmd, bool succes
>          * been failed with a non-zero SCSI status.
>          */
>         if (cmd->scsi_status) {
> -               pr_debug("compare_and_write_callback: non zero scsi_status:"
> +               printk_ratelimited("compare_and_write_callback: non zero scsi_status:"
>                         " 0x%02x\n", cmd->scsi_status);
>                 *post_ret = 1;
>                 if (cmd->scsi_status == SAM_STAT_CHECK_CONDITION)
>
> That said, if you can confirm the backend device is not generating hard
> I/O errors for COMPARE_AND_WRITE I/O up to target-core, I'd wager the
> ESX host failures observed aren't specific to the change.
>
Well, the issue isn't reproducible and no block device I/O errors were generated on the target side.
There's also nothing interesting in the ESXi logs, and I've never seen this error since we started
using vSphere in 2011. That said, it looks like one of those transient issues we usually attribute
to solar flares :-) Sorry for the noise.
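
For anyone who wants to repeat the target-side check, here is a minimal sketch of how the
kernel log could be scanned for the messages quoted in this thread. The function name
summarize() and the pattern labels are made up for illustration, and the regular expressions
are copied from the log lines above, so other kernel versions may word them differently.

```python
import re
from collections import Counter

# Patterns taken from the messages quoted in this thread (assumption: your
# kernel prints them in the same form; adjust the regexes if it does not).
PATTERNS = {
    "blk_io_error": re.compile(
        r"blk_update_request: I/O error, dev (?P<dev>[^\s,]+), sector \d+"),
    "buffer_io_error": re.compile(
        r"Buffer I/O error on dev (?P<dev>[^\s,]+), logical block \d+"),
    "miscompare": re.compile(r"Detected MISCOMPARE"),
    # Only appears if the debug message from the diff above is enabled.
    "caw_status": re.compile(r"compare_and_write_callback: non zero scsi_status"),
}


def summarize(lines, backend_dev=None):
    """Count matching log lines per pattern label.

    If backend_dev is given (e.g. "sdc"), block-layer errors reported for
    other devices are ignored; messages without a device name always count.
    """
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match is None:
                continue
            dev = match.groupdict().get("dev")
            if backend_dev and dev and dev != backend_dev:
                continue
            counts[name] += 1
    return counts
```

It takes any iterable of log lines, for example the lines of saved dmesg output. Non-zero
MISCOMPARE counts with zero block-layer errors for the backend device would match what we saw
here: the backend itself was not reporting hard I/O errors.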

Martin



--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



