Re: IO failures with SMR drives at latest kernel versions



On 08/27/2015 12:52 AM, James Bottomley wrote:
> On Wed, 2015-08-26 at 08:40 +0200, Hannes Reinecke wrote:
>> On 08/26/2015 06:53 AM, Anatol Pomozov wrote:
>>> Hi
>>>
>>> On Sun, Aug 23, 2015 at 11:15 PM, Hannes Reinecke <hare@xxxxxxx> wrote:
>>>>> I looked at this commit, and it actually adds SMR support to the SCSI
>>>>> layer. Reverting ATA_DEV_ZAC means going back to zone-unaware
>>>>> algorithms. That is suboptimal, but still much better than the IO
>>>>> failures and "BTRFS: lost page write due to I/O error on /dev/sdc"
>>>>> errors I see on my computer.
>>>>>
>>>>> If this SMR support is considered unstable, can we at least get a
>>>>> kernel boot (or config) option that disables ZAC?
>>>>>
>>>> Again: Has anybody actually _tested_ that reverting this patch fixes
>>>> this issue?
>>>
>>> Yes I tested it.
>>>
>>> This error happens only under heavy load with a lot of reads/writes
>>> (like a btrfs rebalance).
>>>
>>> With the current Linux 4.1.6, 'btrfs balance' fails about 10 minutes
>>> after it starts. I reverted the ZAC related changes and then ran the
>>> rebalance again. The operation finished successfully after 3 hours of
>>> running.
>>>
>> Can you be a bit more specific about the 'ZAC related changes'?
>> There have been several patches, and we really would need to know
>> which one was the offending one.
>> Can you try to bisect things here?
> 
> OK, let's stop shooting the messenger here.  There are multiple reports
> of this problem.  The pattern seems to be that some type of error
> causes everything to die.
> 
> There looks to be an obvious bug in
> 9162c6579bf90b3f5ddb7e3a6c6fa946c1b4cbeb in that there's no
> ATA_DEV_ZAC_UNSUP class, which means that any attempt to disable the
> device pushes it up to ATA_DEV_NONE.  I'm not sure ... don't have time
> to follow the code ... but doesn't this interfere with the speed-dropping
> routines, which seem to disable and then re-enable the device?
> Does adding ATA_DEV_ZAC_UNSUP fix this problem? Patch (compile-tested
> only) below.
> 
> James
> 
> ---
> 
> diff --git a/drivers/ata/libata-transport.c b/drivers/ata/libata-transport.c
> index d6c37bc..fa83320 100644
> --- a/drivers/ata/libata-transport.c
> +++ b/drivers/ata/libata-transport.c
> @@ -144,6 +144,7 @@ static struct {
>  	{ ATA_DEV_SEMB,			"semb" },
>  	{ ATA_DEV_SEMB_UNSUP,		"semb" },
>  	{ ATA_DEV_ZAC,			"zac" },
> +	{ ATA_DEV_ZAC_UNSUP,		"zac" },
>  	{ ATA_DEV_NONE,			"none" }
>  };
>  ata_bitfield_name_search(class, ata_class_names)
> diff --git a/include/linux/libata.h b/include/linux/libata.h
> index 36ce37b..49c5b98 100644
> --- a/include/linux/libata.h
> +++ b/include/linux/libata.h
> @@ -191,7 +191,8 @@ enum {
>  	ATA_DEV_SEMB		= 7,	/* SEMB */
>  	ATA_DEV_SEMB_UNSUP	= 8,	/* SEMB (unsupported) */
>  	ATA_DEV_ZAC		= 9,	/* ZAC device */
> -	ATA_DEV_NONE		= 10,	/* no device */
> +	ATA_DEV_ZAC_UNSUP	= 10,	/* ZAC (unsupported) */
> +	ATA_DEV_NONE		= 11,	/* no device */
>  
>  	/* struct ata_link flags */
>  	ATA_LFLAG_NO_HRST	= (1 << 1), /* avoid hardreset */
> @@ -1517,7 +1518,8 @@ static inline unsigned int ata_class_enabled(unsigned int class)
>  static inline unsigned int ata_class_disabled(unsigned int class)
>  {
>  	return class == ATA_DEV_ATA_UNSUP || class == ATA_DEV_ATAPI_UNSUP ||
> -		class == ATA_DEV_PMP_UNSUP || class == ATA_DEV_SEMB_UNSUP;
> +		class == ATA_DEV_PMP_UNSUP || class == ATA_DEV_SEMB_UNSUP ||
> +		class == ATA_DEV_ZAC_UNSUP;
>  }
>  
>  static inline unsigned int ata_class_absent(unsigned int class)
> 
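A minimal C sketch of the failure mode described in the quoted message, assuming (as "pushes it up to ATA_DEV_NONE" implies) that disabling a device advances its class by one to the adjacent _UNSUP value. The M_* names, model_disable() and the `patched` flag are hypothetical; class_disabled() follows the ata_class_disabled() hunk in the diff, while class_enabled() and class_absent() are assumed to mirror the libata helpers of similar name rather than copied from this patch. Class values 1 through 6 are assumed from libata.h, since they sit outside the hunk context.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the libata device classes; 7..10 match the diff. */
enum {
	M_ATA = 1, M_ATA_UNSUP, M_ATAPI, M_ATAPI_UNSUP,
	M_PMP, M_PMP_UNSUP, M_SEMB, M_SEMB_UNSUP,
	M_ZAC,          /* 9 in both layouts */
	M_ZAC_UNSUP,    /* 10: only a real class after the patch */
};

/* Value of ATA_DEV_NONE before (10) and after (11) the patch. */
static int none_value(bool patched)
{
	return patched ? 11 : 10;
}

/* Assumed to mirror ata_class_enabled(). */
static bool class_enabled(int c)
{
	return c == M_ATA || c == M_ATAPI || c == M_PMP ||
	       c == M_SEMB || c == M_ZAC;
}

/* Mirrors ata_class_disabled(); ZAC_UNSUP only counts once patched. */
static bool class_disabled(int c, bool patched)
{
	return c == M_ATA_UNSUP || c == M_ATAPI_UNSUP || c == M_PMP_UNSUP ||
	       c == M_SEMB_UNSUP || (patched && c == M_ZAC_UNSUP);
}

/* Assumed to mirror ata_class_absent(): neither enabled nor disabled. */
static bool class_absent(int c, bool patched)
{
	return !class_enabled(c) && !class_disabled(c, patched);
}

/* Hypothetical disable step: assumes disabling moves class -> class + 1. */
static int model_disable(int c)
{
	assert(class_enabled(c));
	return c + 1;
}
```

For example, model_disable(M_ZAC) returns 10 in both layouts; only the meaning of 10 changes. Before the patch it equals ATA_DEV_NONE and reads as absent, so a later re-enable has nothing to return to; after the patch it is ATA_DEV_ZAC_UNSUP and reads as disabled.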
> 
Yes, you are correct. Even if this does not fix this particular
issue, it looks like a valid fix.

Reviewed-by: Hannes Reinecke <hare@xxxxxxx>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@xxxxxxx			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


