Re: [PATCH] dm-mirror: do not degrade the mirror on discard error

Mikulas Patocka <mpatocka@xxxxxxxxxx> · Wed, 18 Feb 2015 11:29:18 -0500 (EST)

On Wed, 18 Feb 2015, James Bottomley wrote:

> On Tue, 2015-02-17 at 14:59 -0500, Mikulas Patocka wrote:
> > 
> > On Mon, 16 Feb 2015, James Bottomley wrote:
> > 
> > > I already said this in the first sentence of the last paragraph of my
> > > email.  The point isn't what it does today; it's what might happen
> > > tomorrow and the principle of least surprise.  One day, someone might
> > > propagate the error.  When that happens, they'll be surprised to find
> > > every discard failure reported as -ENOTSUPP and it will cost someone
> > > time and effort to investigate and fix.  If you just propagate the error
> > > today, you save all that work in the future.
> > > 
> > > James
> > 
> > The question is whether this case is important enough to justify a
> > dm-io change.
> 
> I'm not sure I follow.  Are you saying no-one would ever want to
> propagate the error?  I think that would be short-sighted.

A SATA device may report success for the trim command without trimming 
any data. I know this is stupid, but the standard allows it and devices 
really do it. See this thread: 
http://www.spinics.net/lists/linux-scsi/msg80297.html

Consequently, if some filesystem or other application contains the logic 
"if trim succeeded, do something", it is broken, because the SSD may 
ignore the command and report success.
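
A minimal sketch of that point, in plain userspace C rather than kernel
code. The names discard_state and note_discard_status are hypothetical,
and it assumes "not supported" arrives as -EOPNOTSUPP. The status is
used only as a hint to stop sending discards, never as proof that data
was trimmed:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-device bookkeeping; not a real kernel structure. */
struct discard_state {
	bool stop_sending;	/* further discards look futile */
};

/*
 * Record what a discard completion status tells us.
 * Returns true only if the caller may assume the blocks were really
 * trimmed, which is never: a device may report success and trim nothing.
 */
static bool note_discard_status(struct discard_state *st, int status)
{
	/* Assumption: "not supported" is reported as -EOPNOTSUPP. */
	if (status == -EOPNOTSUPP)
		st->stop_sending = true;

	/* Success (0) or any error: no guarantee about the data. */
	return false;
}
```

A caller would consult stop_sending before issuing the next discard and
would not branch on "trim succeeded" for anything that affects
correctness.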

> > The SSD may ignore discards and report them as successfully completed, so no 
> > one should depend on the return code anyway. The error code may be used as 
> > a hint that it is futile to send more discards in the future, but relying 
> > on the return code is already not correct.
> 
> That's not a good way of interpreting the standards.

It doesn't matter how you interpret the standard. What matters is how SSD 
vendors interpret it, and they interpret it in such a way that it is OK to 
report success and not trim any data. I know it doesn't make much sense to 
standardize the flag "Return Zero After Trim" and then specify that the 
device (despite having RZAT set) may ignore the trim command and not return 
zeroes on the trimmed data. But that's the way it is.
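
As an illustration only: code that really needs zeroes after a trim
could check the data it reads back (or simply write zeroes itself)
instead of trusting RZAT. The helper below is a hypothetical userspace
sketch, not an existing kernel function:

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if every byte in buf[0..len) is zero. */
static bool buffer_is_zeroed(const unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		if (buf[i] != 0)
			return false;	/* stale data survived the trim */
	}
	return true;
}
```

The check would be applied to a buffer read back from the trimmed
range. A false result means the device kept the old data even though
the trim was reported as successful.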

Mikulas

> For instance unmap has two types of error: permanent and transient.  
> Permanent means the device would never be able to process the unmap and 
> you should move on. Transient means the device may be able to process 
> the unmap and you might like to repeat it.  Mostly the retries will be 
> handled by SCSI but not always.
> 
> That the discard issuer doesn't care is also not a given.  In the low
> end SSD case you cite above, they probably don't.  However, if it's a
> cloud environment charging per megabyte per day for provisioned
> capacity, they probably do care.
> 
> The point here is that since you have the ability to do the right thing
> (you have the error code the lower layer sent), just do it.  It will
> save a lot of pain later on.  Doing the wrong thing and trying to
> justify it post facto based on how you see the future evolving is
> inevitably the wrong course of action because we're not very good at
> predictions.
> 
> James
