Re: [PATCH 0/6] block: add support for REQ_OP_VERIFY

Matthew Wilcox <willy@xxxxxxxxxxxxx> · Wed, 13 Jul 2022 13:17:56 +0100

On Wed, Jul 13, 2022 at 09:14:42AM +0000, Chaitanya Kulkarni wrote:
> On 7/6/22 10:42, Matthew Wilcox wrote:
> > On Thu, Jun 30, 2022 at 02:14:00AM -0700, Chaitanya Kulkarni wrote:
> >> This adds support for the REQ_OP_VERIFY. In this version we add
> > 
> > IMO, VERIFY is a useless command.  The history of storage is full of
> > devices which simply lie.  Since there's no way for the host to check if
> > the device did any work, cheap devices may simply implement it as a NOOP.
> 
> Thanks for sharing your feedback regarding cheap devices.
> 
> This falls outside the scope of this work, which is not to analyze
> different vendor implementations of the verify command.

The work is pointless.  As a customer, I can't ever use the VERIFY
command because I have no reason for trusting the outcome.  And there's
no way for a vendor to convince me that I should trust the result.

> > Even expensive devices where there's an ironclad legal contract between
> > the vendor and customer may have bugs that result in only some of the
> > bytes being VERIFYed.  We shouldn't support it.
>
> This is not true of enterprise SSDs. I've been involved with product
> qualification of high-end enterprise SSDs since 2012, including good
> old non-NVMe devices with e.g. the skd driver on Linux/Windows/VMware.

Oh, I'm sure there's good faith at the high end.  But bugs happen in
firmware, and everybody knows it.

> > Now, everything you say about its value (not consuming bus bandwidth)
> > is true, but the device should provide the host with proof-of-work.
> 
> Yes, that seems to be missing, but it is not a blocker for this work
> since the protocol needs to provide this information.

There's no point in providing access to a feature when that feature is
not useful.

> We can update the respective specification to add a log page which
> shows proof of work for the verify command, e.g. a log page
> consisting of information such as:
>
> 1. How many LBAs were verified? How long did it take?
> 2. What kind of errors were detected?
> 3. How many blocks were moved to a safe location?
> 4. How much data (LBAs) was moved successfully?
> 5. How much data was lost permanently to uncorrectable errors?
> 6. What is the impact on the overall size of the storage, e.g. in
>    the case of flash, the reduction in over-provisioning due to
>    uncorrectable errors?

That's not proof of work.  That's claim of work.

> > I'd suggest calculating some kind of checksum, even something like a
> > SHA-1 of the contents would be worth having.  It doesn't need to be
> > crypto-secure; just something the host can verify the device didn't spoof.
> 
> I did not understand exactly what you mean here.

The firmware needs to prove to me that it *did something*.  That it
actually read those bytes that it claims to have verified.  The simplest
way to do so is to calculate a hash over the blocks which were read
(maybe the host needs to provide a nonce as part of the VERIFY command
so the drive can't "remember" the checksum).