Hi Martin,

Good to hear from you again.

There are at least two problems with supporting WRITE SAME (unmap=1,
ndob=0) issued to a RAID controller.

First, we don't know what pattern the host will generate, and the very
premise of ndob=0 is that the pattern provided by the host may not be
all zeros. But the RAID controller maps host data to multiple drives,
and effective unmapping of blocks on actual media requires consistency
across the attached drives. The only reasonable assumption/requirement
is that the drives all implement an all-zeros provisioning
initialization pattern. This does require the drive to actually compare
the WRITE SAME data with its provisioning initialization pattern, which
might be a minor overhead for the drive (though even that is debatable
given the IOPS and latency specs of today's fastest SSDs). A RAID
controller, however, has to process the aggregate IOPS of all the
attached SSDs, and to decide how to handle each WRITE SAME issued by
the host it also has to determine whether the pattern is as expected
(all zeros) in order to maintain parity-RAID stripe coherency, and even
image consistency for RAID-1. The burden of performing a memory compare
on data from the host's data-out buffer becomes unmanageable in a RAID
controller from a performance standpoint. For this reason, it is of
great benefit to avoid a memory compare on every WRITE SAME (unmap=1)
I/O and instead rely on the guarantee that the data is all zeros, as
implied by ndob=1 and lbprz=001b.

Second, the RAID controller has to maintain consistent parity across
RAID parity arrays. It needs a deterministic way to make sure the data
returned on a subsequent read is consistent with the corresponding
parity it generated. If the resulting data is known to be all zeros,
that becomes trivial (new parity = old parity XOR old data), and there
is no need to read back after the WS command to account for the new
data pattern. Allowing WS (unmap=1, ndob=0) complicates that because
the data may not be all zeros. This gets more confounding when the
attached drives are not SAS but SATA or NVMe, which do not define
commands equivalent to WRITE SAME (unmap=1, ndob=0) with a non-zero
data-out buffer.
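To make the first point concrete, here is a minimal sketch in C of the
per-command check a controller is forced into when ndob=0 (hypothetical
code; ws_unmap_can_unmap and buffer_is_zero are made-up names, not
taken from any actual firmware):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Scan the host data-out buffer for any non-zero byte. */
static bool buffer_is_zero(const uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (buf[i] != 0)
			return false;
	return true;
}

/*
 * Decide whether a WRITE SAME (unmap=1) can be forwarded as an
 * unmap. With ndob=1 there is no data-out buffer at all, so the
 * data is all zeros by definition and the scan is skipped. With
 * ndob=0 the controller must touch every byte of the buffer on
 * every command -- the memory-compare burden described above.
 */
bool ws_unmap_can_unmap(bool ndob, const uint8_t *dout, size_t block_len)
{
	if (ndob)
		return true;
	return dout != NULL && buffer_is_zero(dout, block_len);
}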
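And for the second point, the parity shortcut that all-zeros data buys
us. A rough sketch (again with hypothetical naming):

#include <stddef.h>
#include <stdint.h>

/*
 * RAID-5 parity maintenance for a strip that will read back as all
 * zeros after the unmap. Because P = D0 ^ D1 ^ ... ^ Dn and the new
 * data D' is 0, the read-modify-write collapses to
 *     new parity = old parity XOR old data,
 * so only the old data and old parity are read; no read-back of the
 * new data pattern is needed.
 */
void parity_update_zeroed_strip(uint8_t *parity, const uint8_t *old_data,
				size_t len)
{
	for (size_t i = 0; i < len; i++)
		parity[i] ^= old_data[i];
}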
Bottom line is that SCSI defined WRITE SAME (unmap=1, ndob=1) as it did
(with the requirement that support for unmap=1 implies support for
ndob=1) for good reason, and the SATA and NVMe equivalents were
designed to be compatible for the same reasons. It would be really nice
if we could get the ecosystem in general on board with this model.

Thanks,
Bob Sheffield
RAID Architect: Broadcom Inc.

-----Original Message-----
From: Kashyap Desai [mailto:kashyap.desai@xxxxxxxxxxxx]
Sent: Wednesday, February 27, 2019 10:15 AM
To: Martin K. Petersen <martin.petersen@xxxxxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxx>; linux-scsi <linux-scsi@xxxxxxxxxxxxxxx>; Bob Sheffield <bob.sheffield@xxxxxxxxxxxx>
Subject: RE: [scsi] write same with NDOB (no data-out buffer) support

Adding Bob Sheffield from Broadcom.

>
> Hi Kashyap,
>
> > I was going through below discussion as well as going through linux
> > scsi code to know if linux scsi stack support NDOB.
>
> Last time NDOB came up there were absolutely no benefits to it from
> the kernel perspective. These days we can save the buffer memory
> allocation so there may be a small win. I do have a patch we can
> revive.

We can test if you have any patch.

>
> However, I am not aware of any devices that actually support NDOB.
> Plus it's hard to detect since we need to resort to RSOC masks. And
> blindly sending RSOC is risky business. That's why my patch never
> went anywhere. It was a lot of heuristics churn to set a single bit
> flag.
>
> Since the benefits are modest (basically saves a memory compare on
> the device), what is the reason you are looking at this?

SCSI SBC-4 requires that any drive that supports WRITE SAME (unmap=1)
also support ndob=1, so a drive supports it if it reports LBWS=1 in the
Block Provisioning VPD page. If it doesn't, the drive violates the
SBC-4 standard, and that is a drive problem.

Issuing WRITE SAME (unmap=1, ndob=0) only achieves block unmapping if
the pattern in the data-out buffer matches the "provisioning
initialization pattern" implemented by the drive. AFAIK, the current
SCSI stack doesn't determine what that pattern is for each drive, so
the current method of using WRITE SAME (unmap=1, ndob=0) is likely
ineffective at unmapping blocks on media. On the other hand, if the
drive supports WRITE SAME (unmap=1, ndob=1) - as required by SBC-4 -
then Linux can use it to reliably cause LBAs to be unmapped on media.
Perhaps this is considered a minor issue for direct-attached drives,
but in the RAID world it's a big enough issue that relying on ndob=1 is
the only way we can reliably maintain coherency across stripes in
redundant data mappings.

> > One more question. What happens if WS w/ UNMAP command is passed to
> > the device without zeroed data out buffer in current scsi stack ?
> > Will it permanently disable WS on that device ?
>
> Depends how the device responds.
>
> --
> Martin K. Petersen	Oracle Linux Engineering
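To make the detection point above concrete, a rough sketch of checking
the Logical Block Provisioning VPD page (B2h) for LBWS, per my reading
of SBC-4 (the function name is made up):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Inspect a Logical Block Provisioning VPD page (B2h) buffer, as
 * returned by INQUIRY with EVPD=1. Per SBC-4, byte 5 carries LBPU
 * (bit 7) and LBPWS (bit 6); LBPWS=1 means the drive supports
 * WRITE SAME (16) with unmap=1, which in turn implies ndob=1
 * support.
 */
bool supports_ws16_unmap(const uint8_t *vpd, size_t len)
{
	if (len < 6 || vpd[1] != 0xb2)	/* byte 1 is the page code */
		return false;
	return (vpd[5] & 0x40) != 0;	/* LBPWS */
}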
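And for reference, the shape of the command itself - a sketch of a
WRITE SAME (16) CDB with unmap=1, ndob=1 per SBC-4 (the helper name is
made up):

#include <stdint.h>
#include <string.h>

/*
 * Build a WRITE SAME (16) CDB with unmap=1, ndob=1. The opcode is
 * 93h; byte 1 carries UNMAP (bit 3) and NDOB (bit 0); bytes 2-9
 * hold the starting LBA and bytes 10-13 the number of blocks, both
 * big-endian. With NDOB set, no data-out buffer accompanies the
 * command.
 */
void build_ws16_unmap_ndob(uint8_t cdb[16], uint64_t lba, uint32_t nblocks)
{
	memset(cdb, 0, 16);
	cdb[0] = 0x93;			/* WRITE SAME (16) */
	cdb[1] = 0x08 | 0x01;		/* UNMAP=1 | NDOB=1 */
	for (int i = 0; i < 8; i++)
		cdb[2 + i] = (uint8_t)(lba >> (56 - 8 * i));
	for (int i = 0; i < 4; i++)
		cdb[10 + i] = (uint8_t)(nblocks >> (24 - 8 * i));
}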