Hi Martin,

Good to hear from you again.

There are at least two problems with supporting WRITE SAME (unmap=1,
ndob=0) issued to a RAID controller.

First, we don't know what pattern the host will generate, and the very
premise of ndob=0 is that the pattern provided by the host may not be
all zeros. But the RAID controller maps host data to multiple drives,
and effective unmapping of blocks on actual media requires consistency
across the attached drives. The only reasonable assumption/requirement
is that the drives all implement an all-zeros provisioning
initialization pattern. This does require the drive to actually compare
the WRITE SAME data with its provisioning initialization pattern, which
might be a minor overhead for the drive (though even that is debatable
given the IOPS and latency specs of today's fastest SSDs). A RAID
controller, however, has to process the aggregate IOPS of all the
attached SSDs, and to decide how to handle each WRITE SAME issued by
the host it also has to determine whether the pattern is as expected
(all zeros) in order to maintain parity-RAID stripe coherency, and even
image consistency for RAID-1. The burden of performing a memory compare
on data from the host's data-out buffer becomes unmanageable in a RAID
controller from a performance standpoint. For this reason, it is of
great benefit to avoid a memory compare on every WRITE SAME (unmap=1)
I/O and instead rely on the guarantee that the data is all zeros, as
implied by ndob=1 and lbprz=001b.

Second, the RAID controller has to maintain consistent parity across
RAID parity arrays. It needs a deterministic way to make sure the data
returned on a subsequent read is consistent with the corresponding
parity it generated. If the resulting data is known to be all zeros,
that becomes trivial (new parity = old parity XOR old data), and there
is no need to read back after the WS command to account for the new
data pattern. Allowing WS (unmap=1, ndob=0) complicates that because
the data may not be all zeros. This gets more confounding when the
attached drives are not SAS but SATA or NVMe, which do not define
commands equivalent to WRITE SAME (unmap=1, ndob=0) with a non-zero
data-out buffer.
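To make the first point concrete, here is a minimal sketch in C of the
per-command check a controller is forced into when ndob=0 (hypothetical
code; ws_unmap_can_unmap and buffer_is_zero are made-up names, not
taken from any actual firmware):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Scan the host data-out buffer for any non-zero byte. */
static bool buffer_is_zero(const uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (buf[i] != 0)
			return false;
	return true;
}

/*
 * Decide whether a WRITE SAME (unmap=1) can be forwarded as an
 * unmap. With ndob=1 there is no data-out buffer at all, so the
 * data is all zeros by definition and the scan is skipped. With
 * ndob=0 the controller must touch every byte of the buffer on
 * every command -- the memory-compare burden described above.
 */
bool ws_unmap_can_unmap(bool ndob, const uint8_t *dout, size_t block_len)
{
	if (ndob)
		return true;
	return dout != NULL && buffer_is_zero(dout, block_len);
}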
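And for the second point, the parity shortcut that all-zeros data buys
us. A rough sketch (again with hypothetical naming):

#include <stddef.h>
#include <stdint.h>

/*
 * RAID-5 parity maintenance for a strip that will read back as all
 * zeros after the unmap. Because P = D0 ^ D1 ^ ... ^ Dn and the new
 * data D' is 0, the read-modify-write collapses to
 *     new parity = old parity XOR old data,
 * so only the old data and old parity are read; no read-back of the
 * new data pattern is needed.
 */
void parity_update_zeroed_strip(uint8_t *parity, const uint8_t *old_data,
				size_t len)
{
	for (size_t i = 0; i < len; i++)
		parity[i] ^= old_data[i];
}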
Bottom line is that SCSI defined WRITE SAME (unmap=1, ndob=1) as it did
(with the requirement that support for unmap=1 implies support for
ndob=1) for good reason, and the SATA and NVMe equivalents were
designed to be compatible for the same reasons. It would be really nice
if we could get the ecosystem in general on board with this model.

Thanks,
Bob Sheffield
RAID Architect: Broadcom Inc.

-----Original Message-----
From: Kashyap Desai [mailto:kashyap.desai@xxxxxxxxxxxx]
Sent: Wednesday, February 27, 2019 10:15 AM
To: Martin K. Petersen <martin.petersen@xxxxxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxx>; linux-scsi <linux-scsi@xxxxxxxxxxxxxxx>; Bob Sheffield <bob.sheffield@xxxxxxxxxxxx>
Subject: RE: [scsi] write same with NDOB (no data-out buffer) support

Adding Bob Sheffield from Broadcom.

>
> Hi Kashyap,
>
> > I was going through below discussion as well as going through linux
> > scsi code to know if linux scsi stack support NDOB.
>
> Last time NDOB came up there were absolutely no benefits to it from
> the kernel perspective. These days we can save the buffer memory
> allocation so there may be a small win. I do have a patch we can
> revive.

We can test if you have any patch.

>
> However, I am not aware of any devices that actually support NDOB.
> Plus it's hard to detect since we need to resort to RSOC masks. And
> blindly sending RSOC is risky business. That's why my patch never
> went anywhere. It was a lot of heuristics churn to set a single bit
> flag.
>
> Since the benefits are modest (basically saves a memory compare on
> the device), what is the reason you are looking at this?

SCSI SBC-4 requires that any drive that supports WRITE SAME (unmap=1)
also support ndob=1, so a drive supports it if it reports LBWS=1 in the
Block Provisioning VPD page. If it doesn't, the drive violates the
SBC-4 standard, and that is a drive problem.

Issuing WRITE SAME (unmap=1, ndob=0) only achieves block unmapping if
the pattern in the data-out buffer matches the "provisioning
initialization pattern" implemented by the drive. AFAIK, the current
SCSI stack doesn't determine what that pattern is for each drive, so
the current method of using WRITE SAME (unmap=1, ndob=0) is likely
ineffective at unmapping blocks on media. On the other hand, if the
drive supports WRITE SAME (unmap=1, ndob=1) - as required by SBC-4 -
then Linux can use it to reliably cause LBAs to be unmapped on media.
Perhaps this is considered a minor issue for direct-attached drives,
but in the RAID world it's a big enough issue that relying on ndob=1 is
the only way we can reliably maintain coherency across stripes in
redundant data mappings.

> > One more question. What happens if WS w/ UNMAP command is passed to
> > the device without zeroed data out buffer in current scsi stack ?
> > Will it permanently disable WS on that device ?
>
> Depends how the device responds.
>
> --
> Martin K. Petersen	Oracle Linux Engineering
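To make the detection point above concrete, a rough sketch of checking
the Logical Block Provisioning VPD page (B2h) for LBWS, per my reading
of SBC-4 (the function name is made up):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Inspect a Logical Block Provisioning VPD page (B2h) buffer, as
 * returned by INQUIRY with EVPD=1. Per SBC-4, byte 5 carries LBPU
 * (bit 7) and LBPWS (bit 6); LBPWS=1 means the drive supports
 * WRITE SAME (16) with unmap=1, which in turn implies ndob=1
 * support.
 */
bool supports_ws16_unmap(const uint8_t *vpd, size_t len)
{
	if (len < 6 || vpd[1] != 0xb2)	/* byte 1 is the page code */
		return false;
	return (vpd[5] & 0x40) != 0;	/* LBPWS */
}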
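And for reference, the shape of the command itself - a sketch of a
WRITE SAME (16) CDB with unmap=1, ndob=1 per SBC-4 (the helper name is
made up):

#include <stdint.h>
#include <string.h>

/*
 * Build a WRITE SAME (16) CDB with unmap=1, ndob=1. The opcode is
 * 93h; byte 1 carries UNMAP (bit 3) and NDOB (bit 0); bytes 2-9
 * hold the starting LBA and bytes 10-13 the number of blocks, both
 * big-endian. With NDOB set, no data-out buffer accompanies the
 * command.
 */
void build_ws16_unmap_ndob(uint8_t cdb[16], uint64_t lba, uint32_t nblocks)
{
	memset(cdb, 0, 16);
	cdb[0] = 0x93;			/* WRITE SAME (16) */
	cdb[1] = 0x08 | 0x01;		/* UNMAP=1 | NDOB=1 */
	for (int i = 0; i < 8; i++)
		cdb[2 + i] = (uint8_t)(lba >> (56 - 8 * i));
	for (int i = 0; i < 4; i++)
		cdb[10 + i] = (uint8_t)(nblocks >> (24 - 8 * i));
}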