Christoph,

> So what are useful APIs we can/should expose?
>
> If we want full portability we can't support all the individual
> checks, because the disk will check it for SCSI even if we don't do
> the extra checks in the controller. We could still expose the individual
> flags, but reject the combinations SCSI doesn't support on SCSI,
> although that would lead to surprises if people write their software
> and test on NVMe and then move to SCSI. Could we just expose the valid
> SCSI combinations if people are fine with that for now?

I didn't have any actual use for check-this-but-not-that. The rationale
behind having explicit checking flags was my dislike for the fact that
the policy decision about what to check resided inside the disk drive
and depended on how it was formatted, which flags were wired up in the
EI VPD, etc. I preferred an approach where the OS tells the hardware
exactly what to do.

There are a couple of free bits in *PROTECT so we could conceivably work
with T10 to add the missing pieces. But it would have a pretty long
turnaround, of course, and wouldn't address existing devices. Also,
things are not entirely symmetric wrt. *PROTECT for reads and writes
either. I'll try to wrap my head around it tomorrow.

For the user API I think it would be most sensible to have CHECK_GUARD,
CHECK_APP, and CHECK_REF to cover the common DIX/NVMe case. And then we
could have NO_CHECK_DISK and IP_CHECKSUM_CONVERSION to handle the
peculiar SCSI corner cases and document that these are experimental
flags to be used for test purposes only. Not particularly elegant, but I
don't have a better idea. Especially since things are inherently
asymmetric, with controller-to-target communication being protected even
if you don't attach PI to the bio.

I.e. I think the CHECK_{GUARD,APP,REF} flags should describe how a DIX or
NVMe controller should check the attached bip payload. And nothing else.
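A minimal C sketch of how the proposed flags might be grouped, assuming a simple bitmask in a hypothetical uapi. The names pi_user_flags, PI_*, pi_flags_validate() and pi_flags_experimental() are illustrative only, not an existing kernel interface, and the bit values are arbitrary. The CHECK_* bits describe only how a DIX or NVMe controller checks the bip payload; the two experimental SCSI flags live in a separate, non-overlapping group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag names and values: illustration only, not an existing uapi. */
enum pi_user_flags {
	/* How a DIX or NVMe controller checks the attached bip payload. */
	PI_CHECK_GUARD            = 1 << 0,
	PI_CHECK_APP              = 1 << 1,
	PI_CHECK_REF              = 1 << 2,
	/* Experimental SCSI corner cases (controller-to-target envelope), test use only. */
	PI_NO_CHECK_DISK          = 1 << 3,
	PI_IP_CHECKSUM_CONVERSION = 1 << 4,
};

#define PI_BIP_CHECK_MASK    (PI_CHECK_GUARD | PI_CHECK_APP | PI_CHECK_REF)
#define PI_EXPERIMENTAL_MASK (PI_NO_CHECK_DISK | PI_IP_CHECKSUM_CONVERSION)
#define PI_VALID_MASK        (PI_BIP_CHECK_MASK | PI_EXPERIMENTAL_MASK)

/* The bip checking flags and the disk-side flags are kept strictly separate. */
static_assert((PI_BIP_CHECK_MASK & PI_EXPERIMENTAL_MASK) == 0,
	      "bip check flags and experimental flags must not overlap");

/* Reject unknown bits; returns 0 on success, -1 if any undefined bit is set. */
static int pi_flags_validate(uint32_t flags)
{
	if (flags & ~(uint32_t)PI_VALID_MASK)
		return -1;
	return 0;
}

/* True if the caller asked for any of the test-only SCSI behaviors. */
static bool pi_flags_experimental(uint32_t flags)
{
	return (flags & PI_EXPERIMENTAL_MASK) != 0;
}
```

Keeping the experimental bits in their own mask makes it easy to gate them separately (e.g. behind a debug option) without touching the common DIX/NVMe path.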
The controller-to-target PI handling is orthogonal and refers to what
happens in the second protection envelope, i.e. the communication
between a DIX controller and a target. This may or may not be the same
PI as in the bip payload. Therefore I think these flags should be
separate. I'll mull it over a bit more and revisit all the SCSI wrinkles.

> I'm not currently seeing warnings on SCSI, but that's because my only
> PI testing is scsi_debug which starts out with deallocated blocks.

SCSI says that deallocated blocks have 0xFFFF in the app tag and thus
checking should be disabled on read. And if you subsequently write a
block without providing PI, the drive generates a valid guard and ref
tag (for Type 1). So there should never be a situation where reading a
block returns a PI error unless the block is corrupted. Either the app
tag escape is present or the PI is valid.

SCSI subsequently added some blurriness to permit deviations from this
principle. But the original PI design explicitly ensured that PI was
never accidentally invalid and that reads would never fail. Even if you
wrote the drive on a system that didn't know about PI, things would be
OK. This was deliberately done so reading partition tables, etc.
wouldn't fail.

In Linux we currently treat Type 2 as Type 1 for pretty much the same
reason: to ensure that the ref tag is always well-defined, i.e. it
contains the lower 32 bits of the LBA.

The intent when we defined E2EDP in NVMe was to match this never-fail
SCSI behavior. So I'm puzzled as to why you see errors. I'll try to
connect my NVMe test box tomorrow. It's been offline after a rack move.
I'd like to understand what's going on. Are we not setting ILBRT/EILBRT
appropriately?

-- 
Martin K. Petersen	Oracle Linux Engineering
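A minimal C sketch of the never-fail Type 1 read semantics described above, under stated assumptions: the tuple is kept in host byte order for simplicity (on the wire the fields are big-endian), the helper names are hypothetical, the app tag value written by pi_generate_type1() (0) is an assumption of this sketch, and only the 0xFFFF app tag escape is modeled, not the later SCSI deviations or Type 3 handling. The guard is CRC-16/T10-DIF (polynomial 0x8BB7, init 0, no reflection).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 8-byte PI tuple, host order here (big-endian on the wire). */
struct pi_tuple {
	uint16_t guard;   /* CRC-16/T10-DIF over the logical block data */
	uint16_t app_tag; /* application tag; 0xFFFF is the escape value */
	uint32_t ref_tag; /* Type 1: lower 32 bits of the LBA */
};
static_assert(sizeof(struct pi_tuple) == 8, "PI tuple is 8 bytes");

#define PI_APP_ESCAPE 0xFFFF

/* Bitwise CRC-16/T10-DIF: poly 0x8BB7, init 0, MSB first, no final XOR. */
static uint16_t crc_t10dif(const uint8_t *buf, size_t len)
{
	uint16_t crc = 0;

	for (size_t i = 0; i < len; i++) {
		crc ^= (uint16_t)(buf[i] << 8);
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8BB7)
					     : (uint16_t)(crc << 1);
	}
	return crc;
}

/* Hypothetical: what a Type 1 target produces when a block is written without PI. */
static struct pi_tuple pi_generate_type1(const uint8_t *data, size_t len,
					 uint64_t lba)
{
	struct pi_tuple pi = {
		.guard = crc_t10dif(data, len),
		.app_tag = 0, /* assumption: default app tag value */
		.ref_tag = (uint32_t)(lba & 0xFFFFFFFFu),
	};
	return pi;
}

/* Hypothetical read-side check: escape disables checking, else guard and ref tag must match. */
static bool pi_verify_type1(const uint8_t *data, size_t len, uint64_t lba,
			    const struct pi_tuple *pi)
{
	if (pi->app_tag == PI_APP_ESCAPE)
		return true; /* e.g. deallocated block: checking disabled */
	if (pi->guard != crc_t10dif(data, len))
		return false;
	return pi->ref_tag == (uint32_t)(lba & 0xFFFFFFFFu);
}
```

With these rules a read can only fail if the escape is absent and the stored tuple disagrees with the data or the LBA, which is the "either the app tag escape is present or the PI is valid" property.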