On 13-04-23 03:41 PM, Ric Wheeler wrote:
For many years, we have used WCE as an indication that a device has a volatile write cache (not just a write cache) and used this as a trigger to send down SYNCHRONIZE_CACHE commands as needed.

Some arrays with non-volatile cache seem to have WCE set and simply ignore the command.

Some arrays with non-volatile cache seem to not set WCE.

Other arrays with non-volatile cache - our problem arrays - set WCE and do something horrible and slow when sent the SYNCHRONIZE_CACHE commands.

Note that for file systems, you can override this behavior by mounting with barriers disabled (mount -o nobarrier ...). There is currently no way to disable this for anything using the device directly, not through the file system.

Some applications run against block devices - not through a file system - and do not want to slow to a crawl when they have an array in my problem set.

Giving them a hook to ignore WCE seems to be a hack, but one that would resolve issues with users who won't want to wait months (years?) for us to convince the array vendors.

Is this a hook worth doing? Have we hashed this out in the T10 committee?
Naturally I'm biased, but I tend to think the user space is usually smarter than the kernel. That assumes skilled users.

So if the user space issues a SYNCHRONIZE_CACHE with the IMMED bit set and for the whole disk, then the user should have a way of forcing that command to be issued. The assumption here is that the skilled user is about to power down that array or pull some disks or SSDs *.

The more questionable cases are when a file system or the block layer is issuing a barrier or some such that translates to a SYNCHRONIZE_CACHE. That should be ignored in some cases already discussed in this thread.

While working with SoCs I have noticed an interesting technique. Sub-system sized sections of the memory mapped IO space (e.g. a bank of GPIOs) can be write protected by a simple ASCII sequence **. Attempts to change configuration registers after write protect are ignored and an error is noted (if anyone cares). The same ASCII sequence can be used to un-write protect those sub-system configuration registers. Typically on a SoC if the GPIOs are randomly re-configured, it's game over.

Back to the SCSI world: a better solution might be if an LLD could be informed of the reason a SCSI control command is being issued (a sort of "come from" field). Failing that, or in addition to it, a sysfs interface could be added to filter out "dangerous" SCSI commands:

    echo "SC" > /sys/class/scsi_device/8:0:0:0/device/filter
    cat /sys/class/scsi_device/8:0:0:0/device/filter
    FU SC

If, for whatever reason, we did ignore a SYNCHRONIZE_CACHE command we could use vendor specific sense data (vendor=Linux) to indicate that a command had been ignored. That could be extended to all SCSI commands that are filtered out ***; better that than EIO, EACCES etc.

Doug Gilbert

* and if Linux doesn't permit this, then the user might be advised to run another, more obedient, host OS with Linux running as a VM. A "pass-by" rather than a "pass-through" ...
** only the configuration registers are write protected, so data can still be written to the GPIOs

*** like me, many pass-through users cannot see why SCSI commands injected to the SCSI subsystem (e.g. via sg or bsg) are filtered out silently by the block layer.
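
As a rough illustration of two ideas above (a whole-disk SYNCHRONIZE CACHE with IMMED set, and a per-device filter of "dangerous" commands), here is a minimal Python sketch. It only models the behaviour in user space; it does not touch sysfs or any device. The CDB layout follows SYNCHRONIZE CACHE (10) from SBC. Everything about the filter is an assumption: the `CommandFilter` class and `FILTER_CODES` table are hypothetical, and the expansion of the two-letter codes ("SC" as SYNCHRONIZE CACHE, "FU" as FORMAT UNIT) is a guess, since the example above does not define them.

```python
import struct

SYNCHRONIZE_CACHE_10 = 0x35  # SBC opcode for SYNCHRONIZE CACHE (10)
IMMED_BIT = 0x02             # CDB byte 1, bit 1: return status before the flush completes


def build_sync_cache10(lba=0, num_blocks=0, immed=True):
    """Build a 10-byte SYNCHRONIZE CACHE (10) CDB.

    lba=0 with num_blocks=0 covers every block up to the end of the medium,
    i.e. the "whole disk" case discussed above.
    """
    flags = IMMED_BIT if immed else 0
    # opcode, flags, 32-bit LBA, group number, 16-bit block count, control
    return struct.pack(">BBIBHB", SYNCHRONIZE_CACHE_10, flags, lba, 0, num_blocks, 0)


# Hypothetical mapping from the two-letter filter codes to SCSI opcodes.
# The meaning of "SC" and "FU" is assumed, not taken from any real interface.
FILTER_CODES = {
    "SC": {0x35, 0x91},  # assumed: SYNCHRONIZE CACHE (10) and (16)
    "FU": {0x04},        # assumed: FORMAT UNIT
}


class CommandFilter:
    """User-space model of the proposed per-device 'filter' attribute."""

    def __init__(self):
        self.codes = set()

    def write(self, text):
        """Model of: echo "SC" > .../device/filter (adds codes to the set)."""
        for code in text.split():
            if code not in FILTER_CODES:
                raise ValueError("unknown filter code: " + code)
            self.codes.add(code)

    def read(self):
        """Model of: cat .../device/filter (lists the active codes)."""
        return " ".join(sorted(self.codes))

    def is_filtered(self, cdb):
        """Return True if the CDB's opcode is covered by an active code."""
        return any(cdb[0] in FILTER_CODES[code] for code in self.codes)


if __name__ == "__main__":
    flt = CommandFilter()
    flt.write("FU")
    flt.write("SC")
    cdb = build_sync_cache10()
    print(flt.read(), cdb.hex(), flt.is_filtered(cdb))
```

In a real implementation the interesting part would be what happens after `is_filtered` returns true: per the suggestion above, the command would complete with vendor specific sense data rather than EIO or EACCES, which this sketch does not attempt to model.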