RE: T10 WCE interpretation in Linux & device level access

If the writeback cache is enabled (per the WCE bit in the Caching mode page),
prudent software uses the FUA bit in WRITE commands when writing metadata
and/or sends the SYNCHRONIZE CACHE command at important checkpoints to 
ensure the data is not going to be lost due to a power loss.  Some 
database software is particularly prolific at sending these commands. 

Around 2003, many RAID controllers with non-volatile writeback caches honored
the SYNCHRONIZE CACHE command, flushing the entire cache to the drives.  This
started causing timeouts as non-volatile write cache sizes grew.  Recently,
it's even causing trouble on individual disk drives with growing volatile 
write caches.

The intent of software using these commands and bits was unclear - it could be:
a) ensure data is in non-volatile cache (and will eventually be flushed) 
   or on the medium; or
b) ensure data is on the medium (so the drives are ready for removal). 

As a short-term fix, many RAID controllers assumed intent (a) and started
interpreting the SYNCHRONIZE CACHE command as a NOP and ignoring the FUA bit.  

Surprise removal of a drive from a RAID controller is risky even if software 
has run SYNCHRONIZE CACHE, since the RAID controller might be performing other
activity in the background. So there are other reasons to justify assuming
that the user just won't do that.

Afraid of breaking software with intent (b) (which was more likely in the 
days of floppy disks, Bernoulli Boxes, and other removable block devices), 
T10 chose to clarify that the original meaning was (b) and added new 
FUA_NV and SYNC_NV bits to let software express intent (a).  The hope
was that devices would implement the bits and software would start using
them at appropriate times.

Unfortunately, the short-term fix worked well enough that it still prevails
today, and most standalone removable media block devices have disappeared.
Little software actually sets the FUA_NV and SYNC_NV bits, and few
devices honor them as the standard specifies.

As an SBC-3 letter ballot comment, I recently submitted T10 proposal 
13-050 (see http://www.t10.org/doc13.htm) to obsolete the SYNC_NV and 
FUA_NV bits and change the meaning of the commands without those bits
to intent (a), reflecting what the industry has actually done.





-----Original Message-----
From: linux-scsi-owner@xxxxxxxxxxxxxxx [mailto:linux-scsi-owner@xxxxxxxxxxxxxxx] On Behalf Of Jeremy Linton
Sent: Tuesday, April 23, 2013 5:40 PM
To: James Bottomley
Cc: Ric Wheeler; linux-scsi@xxxxxxxxxxxxxxx; Martin K. Petersen; Jeff Moyer; Tejun Heo; Mike Snitzer; dgilbert@xxxxxxxxxxxx
Subject: Re: T10 WCE interpretation in Linux & device level access

On 4/23/2013 3:07 PM, James Bottomley wrote:

> 
> I bet they don't; they probably obey the spec.  There's a SYNC_NV bit
> which if unset (which it is in our implementation) means only sync your
> non-NV cache.  For a device with all NV, that equates to nop.

	Yes, Linux leaves the SYNC_NV bit unset in scsi_setup_flush_cmnd().

The draft specs, and a couple of others I have lying about, say the device
shall sync cache to medium for both volatile and non-volatile cache data if
SYNC_NV is _unset_.

With it set, the table could be more confusing!

For volatile cache blocks with SYNC_NV set "If a non-volatile cache is present,
then the device server shall synchronize to non-volatile cache or to the medium.
If a non-volatile cache is not present, then the device server shall synchronize
to the medium".

And for non-volatile cache blocks with it set: "No Requirement".


To me, that says: don't expect any particular behavior if you set this bit and
have NV cache; the device could flush to medium, flush to NV cache, or do nothing
at all. But it seems pretty clear that with it unset, it's probably going to be
synchronized to the medium.


If T10 were to do something, maybe they could stop putting bits in the docs that
aren't guaranteed to do anything (fill in rant).

As for Linux, it seems the state of the spec really doesn't leave any good options
other than providing the user the ability to disable flush_cmnd() if the
NV_SUP bit is set. Or maybe a whitelist (ick!)...







--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html