RE: T10 WCE interpretation in Linux & device level access

"Black, David" <david.black@xxxxxxx> · Wed, 24 Apr 2013 14:20:08 -0400

Jeremy,

It looks like you, Paolo and Ric have hit the nail on the head here - this is
a nice summary, IMHO:

> On 4/24/2013 7:57 AM, Paolo Bonzini wrote:
>>> If the device can promise this, we don't care (and don't know) how it 
>>> manages that promise. It can leave the data on battery backed DRAM, can 
>>> archive it to flash or any other scheme that works.
>> 
>> That's exactly the point of SYNC_NV=1.
>
> Well, it's the point, but the specification is written such that vendors can
> choose to implement it any way they wish, especially for split cache
> systems where there is both volatile and non-volatile cache.

Independent of T10's best intentions at the time, the implementations aren't
doing what's needed or intended, and I'd guess that the SYNC_NV bit is not
being set to 1 by [other people's ;-) ] software that should be setting it
to 1 if it were paying attention to the standard.
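
For reference, a minimal sketch of what "setting SYNC_NV to 1" means at the CDB
level. It assumes the SBC-3 layout of SYNCHRONIZE CACHE (10): operation code
35h, with SYNC_NV as bit 2 and IMMED as bit 1 of byte 1. The function name and
its parameters are hypothetical, chosen only for illustration, and the sketch
builds the bytes without sending them to any device.

```python
import struct

SYNCHRONIZE_CACHE_10 = 0x35  # operation code
SYNC_NV = 0x04               # byte 1, bit 2 (assumed SBC-3 layout)
IMMED = 0x02                 # byte 1, bit 1


def build_sync_cache10(lba=0, num_blocks=0, sync_nv=False, immed=False):
    """Return a 10-byte SYNCHRONIZE CACHE (10) CDB (hypothetical helper)."""
    flags = (SYNC_NV if sync_nv else 0) | (IMMED if immed else 0)
    # Fields, all big-endian: opcode, flags, 4-byte LBA, group number,
    # 2-byte number of blocks, control byte.
    return struct.pack(">BBIBHB", SYNCHRONIZE_CACHE_10, flags, lba, 0, num_blocks, 0)


cdb = build_sync_cache10(sync_nv=True)
print(cdb.hex())
```

With lba=0 and num_blocks=0 the command covers the whole logical unit; the only
difference between the two variants under discussion is the single flag bit in
byte 1.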

This is further complicated by it being completely legitimate wrt the SCSI
standard to put non-volatile cache in a system and not have the SCSI interface
admit that the non-volatile cache exists (WCE=0, SYNCHRONIZE CACHE is a no-op
independent of the value of SYNC_NV).
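
To make the WCE=0 case concrete, here is a small sketch of how an initiator
would read the bit. It assumes the Caching mode page layout (page code 08h,
WCE as bit 2 of byte 2) and that the caller has already obtained the raw page
bytes from MODE SENSE; the helper name is hypothetical.

```python
CACHING_PAGE_CODE = 0x08
WCE_BIT = 0x04  # byte 2, bit 2 (assumed Caching mode page layout)


def wce_enabled(page: bytes) -> bool:
    """Report the WCE bit from a raw Caching mode page (hypothetical helper)."""
    # Mask off the PS and SPF bits before comparing the page code.
    if len(page) < 3 or (page[0] & 0x3F) != CACHING_PAGE_CODE:
        raise ValueError("not a Caching mode page")
    return bool(page[2] & WCE_BIT)


# A 20-byte page with WCE clear: the device reports no volatile write cache,
# which says nothing about whether a non-volatile cache sits behind it.
page = bytes([0x08, 0x12, 0x00]) + bytes(17)
print(wce_enabled(page))
```

A False result only tells the initiator that it has nothing to flush; it cannot
be used to infer what caching hardware the target actually contains.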

I believe that Rob Elliott's 13-050 proposal to obsolete SYNC_NV and re-specify
SYNCHRONIZE CACHE to make all data non-volatile by whatever means the target
chooses is what T10 should do, and that matches Ric's summary:

>>> If the device can promise this, we don't care (and don't know) how it 
>>> manages that promise. It can leave the data on battery backed DRAM, can 
>>> archive it to flash or any other scheme that works.

Beyond that, attempting to manage drive removal from storage systems via the
SCSI interface with standard commands is a waste of time and effort, IMHO.
In a serious storage array (and even some fairly simple RAID controllers), some
vendor-specific "magic" is needed to get the array (or controller) to prepare
so that the drive can be removed cleanly.  To oversimplify, it's not enough to
flush data to the drive; the array or controller is stateful, and hence has
to be told to "forget" the drive, where "forget" involves things that are
rather implementation-specific.

Thanks,
--David
----------------------------------------------------
David L. Black, Distinguished Engineer
EMC Corporation, 176 South St., Hopkinton, MA  01748
+1 (508) 293-7953             FAX: +1 (508) 293-7786
david.black@xxxxxxx        Mobile: +1 (978) 394-7754
----------------------------------------------------

> -----Original Message-----
> From: Jeremy Linton [mailto:jlinton@xxxxxxxxxxxxx]
> Sent: Wednesday, April 24, 2013 10:36 AM
> To: Paolo Bonzini
> Cc: Ric Wheeler; Hannes Reinecke; James Bottomley; linux-scsi@xxxxxxxxxxxxxxx;
> Martin K. Petersen; Jeff Moyer; Tejun Heo; Mike Snitzer; Black, David;
> Elliott, Robert (Server Storage); Knight, Frederick
> Subject: Re: T10 WCE interpretation in Linux & device level access
> 
> On 4/24/2013 7:57 AM, Paolo Bonzini wrote:
> >> If the device can promise this, we don't care (and don't know) how it
> >> manages that promise. It can leave the data on battery backed DRAM, can
> >> archive it to flash or any other scheme that works.
> >
> > That's exactly the point of SYNC_NV=1.
> 
> Well, it's the point, but the specification is written such that vendors
> can choose to implement it any way they wish, especially for split cache
> systems where there is both volatile and non-volatile cache.
>
> Flushing the NV cache to medium (as is the current behavior) may not be a
> bad idea anyway.
>
> That's because I know of a large vendor's array where the non-volatile
> cache might be better described as the "sometimes" non-volatile cache:
> a failure to flush the volatile portions results in the non-volatile
> portions being considered invalid when power is restored. This fences the
> volume, and the usual method for recovering the array is to call support
> and have them invalidate the NV portions of the cache, thereby negating the
> whole reason for having an NV cache. I'm sure they don't tell customers this
> when they sell the array; when it happened in our lab I was in a state of
> shock for about a week.
