Re: SSD data reliable vs. unreliable [Was: Re: Data Recovery from SSDs - Impact of trim?]

On Tue, 2009-01-27 at 16:16 +1100, Neil Brown wrote:
> On Monday January 26, James.Bottomley@xxxxxxxxxxxxxxxxxxxxx wrote:
> > On Mon, 2009-01-26 at 12:34 -0500, Greg Freemyer wrote:
> > > Adding mdraid list:
> > > 
> > > Top post as a recap for mdraid list (redundantly at end of email if
> > > anyone wants to respond to any of this).:
> > > 
> > > == Start RECAP
> > > With proposed spec changes for both T10 and T13 a new "unmap" or
> > > "trim" command is proposed respectively.  The linux kernel is
> > > implementing this as a sector discard and will be called by various
> > > file systems as they delete data files.  Ext4 will be one of the first
> > > to support this. (At least via out of kernel patches.)
> > > 
> > > SCSI - see http://www.t10.org/cgi-bin/ac.pl?t=d&f=08-356r5.pdf
> > > ATA - see T13/e08137r2 draft
> > > 
> > > Per the proposed spec changes, the underlying SSD device can
> > > optionally modify the unmapped data.  SCSI T10 at least restricts the
> > > way the modification happens, but data modification of unmapped data
> > > is still definitely allowed for both classes of SSD.
> > > 
> > > Thus if a filesystem "discards" a sector, the contents of the sector
> > > can change and thus parity values are no longer meaningful for the
> > > stripe.
> > 
> > This isn't correct.  The implementation is via bio and request discard
> > flags.  linux raid as a bio->bio mapping entity can choose to drop or
> > implement the discard flag (by default it will be dropped unless the
> > raid layer is modified).
> 
> That's good.  I would be worried if they could slip through without
> md/raid noticing.
> 
> > 
> > > ie. If the unmap-ed blocks don't exactly correlate with the Raid-5 / 6
> > > striping, then the integrity of a stripe containing both mapped and
> > > unmapped data is lost.
> > > 
> > > Thus it seems that either the filesystem will have to understand the
> > > raid 5 / 6 striping / chunking setup and ensure it never issues a
> > > discard command unless an entire stripe is being discarded.  Or that
> > > the raid implementation must snoop the discard commands and take
> > > appropriate actions.
> > 
> > No.  It only works if the discard is supported all the way through the
> > stack to the controller and device ... any point in the stack can drop
> > the discard.  It's also theoretically possible that any layer could
> > accumulate them as well (i.e. up to stripe size for raid).
> 
> Accumulating them in the raid level would probably be awkward.
> 
> It was my understanding that filesystems would (try to) send the
> largest possible 'discard' covering any surrounding blocks that had
> already been discarded.  Then e.g. raid5 could just round down any
> discard request to an aligned number of complete stripes and just
> discard those.  i.e. have all the accumulation done in the filesystem.

The jury is still out on this one.  Array manufacturers would probably
like this as well, because their internal granularity for thin
provisioning is reputedly huge (in the megabytes).  However, trim and
discard are being driven by SSDs, which have no such need.

> To be able to safely discard stripes, raid5 would need to remember
> which stripes were discarded so that it could be sure to write out the
> whole stripe when updating any block on it, thus ensuring that parity
> will be correct again and will remain correct.

Right.  This gives you a minimal discard size of the stripe width.
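
As a rough sketch of the rounding Neil describes (this is not md code;
the function name and parameters are made up for illustration, and all
units are sectors), a discard would be shrunk to the complete stripes
it covers, and dropped if it covers none:

```python
def round_discard_to_stripes(start, length, chunk_sectors, data_disks):
    """Shrink [start, start + length) to the whole stripes it fully covers.

    Hypothetical helper, not a kernel interface. Returns
    (aligned_start, aligned_length) in sectors, or None if the range
    does not contain even one complete stripe.
    """
    stripe = chunk_sectors * data_disks   # data sectors per stripe
    end = start + length
    first = -(-start // stripe) * stripe  # round the start up
    last = (end // stripe) * stripe       # round the end down
    if last <= first:
        return None                       # smaller than one stripe: drop it
    return first, last - first


# Assumed geometry: 4 data disks, 128-sector chunks, so 512 sectors per stripe.
print(round_discard_to_stripes(100, 2000, 128, 4))  # (512, 1536)
print(round_discard_to_stripes(10, 500, 128, 4))    # None
```

Anything smaller than one stripe simply never reaches the member
devices in this model.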

> Probably the only practical data structure for this would be a bitmap
> similar to the current write-intent bitmap.

Hmm ... the feature you're talking about is called white space
elimination by most in the industry.  The layer above RAID (usually fs)
knows this information exactly ... if there were a way to pass it on,
there'd be no need to store it separately.
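
For comparison, the separate bookkeeping Neil suggests might look
something like the toy model below: one bit per stripe, forcing a
full-stripe write the next time a discarded stripe is touched.  The
class and method names are hypothetical and this is not how md's
write-intent bitmap is implemented; it only illustrates the idea.

```python
class DiscardedStripeMap:
    """One bit per stripe; a set bit means the stripe was discarded and
    its on-disk data and parity can no longer be trusted."""

    def __init__(self, nr_stripes):
        self.bits = bytearray((nr_stripes + 7) // 8)

    def mark_discarded(self, stripe):
        self.bits[stripe // 8] |= 1 << (stripe % 8)

    def is_discarded(self, stripe):
        return bool(self.bits[stripe // 8] & (1 << (stripe % 8)))

    def plan_write(self, stripe):
        """Pick a strategy for an update to any block in `stripe`."""
        if self.is_discarded(stripe):
            # Old contents are undefined, so write the whole stripe and
            # recompute parity; after that the stripe is consistent.
            self.bits[stripe // 8] &= ~(1 << (stripe % 8)) & 0xFF
            return "full-stripe-write"
        return "read-modify-write"


m = DiscardedStripeMap(1024)
m.mark_discarded(7)
print(m.plan_write(7))  # full-stripe-write
print(m.plan_write(7))  # read-modify-write
```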

> Is it really worth supporting this in raid5?   Are the sorts of
> devices that will benefit from 'discard' requests likely to be used
> inside an md/raid5 array I wonder....

There's no hard data on how useful Trim will be in general.  The idea is
that it allows SSDs to pre-erase (which can be a big deal), and for Thin
Provisioning it allows just-in-time storage decisions.  However, all
thin-provisioning devices are likely to do RAID internally ...

> raid1 and raid10 are much easier to handle, so supporting 'discard'
> there certainly makes sense.

James

