Re: SSD - TRIM command

I agree with the ppps.
That's why ECC, checksums and parity are useful (raid5,6), and raid1 too
if you read from all mirrors, compare them, and pick the 'right' disk.
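
A rough sketch of that raid1 idea in C (device names are just examples;
real md only does this comparison in a 'check' pass, not on normal reads):

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    char a[4096], b[4096];
    off_t off = 0;                          /* block to compare */
    int fd0 = open("/dev/sda1", O_RDONLY);  /* mirror leg 1 (example) */
    int fd1 = open("/dev/sdb1", O_RDONLY);  /* mirror leg 2 (example) */

    if (fd0 < 0 || fd1 < 0) { perror("open"); return 1; }
    if (pread(fd0, a, sizeof(a), off) != (ssize_t)sizeof(a) ||
        pread(fd1, b, sizeof(b), off) != (ssize_t)sizeof(b)) {
        perror("pread");
        return 1;
    }
    if (memcmp(a, b, sizeof(a)) != 0)
        fprintf(stderr, "mirror mismatch at offset %lld\n", (long long)off);

    close(fd0);
    close(fd1);
    return 0;
}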

2011/2/9 Doug Dumitru <doug@xxxxxxxxxx>:
> I work with SSD arrays all the time, so I have a couple of thoughts
> about trim and md.
>
> 'trim' is still necessary.  SandForce controllers are "better" at
> this, but still need free space to do their work.  I had a set of SF
> drives drop to 22 MB/sec writes because they were full and scrambled.
> It takes a lot of effort to get them that messed up, but it can still
> happen.  Trim brings them back.
>
> The bottom line is that SSDs do block re-organization on the fly and
> free space makes the re-org more efficient.  More efficient means
> faster and, as importantly, less wear amplification.
>
> Most SSDs (and I think the latest trim spec) are deterministic on
> trim'd sectors.  If you trim a sector, they read that sector as zeros.
>  This makes raid much "safer".
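
One way to check that claim per device is the discard_zeroes_data flag the
block layer exports in sysfs, assuming your kernel has it.  A minimal C
sketch (device name is just an example):

#include <stdio.h>

int main(void)
{
    /* 1 means the block layer believes the device reads back zeros
       for discarded sectors. */
    FILE *f = fopen("/sys/block/sda/queue/discard_zeroes_data", "r");
    int val = 0;

    if (!f) { perror("fopen"); return 1; }
    if (fscanf(f, "%d", &val) == 1)
        printf("discard_zeroes_data: %d\n", val);
    fclose(f);
    return 0;
}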
>
> raid/0,1,10 should be fine to echo discard commands down to the
> downstream drives in the bio request.  It is then up to the physical
> device driver to turn the discard bio request into an ATA (or SCSI)
> trim.  Most block devices don't seem to understand discard requests
> yet, but this will get better over time.
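
For what it's worth, userspace can push a discard down that same path with
the BLKDISCARD ioctl; whether it ever becomes an ATA TRIM still depends on
the driver underneath.  A rough sketch (offset and length are in bytes):

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

int main(int argc, char **argv)
{
    if (argc != 4) {
        fprintf(stderr, "usage: %s <device> <offset> <length>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_WRONLY);
    if (fd < 0) { perror("open"); return 1; }

    /* range[0] = start offset in bytes, range[1] = length in bytes */
    uint64_t range[2] = { strtoull(argv[2], NULL, 0),
                          strtoull(argv[3], NULL, 0) };

    if (ioctl(fd, BLKDISCARD, range) < 0)
        perror("BLKDISCARD");   /* EOPNOTSUPP if the stack can't discard */

    close(fd);
    return 0;
}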
>
> raid/4,5,6 is a lot more complicated.  With raid/4,5 and an even
> number of drives, you can trim whole stripes safely.  Pieces of
> stripes get interesting because you have to treat a trim as a write of
> zeros and re-calc parity.  raid/6 will always have parity issues
> regardless of how many drives there are.  Even worse is that
> raid/4,5,6 parity read/modify/write operations tend to chatter the FTL
> (Flash Translation Layer) logic and make matters worse (often much
> worse).  If you are not streaming long linear writes, raid/4,5,6 in a
> heavy write environment is probably a very bad idea for most SSDs.
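
Just to make the alignment problem concrete: assuming a raid5-style layout
of (ndisks - 1) data chunks per stripe, only discards that start and end on
a stripe boundary avoid the parity read/modify/write.  A hypothetical
helper, not md code:

#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when a discard of 'len' bytes at 'offset' (both relative
   to the array's data space) covers only whole stripes, so parity can
   simply be trimmed/zeroed too instead of read/modify/written. */
static bool discard_is_full_stripes(uint64_t offset, uint64_t len,
                                    uint64_t chunk_bytes, unsigned ndisks)
{
    uint64_t stripe_bytes = chunk_bytes * (uint64_t)(ndisks - 1);

    return (offset % stripe_bytes == 0) && (len % stripe_bytes == 0);
}

int main(void)
{
    /* example: 64 KiB chunks, 4-disk raid5 => 192 KiB stripes */
    printf("%d\n", discard_is_full_stripes(0, 393216, 65536, 4));
    return 0;
}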
>
> Another issue with trim is how "async" it behaves.  You can trim a lot
> of data on a drive, but it is hard to tell when the drive is actually
> ready afterwards.  Some drives also choke on trim requests that come
> at them too fast or requests that are too long.  The behavior can be
> quite random.  So then comes the issue of how many "user knobs" to
> supply to tune what trims where.  Again, raid/0,1,10 are pretty easy.
> Raid/4,5,6 really requires that you know the precise geometry and
> control the IO.  Way beyond what ext4 understands at this point.
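
The geometry part, at least, is already exported by md in sysfs; a tiny
sketch (md0 is just an example):

#include <stdio.h>

static long read_long(const char *path)
{
    FILE *f = fopen(path, "r");
    long v = -1;

    if (f) {
        if (fscanf(f, "%ld", &v) != 1)
            v = -1;
        fclose(f);
    }
    return v;
}

int main(void)
{
    long chunk = read_long("/sys/block/md0/md/chunk_size");  /* in bytes */
    long disks = read_long("/sys/block/md0/md/raid_disks");

    printf("chunk=%ld bytes, raid_disks=%ld\n", chunk, disks);
    return 0;
}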
>
> Trim can also be "faked" with some drives.  Again, looking at the
> SandForce based drives, these drives internally de-dupe, so you can
> write fake data and help the drives regain free space.  Do this by
> filling the drive with zeros (i.e., dd if=/dev/zero of=big.file
> bs=1M), doing a sync, and then deleting big.file.  This works through
> md, across SANs, from XEN virtuals, or wherever.  With SandForce
> drives, this is not as effective as a trim, but better than nothing.
> Unfortunately, only SandForce drives and Flash SuperCharger understand
> zeros this way.  A filesystem option that "zeros discarded sectors"
> would actually make as much sense in some deployment settings as the
> discard option (not sure, but ext# might already have this).  NTFS has
> actually supported this since XP as a security enhancement.
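
The zero-fill trick can also be done as a small program instead of dd; same
idea, the path is just an example:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/mnt/ssd/big.file";      /* example mount point */
    static char zeros[1 << 20];                  /* 1 MiB of zeros */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0) { perror("open"); return 1; }

    /* Fill the free space with zeros; stop on the first short write
       (typically ENOSPC when the filesystem is full). */
    while (write(fd, zeros, sizeof(zeros)) == (ssize_t)sizeof(zeros))
        ;

    fsync(fd);      /* make sure the zeros reach the drive */
    close(fd);
    unlink(path);   /* give the space back; the SSD has seen the zeros */
    sync();
    return 0;
}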
>
> Doug Dumitru
> EasyCo LLC
>
> ps:  My background with this has been the development of Flash
> SuperCharger.  I am not trying to run an advert here, but the care and
> feeding of SSDs can be interesting.  Flash SuperCharger breaks most of
> these rules, but it does know the exact geometry of what it is driving
> and plays excessive games to drive SSDs at their exact "sweet spot".
> One of our licensees just sent me some benchmarks at > 500,000 4K
> random writes/sec for a moderate sized array running raid/5.
>
> pps:  Failures of SSDs are different than HDDs.  SSDs can and do fail
> and need raid for many applications.  If you need high write IOPS, it
> pretty much has to be raid/1,10 (unless you run our Flash SuperCharger
> layer).
>
> ppps:  I have seen SSDs silently return corrupted data.  Disks do this
> as well.  A paper from 2 years ago quoted disk silent error rates as
> high as 1 bad block every 73TB read.  Very scary stuff, but probably
> beyond the scope of what md can address.



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

