Re: SSD - TRIM command

I work with SSD arrays all the time, so I have a couple of thoughts
about trim and md.

'trim' is still necessary.  SandForce controllers are "better" at
getting by without it, but they still need free space to do their
work.  I had a set of SandForce drives drop to 22 MB/sec writes
because they were full and scrambled.  It takes a lot of effort to get
them that messed up, but it can still happen.  Trim brings them back.

The bottom line is that SSDs do block re-organization on the fly, and
free space makes the re-org more efficient.  More efficient means
faster and, just as importantly, less write amplification (and so less
flash wear).

Most SSDs (and, I believe, the latest trim spec) are deterministic
about trim'd sectors: if you trim a sector, it reads back as zeros.
This makes raid much "safer".
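
If you want to check whether a particular drive actually claims
deterministic read-zeroes after trim, a quick sketch (hdparm assumed
installed, /dev/sda is just an example device) looks like this:

  # ask the drive itself via its IDENTIFY data
  hdparm -I /dev/sda | grep -i trim

  # what the kernel believes: 1 means discarded blocks read back as zeros
  cat /sys/block/sda/queue/discard_zeroes_data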

raid/0,1,10 should be fine simply echoing discard commands down to the
downstream drives in the bio request.  It is then up to the physical
device driver to turn the discard bio request into an ATA (or SCSI)
trim.  Most block devices don't seem to understand discard requests
yet, but this will get better over time.
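
As a quick sanity check (again just a sketch; the device name is an
example), you can see whether the kernel thinks a device accepts
discards at all by looking at its request queue limits:

  # non-zero values mean the device and its driver advertise discard support
  cat /sys/block/sda/queue/discard_granularity
  cat /sys/block/sda/queue/discard_max_bytes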

raid/4,5,6 is a lot more complicated.  With raid/4,5 and an even
number of drives, you can trim whole stripes safely.  Partial stripes
get interesting because you have to treat a trim as a write of zeros
and re-calculate parity.  raid/6 will always have parity issues
regardless of how many drives there are.  Even worse, the raid/4,5,6
parity read/modify/write operations tend to chatter the FTL (Flash
Translation Layer) logic and make matters worse (often much worse).
If you are not streaming long linear writes, raid/4,5,6 in a heavy
write environment is probably a very bad idea for most SSDs.
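
To make the alignment problem concrete, here is a rough sketch (the
array name /dev/md0 is an example) of working out how big a whole
stripe is, which is the granularity you would have to trim at to stay
off the read-modify-write path on raid/5:

  # chunk size (in bytes) and member count for the array
  chunk=$(cat /sys/block/md0/md/chunk_size)
  disks=$(cat /sys/block/md0/md/raid_disks)

  # raid/5 spends one chunk per stripe on parity, so data per stripe is
  echo $(( chunk * (disks - 1) ))   # e.g. 524288 * 3 = 1572864 with 4 drives

  # anything smaller or misaligned is a partial-stripe update: treat the
  # trim as a write of zeros and re-calculate parity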

Another issue with trim is how "async" it behaves.  You can send a
drive a lot of trims, but it is hard to tell when the drive is
actually ready again afterwards.  Some drives also choke on trim
requests that come at them too fast or requests that are too long.
The behavior can be quite random.  So then comes the issue of how many
"user knobs" to supply for tuning what gets trimmed where.  Again,
raid/0,1,10 are pretty easy.  raid/4,5,6 really requires that you know
the precise geometry and control the IO.  That is way beyond what ext4
understands at this point.
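
Coming back to the "too fast / too long" problem, one conservative
approach (a sketch, assuming util-linux's blkdiscard is available and
that /dev/sdb is a scratch device whose contents you really do want
thrown away) is to feed the drive bounded discard ranges yourself
instead of one enormous request:

  # discard the first 1 GiB of /dev/sdb in 64 MiB pieces, with a pause
  # between pieces so the drive has time to digest each range
  step=$((64 * 1024 * 1024))
  end=$((1024 * 1024 * 1024))
  off=0
  while [ $off -lt $end ]; do
      blkdiscard --offset $off --length $step /dev/sdb
      sleep 1
      off=$((off + step))
  done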

Trim can also be "faked" with some drives.  Again, looking at the
SandForce based drives: these drives internally de-dupe, so you can
write "fake" data and help the drives get free space back.  Do this by
filling the drive with zeros (i.e., dd if=/dev/zero of=big.file
bs=1M), doing a sync, and then deleting the big.file.  This works
through md, across SANs, from XEN virtuals, or wherever.  With
SandForce drives, this is not as effective as a trim, but better than
nothing.  Unfortunately, only SandForce drives and Flash SuperCharger
understand zeros this way.  A filesystem option that "zeros discarded
sectors" would actually make as much sense in some deployment settings
as the discard option (not sure, but ext# might already have this).
NTFS has actually supported this since XP as a security enhancement.
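
Spelled out, the zero-fill trick looks roughly like this (the mount
point and file name are placeholders):

  # fill the free space with zeros, flush, then delete the filler file
  dd if=/dev/zero of=/mnt/ssd/big.file bs=1M   # runs until the fs is full
  sync
  rm /mnt/ssd/big.file
  sync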

Doug Dumitru
EasyCo LLC

ps:  My background with this has been the development of Flash
SuperCharger.  I am not trying to run an advert here, but the care and
feeding of SSDs can be interesting.  Flash SuperCharger breaks most of
these rules, but it does know the exact geometry of what it is driving
and plays excessive games to drive SSDs at their exact "sweet spot".
One of our licensees just sent me some benchmarks at > 500,000 4K
random writes/sec for a moderate sized array running raid/5.

pps:  Failures of SSDs are different from those of HDDs.  SSDs can and do fail
and need raid for many applications.  If you need high write IOPS, it
pretty much has to be raid/1,10 (unless you run our Flash SuperCharger
layer).

ppps:  I have seen SSDs silently return corrupted data.  Disks do this
as well.  A paper from 2 years ago quoted disk silent error rates as
high as 1 bad block every 73TB read.  Very scary stuff, but probably
beyond the scope of what md can address.