Re: trim in SSDs

On Sun, Sep 6, 2009 at 7:19 AM, SandeepKsinha<sandeepksinha@xxxxxxxxx> wrote:
> Hi All,
>
> Some of the SSDs implement trim as a "no-op"? I also heard that they
> have better wear life?

Trim in the Linux kernel is much debated and is NOT yet implemented
end to end, with the exception of Mark Lord's batch implementation.
That is designed more like a defrag tool that could be run every
night, etc.

So there are a few options under discussion:

Option 1: Issue a trim each and every time a block is freed at the
filesystem level.  This could cause thousands of extra I/Os per
second.  Intel has apparently said during a workshop that their
high-end SSDs will operate best in this mode.

But the ATA-8 draft spec gives no indication of trim performance,
and for the one trim-capable SSD that has been tested with Linux,
this behavior drastically slows the drive down.

A whitelist has been discussed that would allow only high-performance
SSDs to be treated this way.

ext4 has been patched to issue these calls to the block layer, but I
don't think the block layer is passing them on yet.

There is a block layer patch to do this, but the discussion is about
whether that patch should be accepted.  It does not currently have a
whitelist feature, so it might hurt the performance of most SSDs.
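
To make Option 1 concrete, here is a rough sketch of what the
per-free discard amounts to at the filesystem/block boundary.
blkdev_issue_discard() is a real in-kernel helper (its exact
signature has varied across kernel versions); the surrounding hook
and its name are my invention, not code from the ext4 or block layer
patches:

/* Hypothetical Option 1 hook: discard every extent the moment the
 * filesystem frees it.  Needs <linux/blkdev.h>.  Illustrative only. */
static void fs_free_extent_hook(struct super_block *sb,
                                sector_t start, sector_t nr_sects)
{
        /* One discard per freed extent: on a busy filesystem this
         * is where the thousands of extra I/Os per second come from. */
        blkdev_issue_discard(sb->s_bdev, start, nr_sects, GFP_NOFS);
}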

Option 2: In the block or filesystem layer, implement code to
coalesce unmaps into bigger ranges but still pass them down more or
less in real time, i.e. roughly when the delete occurred, delayed
just long enough to see whether more unmaps in the same range are
coming.

I believe XFS has a patch floating around that implements this; it is
not in mainline.
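
The core idea of Option 2 is easy to show outside the kernel.  Below
is a minimal userspace sketch of just the coalescing step (sort the
pending unmap ranges, merge adjacent or overlapping ones, and only
then send each survivor down as one larger trim); the data structures
are mine, not taken from the XFS patch:

#include <stdio.h>
#include <stdlib.h>

struct range { unsigned long long start, len; };

static int cmp_start(const void *a, const void *b)
{
        const struct range *x = a, *y = b;
        return (x->start > y->start) - (x->start < y->start);
}

/* Sort pending ranges, merge adjacent/overlapping ones in place,
 * and return the new count. */
static size_t coalesce(struct range *r, size_t n)
{
        size_t out = 0, i;

        qsort(r, n, sizeof(*r), cmp_start);
        for (i = 0; i < n; i++) {
                if (out && r[i].start <= r[out - 1].start + r[out - 1].len) {
                        /* Overlaps or touches the previous range:
                         * extend it instead of emitting another trim. */
                        unsigned long long end = r[i].start + r[i].len;
                        if (end > r[out - 1].start + r[out - 1].len)
                                r[out - 1].len = end - r[out - 1].start;
                } else {
                        r[out++] = r[i];
                }
        }
        return out;
}

int main(void)
{
        struct range r[] = { {0, 8}, {8, 8}, {32, 4}, {12, 8} };
        size_t i, n = coalesce(r, 4);

        for (i = 0; i < n; i++)  /* prints "trim 0+20" and "trim 32+4" */
                printf("trim %llu+%llu\n", r[i].start, r[i].len);
        return 0;
}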

Option 3: Write a userspace tool that fallocates all the available
free space in a filesystem, tells the SSD it can trim all of those
blocks, and then deletes the file.  This can be scheduled daily, etc.

Mark Lord has a fully functioning version of Option 3, and no
additional kernel patches are needed.
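
Reconstructed from that description, the shape of such a tool is
roughly the following.  The mount point, filename, and the comment
standing in for the trim step are all illustrative; this is not Mark
Lord's actual code:

#include <fcntl.h>
#include <stdio.h>
#include <sys/statvfs.h>
#include <unistd.h>

int main(void)
{
        struct statvfs vfs;
        int fd;

        if (statvfs("/mnt/ssd", &vfs) != 0)
                return 1;

        fd = open("/mnt/ssd/.trim-placeholder", O_CREAT | O_WRONLY, 0600);
        if (fd < 0)
                return 1;

        /* Claim (nearly) all free space, so the unused blocks now
         * all belong to one known file. */
        if (posix_fallocate(fd, 0,
                            (off_t)vfs.f_bavail * vfs.f_frsize) == 0) {
                /* Here the real tool would map the file's physical
                 * extents (e.g. via the FIEMAP ioctl) and tell the
                 * SSD to trim exactly those sectors. */
        }

        /* Deleting the file hands the space back to the filesystem;
         * the SSD has already been told those blocks are unused. */
        close(fd);
        unlink("/mnt/ssd/.trim-placeholder");
        return 0;
}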

I understand Windows 7 follows the path of Option 1, but that could
easily be wrong.  One could easily imagine that the thousands of
extra trims per second caused by Option 1 would wear out an SSD
faster than Option 3, for example.

> Quoting from some expert's comment:
>
> They purposely allocate every logical block at initialization time so
> the drive is
> effectively "aged" from the outset.  This way the performance doesn't
> change over time.  They implement a huge amount of back-end bandwidth to
> handle the resulting write amplification.

As to performance over time, SSD erase methodologies apparently cause
a nearly empty SSD to perform better than a nearly full one.  So if
you fully allocate all the data blocks from the beginning, you won't
get optimum performance, but you will get consistent performance.  To
balance that out, you would need to add enough raw performance and/or
spare erase blocks to provide good performance even when fully
allocated.
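
To put made-up but plausible numbers on that: a drive built from 128
GiB of raw flash that advertises only 100 GiB to the host always
keeps at least 28 GiB of erase blocks in reserve, so even when every
advertised block has been written, the controller still has
pre-erased space available to absorb new writes.  That spare area,
plus controller bandwidth, is what pays for the write amplification
the comment above mentions.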

> What exactly does this mean?
>
> --
> Regards,
> Sandeep.
>
Greg



