Re: Software RAID and TRIM

On 17/07/2011 23:52, Lutz Vieweg wrote:
David Brown wrote:
However, AFAIUI, you are wrong about TRIM being essential for the
continued high performance of SSDs. As long as your SSDs have some
over-provisioning (or you only partition something like 90% of the
drive), and it's got good garbage collection, then TRIM will have
minimal effect.

I beg to differ.


Well, I don't have your experience here (I have a couple of 60G SSD's in RAID0, without TRIM, but that's hardly in the same class). So I don't expect you to put much weight on my opinions. But maybe it will give you reason for more testing.

We are using SSDs in very much the way that Tom de Mulder intends,
and from our extensive performance measurements over many months
now I can say that (at least if you do have significant amounts
of write operations) it _does_ make a lot of difference whether you
periodically discard the unused sectors or not.
(For us, the write performance was measured to be about half as good
when there are no free erase blocks available anymore.)


If there are no free erase blocks, then your SSD's don't have enough over-provisioning. This is, after all, the whole point of having more physical flash than the logical disk size would suggest. Depending on the quality of the SSD (more expensive ones have more over-provisioning) and the usage patterns (if you have lots of small random writes, you'll need more extra space), you might have to "manually" over-provision the disk by only partitioning about 90% of it. Of course, you must make sure that the remaining 10% is "discarded", or left untouched from new, and that you use the partition for your RAID and not the whole disk.

So now you have plenty of erase blocks at any time, and your write performance will be good.
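
As a rough illustration of the arithmetic for "manual" over-provisioning, here is a minimal Python sketch. The 1 MiB alignment, the 90% default and the function name are assumptions for the example, not anything a particular SSD or partitioning tool requires.

```python
# Sketch only: how large a partition to create if about 10% of the SSD
# is to be left unpartitioned. ALIGN_BYTES and partition_bytes() are
# illustrative assumptions, not part of any real tool.

ALIGN_BYTES = 1024 * 1024  # assumed partition alignment (1 MiB)


def partition_bytes(disk_bytes, percent=90, align=ALIGN_BYTES):
    """Return the largest aligned size not exceeding percent% of the disk."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in 1..100")
    target = disk_bytes * percent // 100   # integer maths, no float rounding
    return (target // align) * align       # round down to the alignment


if __name__ == "__main__":
    disk = 60 * 1000**3                    # a nominal 60 GB drive
    part = partition_bytes(disk)
    print("partition:", part, "bytes; left untouched:", disk - part, "bytes")
```

The unpartitioned remainder only helps if the SSD regards it as unused, which is why it has to be discarded or never written in the first place.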


TRIM, on the other hand, does not give you any extra free erase blocks. If you think it does, you've misunderstood it.

TRIM exists to make garbage collection a little more efficient - when garbage collecting an erase block that contains TRIM'ed blocks, the TRIM'ed blocks don't need to be copied. This saves a small amount of time in the copying, and allows slightly denser packing. It may sometimes lead to saving whole erase blocks, but that's seldom the case in practice except when erasing large files.
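
To make the copying argument concrete, here is a toy Python model of garbage-collecting a single erase block. The page states and function names are invented for the illustration; real flash translation layers are far more involved.

```python
# Toy model, not a real FTL: each page in an erase block is in one of
# three states. All names here are illustrative assumptions.

VALID, STALE, TRIMMED = "valid", "stale", "trimmed"


def pages_to_copy(erase_block):
    """Count pages that must be copied elsewhere before the block is erased.

    STALE pages were overwritten, so the SSD already knows they are dead.
    TRIMMED pages are dead only because the host said so via TRIM.
    Everything else has to be treated as live data and copied.
    """
    return sum(1 for state in erase_block if state == VALID)


def apply_trim(erase_block, trimmed_indexes):
    """Return a copy of the block with the given pages marked TRIMMED."""
    dead = set(trimmed_indexes)
    return [TRIMMED if i in dead else s for i, s in enumerate(erase_block)]


if __name__ == "__main__":
    # Eight pages: three already overwritten, five the SSD believes are live.
    block = [VALID] * 5 + [STALE] * 3
    print("copies without TRIM:", pages_to_copy(block))
    # The filesystem deleted the file occupying pages 0 and 1 and TRIM'ed them.
    print("copies with TRIM:   ", pages_to_copy(apply_trim(block, [0, 1])))
```

In this model TRIM reduces the number of copies for that block, but the block still has to be collected and erased before it can be written again.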

If your disks are reasonably full, then TRIM will not help much because the garbage collection will be desperately trying to piece together small bits into complete erase blocks, and your performance will drop through the floor. If you have plenty of over-provisioning, then the SSD still has lots of completely free erase blocks whenever it needs them.

If your filesystem re-uses (logical) blocks, then TRIM will not help. It is /always/ more efficient for the FS to simply write new data to the same block, rather than TRIM'ing it first.

TRIM is a very expensive command - it acts a bit like a write, but it is not a queued command. Thus the block layer must wait for /all/ IO commands to have completed, then issue the TRIM, then wait for it to complete, and then carry on with new commands. On some SSD's, it will (according to something I read) trigger garbage collection, which may slow down the SSD. Even without that, the performance of most meta-data operations (such as delete) will drop considerably when they also need to do TRIM.
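
The drain-issue-wait sequence can be sketched with a deliberately simplified Python model. It assumes commands are dispatched in groups of up to `queue_depth`, which is not how a real command queue behaves; the function and command names are made up. It only shows that every non-queued TRIM forces the queue to empty first.

```python
# Simplified model (an assumption, not real NCQ behaviour): commands that
# may be in flight together are grouped into one batch. A "trim" command
# is never batched with anything else.

def schedule(commands, queue_depth):
    """Split commands into batches; each TRIM drains the queue and runs alone."""
    batches, current = [], []
    for cmd in commands:
        if cmd == "trim":
            if current:                 # wait for all outstanding IO
                batches.append(current)
                current = []
            batches.append(["trim"])    # the TRIM runs on its own
        else:
            current.append(cmd)
            if len(current) == queue_depth:
                batches.append(current)
                current = []
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    print(schedule(["w1", "w2", "trim", "w3", "w4", "w5"], queue_depth=4))
    print(schedule(["w1", "w2", "w3", "w4", "w5"], queue_depth=4))
```

In this model the same five writes take two batches without a TRIM and three with one in the middle, even before counting the cost of the TRIM itself.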

<http://people.redhat.com/jmoyer/discard/ext4_batched_discard/ext4_discard.html>

<http://lwn.net/Articles/347511/>

<http://www.realworldtech.com/beta/forums/index.cfm?action=detail&id=116034&threadid=115697&roomid=2>


On the other hand, your off-line batch TRIM during low use periods could well be a win. The cost of these discards is not going to be an issue, and large batched discards are going to be far more useful to the SSD than small scattered ones. I believe that there has been work on a similar system in XFS - I don't know what happened to that, or if there is any way to make it work in concert with md raid.
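
One way to picture why batching helps: free extents collected over time can be merged into a few large ranges before anything is discarded. The Python sketch below shows only that merging step. The `(start, length)` representation, the function names and the `min_length` threshold are assumptions for the example, not how any particular filesystem implements batched discard.

```python
# Sketch only: merge free extents, given as (start_block, length) pairs,
# into larger ranges suitable for one batched discard pass.

def coalesce(extents):
    """Merge extents that touch or overlap; return sorted (start, length) pairs."""
    merged = []
    for start, length in sorted(extents):
        end = start + length
        if merged and start <= merged[-1][1]:
            # Touches or overlaps the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e - s) for s, e in merged]


def batch_discards(extents, min_length):
    """Keep only coalesced ranges of at least min_length blocks."""
    return [(s, n) for s, n in coalesce(extents) if n >= min_length]


if __name__ == "__main__":
    free = [(100, 8), (0, 4), (4, 4), (50, 1)]
    print(coalesce(free))
    print(batch_discards(free, min_length=8))
```

Dropping the very small ranges is a design choice in this sketch: they are the scattered discards that cost the most per block freed.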


What will make a big difference to using SSD's in md raid is the sync/no-sync tracking. This will avoid a lot of unnecessary writes, especially with a new array, and leave the SSD with more free blocks (at least until the disk is getting full of data). It is also much higher up the things-to-do list, because it will be useful for all uses of md raid, and is a prerequisite to general discard support. (Strictly speaking it is not needed for SSD's that guarantee a zero return on TRIM'ed blocks - but only some SSD's give that guarantee.)


Of course, you can only benefit from discards if your filesystem
is not full (because then there is nothing to discard). But any
kind of "garbage collection" by the SSD itself will not have the
same effect, since it cannot know which blocks are in use by the
filesystem.


Garbage collection will recycle blocks that have been overwritten. The filesystem knows which logical blocks are in use, and which are free. Filesystems already heavily re-use blocks, in the aim of preferring faster outer tracks on HD's, and minimizing head movement. So when a file is erased, there's a good chance that those same logical blocks will be re-used soon - TRIM is of no benefit in that case.

I think other SSD-optimisations, such as those in BTRFS, are much more
important.

Actually, (apart from btrfs still being in development, not really
ready for production use, yet), XFS (-o delaylog,barrier) performs
better on our SSDs than btrfs - without any SSD-specific options.


btrfs is ready for some uses, but is not mature and real-world tested enough for serious systems (and its tools are still lacking somewhat). But more generally, different filesystems are faster and slower for different usage patterns.

One SSD optimisation that many filesystems could implement is to be less concerned about fragmentation. Most modern filesystems go out of their way to try to reduce fragmentation, which is great for HD use. But on SSD's, you should be happy to fragment files if it promotes re-use of erased blocks, as long as fragments aim to fill complete erase blocks (in size and alignment).
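
As a small illustration of "complete erase blocks (in size and alignment)", the Python sketch below works out which whole erase blocks lie inside a free extent. The function name and the units are assumptions, and the erase block size of a real SSD is often not known to the filesystem at all.

```python
# Sketch only: find the erase blocks that fit entirely inside a free
# extent. start, length and erase_block are all in the same unit
# (for example bytes); the function name is a made-up example.

def whole_erase_blocks(start, length, erase_block):
    """Return (first_index, count) of erase blocks fully inside the extent."""
    first = -(-start // erase_block)        # round the start up
    last = (start + length) // erase_block  # round the end down (exclusive)
    return first, max(0, last - first)


if __name__ == "__main__":
    # A free extent of 1000 units starting at offset 100, 256-unit erase blocks.
    print(whole_erase_blocks(100, 1000, 256))
    # An extent too small to contain any complete erase block.
    print(whole_erase_blocks(10, 100, 256))
```

An allocator following the idea above would prefer to place a fragment in such fully contained blocks rather than straddling a boundary.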


What is really an important factor for SSD performance: The controller.
The same SSDs perform with significantly lower latency for us when
connected to SATA controller channels than when connected to SAS
controllers (and they perform abysmally when used as hardware-RAID
constituents, in comparison).

That is /very/ interesting to know, and is a data point I haven't read elsewhere (though I knew about poor performance of hardware RAID with SSD). Thanks for sharing that.



Regards,

Lutz Vieweg

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html


