Re: RAID performance

On Feb 7, 2013, at 6:03 AM, Phil Turmel <philip@xxxxxxxxxx> wrote:

> 
> Trim is a per-sector property.  Once trimmed, it stays trimmed until you
> write to *that* sector.

I agree, although technically I think it's an attribute of the page, since the page is the smallest unit an SSD can read or write. Page sizes appear to be either 4KB or 8KB, while the SSD basically lies to the OS and reports a physical sector size of 512 bytes.

Once all pages in an SSD erase block are flagged for erase, which could be done with a SATA TRIM command or (dynamic or static) wear leveling by the SSD itself, then those pages can be erased.
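To make the page/erase-block relationship concrete, here is a toy model. The page and block sizes are assumptions for illustration, not figures from any particular drive:

```python
# Toy model of SSD pages inside an erase block (illustrative numbers):
# a block can only be erased once every page in it is invalid, whether
# invalidated by a SATA TRIM or by the FTL's own wear leveling.

PAGE_SIZE = 4096          # assumed page size; real drives use 4KB or 8KB
PAGES_PER_BLOCK = 128     # assumed; gives a 512KB erase block

class EraseBlock:
    def __init__(self):
        self.trimmed = [False] * PAGES_PER_BLOCK

    def trim_page(self, i):
        self.trimmed[i] = True

    def erasable(self):
        # Candidate for erase only when ALL pages are invalid.
        return all(self.trimmed)

blk = EraseBlock()
for i in range(PAGES_PER_BLOCK - 1):
    blk.trim_page(i)
print(blk.erasable())   # False: one live page pins the whole block
blk.trim_page(PAGES_PER_BLOCK - 1)
print(blk.erasable())   # True
```

The point of the model is just that a single live page keeps the whole block from being reclaimed, which is why partial invalidation forces garbage collection to copy data around.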

I think what the SSD vendors have done is this: since OS requests are effectively in 4KB blocks, whether or not those LBAs are 4K-aligned (as is required for 512e AF disks), the FTL can place any 4KB filesystem block into a 4KB or 8KB page, aligned.

> If you want to achieve something with trim, you would
> trim on your clients and hope it passes all the way down the stack
> through iSCSI to your LVs and then to MD and the SSDs.


TRIM probably does reduce the need for the firmware to do its own static wear leveling, but I don't know whether that's significant except for large deletions. If SSDs reliably returned zeros for unassigned LBAs (i.e. LBAs previously TRIM'd), the lowest layer could optimize page-sized writes of all zeros into TRIM commands.
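If that zero-read guarantee held (drives advertise it as "Deterministic read ZEROs after TRIM" in the ATA identify data), the optimization might look roughly like this. `issue_write` and `issue_trim` are hypothetical stand-ins for the real block-layer operations:

```python
# Sketch of the hypothetical optimization: convert all-zero page-sized
# writes into TRIMs, relying on the drive reading back zeros for
# trimmed LBAs. issue_write/issue_trim stand in for real block-layer ops.

PAGE = 4096

def submit_page_write(lba, buf, issue_write, issue_trim):
    assert len(buf) == PAGE
    if buf == bytes(PAGE):          # all zeros?
        issue_trim(lba, PAGE)       # drive will return zeros anyway
    else:
        issue_write(lba, buf)

log = []
submit_page_write(0, bytes(PAGE),
                  issue_write=lambda l, b: log.append(("write", l)),
                  issue_trim=lambda l, n: log.append(("trim", l)))
submit_page_write(8, b"\x01" + bytes(PAGE - 1),
                  issue_write=lambda l, b: log.append(("write", l)),
                  issue_trim=lambda l, n: log.append(("trim", l)))
print(log)   # [('trim', 0), ('write', 8)]
```

Of course this only works if the zero-after-TRIM behavior is deterministic; on drives where trimmed LBAs return stale or indeterminate data, the substitution would corrupt reads.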

But for performance purposes, I don't see that it makes much difference. Over-provisioning and dynamic wear leveling take care of the performance concern. And it seems to me the usual rules of thumb for chunk size apply; maybe erring on the conservative side (tending toward smaller) makes more sense, to avoid large unnecessary RMW.

On the one hand, a chunk size exactly matching and aligned with an SSD erase block might seem ideal; but while it might improve the efficiency of the SSD's garbage collection of those blocks, it also translates into higher wear.
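As a back-of-the-envelope illustration of why oversized chunks invite RMW, here is a deliberately simplified parity-RAID model (the chunk and write sizes are assumptions, and real md RMW is more nuanced than this):

```python
# Rough RMW arithmetic: a parity-RAID write smaller than the chunk
# forces reading old data and old parity before writing both back.
# Simplified model at chunk granularity; illustrative only.

def rmw_ratio(write_size, chunk_size):
    """Bytes physically touched per byte the caller wrote,
    for a sub-chunk write that triggers read-modify-write."""
    if write_size >= chunk_size:
        return 1.0
    # read old data chunk + old parity chunk,
    # write new data chunk + new parity chunk
    touched = 4 * chunk_size
    return touched / write_size

print(rmw_ratio(4096, 64 * 1024))    # 64.0: 4K write into a 64K chunk
print(rmw_ratio(4096, 512 * 1024))   # 512.0: far worse with a 512K chunk
```

The absolute numbers aren't the point; the scaling is: the amplification for small writes grows linearly with chunk size, which is the wear argument for staying conservative.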

> You just have nothing useful on the server side for trim to do.
> (Although you should manually trim the unpartitioned space.  You only
> need to do so once.)


It's unclear to me that user-level over-provisioning is necessary. The SSD is already over-provisioned. I can see where a mismatch in usage could be a problem, e.g. enterprise write patterns on consumer SSDs.
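For reference, the built-in over-provisioning can be estimated from the gap between raw NAND and advertised capacity; the figures below are assumptions for illustration (the familiar ~7% on consumer drives falls out of the GB-vs-GiB difference):

```python
# Estimate built-in over-provisioning from raw NAND vs. advertised
# capacity. Illustrative numbers: 256 GiB of NAND sold as "256 GB".

def op_percent(raw_bytes, advertised_bytes):
    return 100.0 * (raw_bytes - advertised_bytes) / advertised_bytes

raw = 256 * 2**30          # 256 GiB of physical NAND (assumed)
advertised = 256 * 10**9   # advertised as a "256 GB" drive
print(round(op_percent(raw, advertised), 1))   # 7.4
```

Enterprise drives typically reserve considerably more than this, which is why running enterprise-style sustained random writes on a consumer drive is where the mismatch shows up.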


Chris Murphy--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

