Re: TRIM vs UNMAP vs WRITE SAME and thin devices

James Bottomley wrote:
On Sat, 2009-02-07 at 09:53 -0500, Ric Wheeler wrote:
I have been poked at by some vendors about the status of our support for virtually/thinly provisioned LUNs, since they are getting close to being able to test with real devices.

With my LSF hat on, a certain array vendor might be sponsoring to get
the opportunity to raise this issue more fully.  The impression (mostly
correct) is that we're thinking about trim/unmap purely from the SSD FTL
point of view and perhaps not being as useful as we might to virtually
provisioned LUNs ... so you could mention to the other vendors that they
might have an interest in coming (and even possibly sponsoring).

That is probably worth bringing up - I don't see this as a large project, and it should be reasonably quick to complete given all the work that David and others have already put into it. If you (with your LF hat on :-)) have a standard form or offer process, you might want to poke at NetApp, EMC, Hitachi, IBM, HP and Dell. We both know the names of some storage people at a few of those companies; at the others I have fewer contacts.

On the other hand, this might also be an opportunity to get them and their engineers on the array side more directly and personally involved.
My quick summary is that most of the work so far has been done without any real hardware to play with - in 2.6.29-rc3, I don't see any low-level ATA or SCSI bits that turn requests tagged with REQ_DISCARD into the specific ATA or SCSI commands. Did I miss something, and if not, do we have plans to push anything upstream soonish?

With no devices it's a bit hard.  Also we need at least three pieces for
SSDs: Devices supporting trim, the T13 implementation of TRIM and the
SAT for UNMAP.  We can get the latter two out of the proposals, but it's
still a bit of a moving target.
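
To make the missing low-level piece concrete, here is a minimal sketch of how one discarded extent (start LBA plus block count) might be encoded for each transport. The layouts are assumptions taken from the specifications as later published (8-byte little-endian TRIM range entries for ATA DATA SET MANAGEMENT, and an 8-byte header plus 16-byte big-endian block descriptors for the SCSI UNMAP parameter list); the drafts under discussion in this thread may differ. The function names are hypothetical and are not kernel interfaces.

```python
import struct

def trim_range_entry(lba, count):
    """One 8-byte TRIM range entry: 48-bit LBA in the low bits,
    16-bit block count in the high bits, little-endian (assumed layout)."""
    if not (0 <= lba < (1 << 48)) or not (0 < count <= 0xFFFF):
        raise ValueError("extent does not fit in one TRIM range entry")
    return struct.pack("<Q", (count << 48) | lba)

def unmap_param_list(extents):
    """SCSI UNMAP parameter list for a list of (lba, nblocks) extents:
    8-byte header followed by one 16-byte descriptor per extent (assumed layout)."""
    descs = b"".join(struct.pack(">QI4x", lba, n) for lba, n in extents)
    # UNMAP data length counts the bytes that follow that 2-byte field.
    header = struct.pack(">HH4x", len(descs) + 6, len(descs))
    return header + descs
```

The same (LBA, count) pair ends up in two quite different wire formats, which is why a translation layer of some kind is needed between REQ_DISCARD and the device.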

I think that it has settled a bit - do we have a good sense of the status of the various proposals in T13 and T10?
One note on the SCSI devices: there was a T10 proposal to add an "UNMAP" bit to the "WRITE SAME" command for SCSI. The details of the proposed interface are at:

http://www.t11.org/t10/document.08/08-356r4.pdf

The upside of using WRITE SAME with unmap is that there are no fuzzy semantics about what the unmapped sectors will contain - they will all be whatever the WRITE SAME command would have set (usually zeroes, I assume).

The summary of write same is that you send down one sector (say 512 bytes of zeroes) and a count so you can do a zeroing of the target without having to send all of the data over the wire. Very useful for initializing members of a RAID device for example to a known pattern.
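
As a rough illustration of the "one sector plus a count" idea, the sketch below builds a WRITE SAME(16) command block and its single-block payload. The opcode (0x93) and the position of the UNMAP flag (byte 1, bit 3) are assumptions taken from the standard as later published; the proposal linked above may place the bit differently. The helper names are hypothetical.

```python
import struct

WRITE_SAME_16 = 0x93   # opcode, assumed from the published standard
UNMAP_BIT = 0x08       # byte 1, bit 3 (assumption; drafts may differ)

def write_same16_cdb(lba, nblocks, unmap=False):
    """16-byte CDB: opcode, flags, 8-byte LBA, 4-byte block count,
    group number, control. All multi-byte fields are big-endian."""
    flags = UNMAP_BIT if unmap else 0
    return struct.pack(">BBQIBB", WRITE_SAME_16, flags, lba, nblocks, 0, 0)

def write_same_payload(block_size=512):
    """The only data sent over the wire: one block of the pattern (zeroes here)."""
    return bytes(block_size)
```

With this shape, covering a million blocks costs 16 bytes of command and 512 bytes of data on the wire, regardless of how much work the target has to do.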

The downside would be that if we incorrectly send down a WRITE SAME command to a non-thin device, I think that we would kick off a potentially extremely long I/O. For example, imagine doing a write same of a full TB - that could take an hour, which might be an issue :-) Of course, we should not be doing that if we get the code right.
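
A quick back-of-the-envelope check of that estimate, as a sketch only: the sustained write rates below are illustrative assumptions, not measurements of any particular device.

```python
TB = 10**12  # decimal terabyte

def write_time_seconds(total_bytes, bytes_per_second):
    """Time for a non-thin device to physically write every block in the range."""
    return total_bytes / bytes_per_second

# Assumed sustained write rates in MB/s (hypothetical figures).
for mb_per_s in (100, 300):
    minutes = write_time_seconds(TB, mb_per_s * 10**6) / 60
    print(f"{mb_per_s} MB/s -> about {minutes:.0f} minutes")
```

At rates in that range a single command covering a full TB stays outstanding for somewhere between roughly one and three hours, well beyond any ordinary command timeout.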

As I read it, non-thin-provisioned devices can be identified (and may
not even accept WRITE SAME).
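
One way that identification might look, as a sketch: the published standard reports thin provisioning in the READ CAPACITY(16) response, with a flag in byte 14, bit 7 and a "reads back zeroes after unmap" flag in bit 6. Those bit positions are an assumption relative to the drafts being discussed here, and the function and key names are hypothetical.

```python
import struct

def parse_read_capacity16(data):
    """Decode the fields of interest from a READ CAPACITY(16) response buffer."""
    if len(data) < 16:
        raise ValueError("response too short")
    last_lba, block_len = struct.unpack_from(">QI", data, 0)
    return {
        "last_lba": last_lba,
        "block_len": block_len,
        "thin": bool(data[14] & 0x80),                # assumed: byte 14, bit 7
        "zero_after_unmap": bool(data[14] & 0x40),    # assumed: byte 14, bit 6
    }
```

A caller would only issue unmap-style commands when "thin" is set, and would fall back to doing nothing otherwise.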

I agree that the intersection of write same and thin devices is not going to be 100%. We might end up needing both for SCSI in the worst case I suppose.
I don't find the other advantages that the PDF claims for file systems all that useful.

With either write same and its proposed unmap bit or the original T10 unmap, do we have a short list of infrastructure that needs to be fleshed out? Is there anything we can do to help get people's patches to test with their non-GA thin-enabled devices?

Yes, REQ_DISCARD simply isn't broad enough to cope with all the
potential uses of WRITE SAME.  If it's just a mechanism to get known
data into a discard sector, fine, we can set that at the lower level.
However, WRITE SAME has uses beyond TRIM in that it can be used as an
engine for data deduplication.  If vendors are thinking of doing this,
then REQ_DISCARD isn't flexible enough.

I am more interested personally in the sparse support. On the dedup side, I think that most implementations do not rely on write same. They tend to compute hashes on the various blocks and so on.
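
For illustration, a minimal sketch of the hash-based approach, which needs no WRITE SAME at all. The block size and the choice of SHA-256 are assumptions made for the example, not a description of any vendor's implementation.

```python
import hashlib

def dedup_blocks(data, block_size=4096):
    """Split data into fixed-size blocks and keep one copy per distinct content.

    Returns (store, recipe): store maps digest -> block bytes, and recipe is
    the ordered list of digests needed to rebuild the original data.
    """
    store = {}
    recipe = []
    for off in range(0, len(data), block_size):
        block = data[off:off + block_size]
        digest = hashlib.sha256(block).digest()
        store.setdefault(digest, block)   # keep only the first copy seen
        recipe.append(digest)
    return store, recipe

def rebuild(store, recipe):
    """Reassemble the original data from the stored unique blocks."""
    return b"".join(store[d] for d in recipe)
```

Identical blocks collapse to one stored copy because they hash to the same digest, independent of how the data arrived at the device.
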
Is there a similar short list of things to be done for T13 devices with TRIM? Has anyone had a chance to test on real hardware yet?

Not that I know of yet.  It's all sort of on hold until actual devices
become available.

James


The vendors certainly have things that they could try in their labs if we can get bits and pieces together for them to test with. We will need to avoid the chicken and egg scenario where they wait for us and we wait for them :-)

Ric
