Re: [PATCH 2/2] Add batched discard support for ext4.

Greg Freemyer <greg.freemyer@xxxxxxxxx> · Wed, 21 Apr 2010 17:47:27 -0400

Adding James Bottomley because high-end scsi is entering the
discussion.  James, I have a couple scsi questions for you at the end.

On Wed, Apr 21, 2010 at 5:03 PM, Ric Wheeler <rwheeler@xxxxxxxxxx> wrote:
> On 04/21/2010 05:01 PM, Eric Sandeen wrote:
>>
>> On 04/21/2010 03:44 PM, Greg Freemyer wrote:
>>
>>
>>>
>>> Mark's benchmarks showed this as doable in seconds which seems like a
>>> reasonable amount of time for a mount time operation.
>>>
>>
>> All the other things aside, mount-time is interesting, but it's an
>> infrequent operation, at least in my world.  I think we need something
>> that can be done runtime.
>>
>> For anything with uptime, I don't think it's acceptable to wait until
>> the next mount to trim unused blocks.
>>
>> But as long as the mechanism can be called either at mount time and/or
>> kicked off runtime somehow, I'm happy.
>>
>> -Eric
>>
>
> That makes sense to me.  Most enterprise servers will go without remounting
> a file system for (hopefully!) a very long time.
>
> It is really important to keep in mind that this is not just a laptop
> feature for laptop SSD's, this is also used by high end arrays and *could*
> be useful for virt IO, etc as well :-)
>
> ric

I'm not arguing that a runtime solution is not needed.

I'm arguing that, at least for SSD-backed filesystems, Mark's userspace
implementation shows that the mount-time initialization of the runtime
bitmap can be accomplished in a few seconds by leveraging the hardware
and using vectored trims, as opposed to having to build an additional
on-disk structure.
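
To make the mount-time idea concrete, here is a minimal sketch of
turning an allocation bitmap into a list of free extents that could
then be handed to a vectored trim.  This is not Mark's script; the
function name and the bitmap representation (one truthy/falsy value
per block) are assumptions made purely for illustration.

```python
def free_extents(bitmap):
    """Yield (start_block, length) for each run of free blocks.

    bitmap is any iterable with one entry per block, truthy when the
    block is in use (a hypothetical in-memory representation).
    """
    start = None   # first block of the free run currently being scanned
    seen = 0       # number of blocks examined so far
    for seen, used in enumerate(bitmap, 1):
        block = seen - 1
        if not used and start is None:
            start = block                    # a free run begins here
        elif used and start is not None:
            yield (start, block - start)     # the free run ended
            start = None
    if start is not None:
        yield (start, seen - start)          # free run reaches the end
```

For example, a bitmap of 1,0,0,1,0,1,0,0,0 yields three extents:
(1, 2), (4, 1) and (6, 3).  The units here are filesystem blocks, so
they would still need converting to device sectors before a trim.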

At least for SSDs, the primary purpose of the proposed on-disk
structure seems to be to work around the current lack of a vectored
discard implementation.

If it is too difficult to implement a fully functional vectored
discard in the block layer due to locking issues, possibly a
special-purpose version could be written that is only used at mount
time, when one can be assured no other I/O is occurring to the
filesystem.

James,

The ATA-8 spec supports vectored trims and requires that a minimum of
255 sectors' worth of range payload be supported.  That equates to a
single trim being able to trim thousands of ranges in one command.
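
As a back-of-the-envelope check on "thousands", here is a small
sketch of the arithmetic.  It assumes the range-entry layout as I
understand it from the ATA DATA SET MANAGEMENT definition: 512-byte
payload blocks and 8-byte entries holding a 48-bit starting LBA and a
16-bit sector count.  The function names are hypothetical.

```python
import struct

SECTOR_BYTES = 512          # assumed size of one payload block
ENTRY_BYTES = 8             # assumed: 48-bit LBA + 16-bit sector count
MAX_PAYLOAD_SECTORS = 255   # payload size quoted in this mail

def ranges_per_command(payload_sectors=MAX_PAYLOAD_SECTORS):
    """Number of range entries that fit in one trim payload."""
    return payload_sectors * SECTOR_BYTES // ENTRY_BYTES

def pack_range(lba, count):
    """Pack one range entry as a little-endian 64-bit value."""
    if not 0 <= lba < (1 << 48):
        raise ValueError("LBA does not fit in 48 bits")
    if not 0 < count <= 0xFFFF:
        raise ValueError("sector count does not fit in 16 bits")
    return struct.pack("<Q", (count << 48) | lba)
```

Under those assumptions one payload sector holds 64 entries, so a
255-sector payload carries 255 * 64 = 16320 ranges.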

Mark Lord has benchmarked this and found a vectored trim to be
drastically faster than calling trim individually for each of those
ranges.
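
To show how much the command count shrinks, here is a rough sketch of
grouping free extents into vectored commands.  The per-command limit
of 16320 entries (255 * 512 / 8) and the 65535-sector cap per entry
are assumptions based on 8-byte range entries; the helper names are
invented for this example and say nothing about timing.

```python
MAX_RANGE_SECTORS = 0xFFFF            # assumed 16-bit count per entry
RANGES_PER_COMMAND = 255 * 512 // 8   # assumed 8-byte entries: 16320

def split_extent(start, length):
    """Yield (lba, count) pieces no longer than MAX_RANGE_SECTORS."""
    while length > 0:
        count = min(length, MAX_RANGE_SECTORS)
        yield (start, count)
        start += count
        length -= count

def batch_trims(extents, per_command=RANGES_PER_COMMAND):
    """Group (start, length) extents into lists, one list per command."""
    batch = []
    for start, length in extents:
        for piece in split_extent(start, length):
            batch.append(piece)
            if len(batch) == per_command:
                yield batch          # payload full: emit one command
                batch = []
    if batch:
        yield batch                  # final, partially filled command
```

With 20,000 small free extents this produces two commands (16320
ranges, then 3680) instead of 20,000 individual trims.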

Does SCSI support vectored discard (i.e., via WRITE SAME commands)?

Or are high-end SCSI arrays so fast that they can process tens of
thousands of discard commands in a reasonable amount of time, which
SSDs have so far proven unable to do?

It would be interesting to find out that an SSD can discard thousands
of ranges drastically faster than a high-end SCSI device can.  But if
true, that might argue for the on-disk bitmap to track previously
discarded blocks/extents.

Greg