Re: [PATCH 2/2] Add batched discard support for ext4.

James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> · Wed, 21 Apr 2010 17:56:47 -0400

On Wed, 2010-04-21 at 17:47 -0400, Greg Freemyer wrote:
> Adding James Bottomley because high-end scsi is entering the
> discussion.  James, I have a couple scsi questions for you at the end.
> 
> On Wed, Apr 21, 2010 at 5:03 PM, Ric Wheeler <rwheeler@xxxxxxxxxx> wrote:
> > On 04/21/2010 05:01 PM, Eric Sandeen wrote:
> >>
> >> On 04/21/2010 03:44 PM, Greg Freemyer wrote:
> >>
> >>
> >>>
> >>> Mark's benchmarks showed this as doable in seconds which seems like a
> >>> reasonable amount of time for a mount time operation.
> >>>
> >>
> >> All the other things aside, mount-time is interesting, but it's an
> >> infrequent operation, at least in my world.  I think we need something
> >> that can be done runtime.
> >>
> >> For anything with uptime, I don't think it's acceptable to wait until
> >> the next mount to trim unused blocks.

So what's wrong with using wiper.sh?  It can do online discard of
filesystems that support delayed allocation (ext4, xfs, etc.).

> >> But as long as the mechanism can be called either at mount time and/or
> >> kicked off runtime somehow, I'm happy.
> >>
> >> -Eric
> >>
> >
> > That makes sense to me.  Most enterprise servers will go without remounting
> > a file system for (hopefully!) a very long time.
> >
> > It is really important to keep in mind that this is not just a
> > feature for laptop SSDs; it is also used by high-end arrays and *could*
> > be useful for virt IO, etc. as well :-)
> >
> > ric
> 
> I'm not arguing that a runtime solution is not needed.
> 
> I'm arguing that at least for SSD backed filesystems Mark's userspace
> implementation shows how the mount time initialization of the runtime
> bitmap can be accomplished in a few seconds by leveraging the hardware
> and using vector'ed trims as opposed to having to build an additional
> on-disk structure.
> 
> At least for SSDs, the primary purpose of the proposed on-disk
> structure seems to be to overcome the current lack of a vector'ed
> discard implementation.
> 
> If it is too difficult to implement a fully functional vector'ed
> discard in the block layer due to locking issues, possibly a special
> purpose version could be written that is only used at mount time when
> one can be assured no other i/o is occurring to the filesystem.
> 
> James,
> 
> The ATA-8 spec. supports vectored trims and requires a minimum of 255
> sectors worth of range payload be supported.  That equates to a single
> trim being able to trim thousands of ranges in one command.
> 
> Mark Lord has benchmarked and found a vectored trim to be drastically
> faster than calling trim individually for each of those ranges.
> 
> Does scsi support vector'ed discard? (ie. write-same commands)

Only with UNMAP.  WRITE SAME is effectively single range.
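
For reference, the "thousands of ranges" figure for ATA TRIM quoted above
can be checked with a rough sketch.  It assumes the usual DATA SET
MANAGEMENT payload layout (512-byte payload blocks, 8-byte little-endian
entries holding a 48-bit LBA and a 16-bit sector count); the function
names are hypothetical and this is an illustration only, not kernel or
hdparm code.

```python
import struct

SECTOR_BYTES = 512   # assumed size of one payload block
ENTRY_BYTES = 8      # assumed entry size: 48-bit LBA + 16-bit sector count
MAX_COUNT = 0xFFFF   # largest sector count one entry can describe


def pack_trim_ranges(ranges):
    """Pack (lba, sector_count) pairs into a payload padded to whole sectors."""
    entries = []
    for lba, count in ranges:
        if lba < 0 or count < 0 or lba + count > 1 << 48:
            raise ValueError("range does not fit in 48-bit LBA space")
        # A range longer than 65535 sectors needs several entries.
        while count > 0:
            chunk = min(count, MAX_COUNT)
            entries.append(struct.pack("<Q", (chunk << 48) | lba))
            lba += chunk
            count -= chunk
    payload = b"".join(entries)
    # Zero-filled entries are unused; pad out to a sector boundary.
    return payload + bytes(-len(payload) % SECTOR_BYTES)


def max_ranges(payload_sectors=255):
    """Number of range entries that fit in the given number of payload sectors."""
    return payload_sectors * SECTOR_BYTES // ENTRY_BYTES
```

Under those assumptions, 255 sectors of payload hold 255 * 64 = 16320
entries per command.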

> Or are high-end scsi arrays so fast they can process tens of thousands
> of discard commands in a reasonable amount of time, unlike what SSDs
> have so far proven able to do?

No ... they actually have two problems: firstly they can only use
discard ranges which align with their internal block size (usually
something huge like 3/4MB) and then a trim operation tends to be O(1)
and slow, so they'd actually like discard accumulation.
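
A minimal sketch of what discard accumulation buys on such a device:
merge the freed ranges first, then keep only the parts that cover whole
internal blocks.  The granularity value and both function names are
made up for illustration; real arrays report their own granularity and
alignment.

```python
def merge_ranges(ranges):
    """Coalesce overlapping or adjacent (start, length) ranges."""
    merged = []
    for start, length in sorted(ranges):
        if length <= 0:
            continue
        end = start + length
        if merged and start <= merged[-1][1]:
            # Touches or overlaps the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e - s) for s, e in merged]


def align_ranges(ranges, granularity):
    """Shrink merged ranges to whole granularity-sized blocks, dropping the rest."""
    aligned = []
    for start, length in merge_ranges(ranges):
        first = -(-start // granularity) * granularity        # round start up
        last = (start + length) // granularity * granularity  # round end down
        if last > first:
            aligned.append((first, last - first))
    return aligned
```

With a hypothetical granularity of 1024 sectors, (0, 512) and (512, 512)
are each too small to discard on their own, but accumulated they become
one usable (0, 1024) discard.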

> It would be interesting to find out that a SSD can discard thousands
> of ranges drastically faster than a high-end scsi device can.  But if
> true, that might argue for the on-disk bitmap to track previously
> discarded blocks/extents.

I think SSDs and Arrays both have discard problems, arrays more to do
with the time and expense of the operation, SSDs because the TRIM
command isn't queued.

James
