Re: [PATCH 1/4] xfs: add online discard support

Alex Elder <aelder@xxxxxxx> · Fri, 20 May 2011 08:40:16 -0500

On Fri, 2011-05-20 at 06:24 -0400, Christoph Hellwig wrote:
> On Thu, May 19, 2011 at 04:53:44PM -0500, Alex Elder wrote:
> > The first is, why not support it for non-delaylog?
> 
> Because:
> 
>  a) performance is going to suck even more horribly with the
>     amount of trim commands needed, with no chance of actually
>     fixing it
>  b) the async discard code in patch 3 not easily applyable to
>     the non-delaylog case, we'd need to keep two parallel codebases,
>     one of them guaranteed to be untested.

I hadn't looked through all of the patches yet.

> > Second, why is it a two phase operation (marking an
> > extent for discard, then doing all the discards at
> > once)?  Is it just so you can do the discards without
> > holding the perag lock?
> 
> Because we must prevent the allocation code from reusing an extent
> that is undergoing a discard right now to prevent corruption, thus
> we need to mark it as do not touch first. 

OK.

> > >  	xfs_trans_committed_bulk(ctx->cil->xc_log->l_ailp, ctx->lv_chain,
> > >  					ctx->start_lsn, abort);
> > >  
> > >  	xfs_alloc_busy_sort(&ctx->busy_extents);
> > 
> > I still think sorting the list belongs inside xfs_alloc_busy_clear().
> > I see that list_sort() is not necessarily trivial for an already
> > sorted list though...
> 
> It's a bad idea to do the sort twice for no good reason, and adding
> another parameter to further overload xfs_alloc_busy_clear behaviour
> doesn't seem smart either.

It's fine, it's just the purist in me that wants the interface
to hide the dirty work of sorting.  But I concede your points
here--not a big deal.

> > 		if (error == EOPNOTSUPP) {
> > 			/*
> > 			 * Report this once per mount point somehow?
> > 			 * If so, turn off the mount option?
> > 			 */
> > 			break;
> 
> We've been through this discussion again lately with dm and ext4
> folks, and the conclusion is that EOPNOTSUPP is perfectly fine to happen
> here.

The main thing I wanted to suggest was dropping out after
the first one, but considering the underlying device may
be a compound logical volume I understand now why it's
better to go through them all.  (I hadn't been paying
attention to the dm and ext4 discussion.)

> > > +	 * performing the discard a chance to mark the extent unbusy
> > > +	 * and retry.
> > > +	 */
> > > +	if (busyp->flags & XFS_ALLOC_BUSY_DISCARDED) {
> > > +		spin_unlock(&pag->pagb_lock);
> > > +		delay(1);
> > 
> > I hate seeing calls to delay() although sometimes
> > it's the right thing to do...  I don't have a feel
> > for how long a discard is likely to take so I don't
> > know whether waiting here instead would be worth
> > the effort.
> 
> It's not nice, but if the block layer gets fixed and we do asynchronous
> discards it simply goes away.

Sounds good.

> > If this option is to only be available for delaylog, it should
> > say so here (and maybe report that it's being ignored if it's
> > supplied with nodelaylog at mount time).
> 
> ok.
> 

Thanks for answering my questions.  I'll look for an update but
unless there's something obviously wrong I'll turn it around
pretty quickly so this gets in during the merge window.

					-Alex

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs