Re: [PATCHv6 3/5] reiser4: discard support: initial implementation using linked lists.

Ivan Shapovalov <intelfx100@xxxxxxxxx> · Wed, 09 Jul 2014 16:40:11 +0400

On Monday 07 July 2014 at 01:47:41, Edward Shishkin wrote:	
> 
> On 06/22/2014 12:48 PM, Ivan Shapovalov wrote:
> [...]
> > +++ b/fs/reiser4/discard.c
> > @@ -0,0 +1,216 @@
> > +/* Copyright 2001, 2002, 2003 by Hans Reiser, licensing governed by
> > + * reiser4/README */
> > +
> > +/* TRIM/discard interoperation subsystem for reiser4. */
> > +
> > +/*
> > + * This subsystem is responsible for populating an atom's ->discard_set and
> > + * (later) converting it into a series of discard calls to the kernel.
> > + *
> > + * The discard is an in-kernel interface for notifying the storage
> > + * hardware about blocks that are being logically freed by the filesystem.
> > + * This is done via calling the blkdev_issue_discard() function. There are
> > + * restrictions on block ranges: they should constitute at least one erase unit
> > + * in length and be correspondingly aligned. Otherwise a discard request will
> > + * be ignored.
> > + *
> > + * The erase unit size is kept in struct queue_limits as discard_granularity.
> > + * The offset from the partition start to the first erase unit is kept in
> > + * struct queue_limits as discard_alignment.
> > + *
> > + * At atom level, we record numbers of all blocks that happen to be deallocated
> > + * during the transaction. Then we read the generated set, filter out any blocks
> > + * that have since been allocated again and issue discards for everything still
> > + * valid. This is what discard.[ch] is here for.
> > + *
> > + * However, simply iterating through the recorded extents is not enough:
> 
> 
> I still don't understand this explanation..

In short: we need to "screen" those extents to check whether they are still
free.

> 
> 
> > + * - if a single extent is smaller than the erase unit, then this particular
> > + *   extent won't be discarded even if it is surrounded by enough free blocks
> > + *   to constitute a whole erase unit;
> 
> 
> Why not to discard the aligned and padded extent, which coincides
> with the whole erase unit?

With a number of whole erase units.

> 
> 
> > + * - we won't be able to merge small adjacent extents forming an extent long
> > + *   enough to be discarded.
> 
> 
> At this point we have already sorted and merged everything.
> So may be it makes sense just to check the head and tail of every resulted
> extent and discard the aligned and padded one?

"Head and tail" is not sufficient. We may check the whole extent with a single
bitmap request, but such algorithm will be very inefficient: it will miss many
possibilities for discarding.

Consider many-block extent, from which one block has been allocated again.
In this case we miss (all-1) blocks to be discarded (if granularity = 1 block).

> 
> Please, consider such possibility. Iterating over erase units in 
> discard_extent()
> looks suboptimal.

Yes, it's costly... but I don't see any other ways to do the task efficiently.

-- 
Ivan Shapovalov / intelfx /

> 
> Thanks,
> Edward.
Attachment:
signature.asc

Description: This is a digitally signed message part.