Re: Discard support (was Re: [PATCH] swap: send callback when swap slot is freed)

Greg Freemyer <greg.freemyer@xxxxxxxxx> · Fri, 14 Aug 2009 18:54:11 -0400

On Fri, Aug 14, 2009 at 6:03 PM, Mark Lord<liml@xxxxxx> wrote:
> James Bottomley wrote:
>>
>> On Thu, 2009-08-13 at 14:15 -0400, Greg Freemyer wrote:
>>>
>>> On Thu, Aug 13, 2009 at 12:33 PM, <david@xxxxxxx> wrote:
>>>>
>>>> On Thu, 13 Aug 2009, Markus Trippelsdorf wrote:
>>>>
>>>>> On Thu, Aug 13, 2009 at 08:13:12AM -0700, Matthew Wilcox wrote:
>>>>>>
>>>>>> I am planning a complete overhaul of the discard work.  Users can send
>>>>>> down discard requests as frequently as they like.  The block layer will
>>>>>> cache them, and invalidate them if writes come through.  Periodically,
>>>>>> the block layer will send down a TRIM or an UNMAP (depending on the
>>>>>> underlying device) and get rid of the blocks that have remained unwanted
>>>>>> in the interim.
>>>>>
>>>>> That is a very good idea. I've tested your original TRIM implementation on
>>>>> my Vertex yesterday and it was awful ;-). The SSD needs hundreds of
>>>>> milliseconds to digest a single TRIM command. And since your implementation
>>>>> sends a TRIM for each extent of each deleted file, the whole system is
>>>>> unusable after a short while.
>>>>> An optimal solution would be to consolidate the discard requests, bundle
>>>>> them and send them to the drive as infrequently as possible.
>>>>
>>>> or queue them up and send them when the drive is idle (you would need to
>>>> keep track to make sure the space isn't re-used)
>>>>
>>>> as an example, if you would consider spinning down a drive you don't hurt
>>>> performance by sending accumulated trim commands.
>>>>
>>>> David Lang
>>>
>>> An alternate approach is for the block layer to maintain its own bitmap
>>> of used / unused sectors or blocks. Unmap commands from the filesystem
>>> just cause the bitmap to be updated.  No other effect.
>>>
>>> (Big unknown: Where will the bitmap live between reboots?  Require DM
>>> volumes so we can have a dedicated bitmap volume in the mix to store
>>> the bitmap to? Maybe on mount, the filesystem has to be scanned to
>>> initially populate the bitmap?   Other options?)
>>
>> I wouldn't really have it live anywhere.  Discard is best effort; it's
>> not required for fs integrity.  As long as we don't discard an in-use
>> block we're free to do anything else (including forget to discard,
>> rediscard a discarded block etc).
>>
>> It is theoretically possible to run all of this from user space using
>> the fs mappings, a bit like a defrag command.
>
> ..
>
> Already a work-in-progress -- see my wiper.sh script on the hdparm page
> at sourceforge.  Trimming 50+GB of free space on a 120GB Vertex
> (over 100 million sectors) takes a *single* TRIM command,
> and completes in only a couple of seconds.
>
> Cheers
>
Mark,

What filesystems does your script support?  Running a tool like this
in the middle of the night makes a lot of sense to me even from the
perspective of many / most enterprise users.

How do you prevent a race where a block becomes used between userspace
asking for its status and userspace sending the discard request?
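
To make the race concrete, here is a rough sketch of the in-kernel
alternative Matthew described upthread: queue the discards, drop any
queued range that a later write touches, and only send what is left.
It is userspace Python purely for illustration. The class and method
names are made up, ranges are half-open [start, end) sector pairs, and
it is not how wiper.sh or any existing block-layer code works.

```python
class DiscardCache:
    """Hypothetical queue of pending discards as sorted, disjoint
    [start, end) sector ranges."""

    def __init__(self):
        self.pending = []  # list of (start, end) tuples

    def discard(self, start, length):
        """Queue a range, merging it with overlapping or adjacent ranges."""
        if length <= 0:
            return
        new_start, new_end = start, start + length
        kept = []
        for s, e in self.pending:
            if e < new_start or s > new_end:
                kept.append((s, e))  # no contact with the new range
            else:
                new_start, new_end = min(s, new_start), max(e, new_end)
        kept.append((new_start, new_end))
        kept.sort()
        self.pending = kept

    def write(self, start, length):
        """A write makes these sectors live again: remove them from the queue."""
        w_start, w_end = start, start + length
        kept = []
        for s, e in self.pending:
            if e <= w_start or s >= w_end:
                kept.append((s, e))  # untouched by the write
                continue
            if s < w_start:
                kept.append((s, w_start))  # part before the write survives
            if e > w_end:
                kept.append((w_end, e))  # part after the write survives
        self.pending = kept

    def flush(self):
        """Return the ranges to send to the device and empty the queue."""
        ranges, self.pending = self.pending, []
        return ranges
```

Because write() and flush() would sit in the same layer, a block that
gets reused before the flush is simply never sent down.  A userspace
tool has no equivalent hook, which is why I am asking.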

ps: I tried to pull wiper.sh straight from sourceforge, but I'm
getting some crazy page asking all sorts of questions and not letting
me bypass it.  I hope sourceforge is broken.  The other option is they
meant to do this. :(

Greg
-- 
Greg Freemyer