Re: Testing devices for discard support properly

Lukas Czerner <lczerner@xxxxxxxxxx> · Tue, 7 May 2019 11:40:15 +0200

On Tue, May 07, 2019 at 10:48:55AM +0200, Jan Tulak wrote:
> On Tue, May 7, 2019 at 9:10 AM Lukas Czerner <lczerner@xxxxxxxxxx> wrote:
> >
> > On Mon, May 06, 2019 at 04:56:44PM -0400, Ric Wheeler wrote:
> > >
> ...
> > >
> > > * Whole device discard at the block level both for a device that has been
> > > completely written and for one that had already been trimmed
> >
> > Yes, useful. Also note that a long time ago, when I did the testing,
> > I noticed that after a discard request, especially after a whole device
> > discard, the read/write IO performance went down significantly for some
> > drives. I am sure things have changed, but I think it would be
> > interesting to see how it behaves now.
> >
> > >
> > > * Discard performance at the block level for 4k discards for a device that
> > > has been completely written and again the same test for a device that has
> > > been completely discarded.
> > >
> > > * Same test for large discards - say at a megabyte and/or gigabyte size?
> >
> > From my testing (again, it was a long time ago and things have probably
> > changed since then), most of the drives I've seen had largely the same
> > timing for a discard request regardless of its size (hence the
> > conclusion was: the bigger the request, the better). The small variation
> > I did see could have been explained by the kernel implementation and
> > discard_max_bytes limitations as well.
> >
> > >
> > > * Same test done at the device optimal discard chunk size and alignment
> > >
> > > Should the discard pattern be done with a random pattern? Or just
> > > sequential?
> >
> > I think that all of the above will be interesting. However, there are two
> > sides to it. One is pure discard performance, to figure out what the
> > expectations could be, and the other is "real" workload
> > performance. Since in my experience discard can have an impact on
> > drive IO performance beyond what's obvious, testing a mixed workload
> > (IO + discard) is going to be very important as well. And that's where
> > fio workloads can come in (I actually do not know if fio already
> > supports this or not).
> >
> 
> And:
> 
> On Tue, May 7, 2019 at 10:22 AM Nikolay Borisov <nborisov@xxxxxxxx> wrote:
> > I have some vague recollection this was brought up before, but how sure
> > are we that when a discard request is sent down to disk and a response
> > is returned, the actual data has indeed been discarded? What about NCQ
> > effects, i.e. "instant completion" while doing work in the background? Or
> > ignoring the discard request altogether?
> 
> 
> As Nikolay writes in the other thread, I too have a feeling that there
> has been a discard-related discussion at LSF/MM before. And if I
> remember correctly, there were hints that the drives (sometimes) do an
> asynchronous trim after returning success, which would explain the
> similar time for all sizes and the IO drop after trim.

Yes, that was definitely the case in the past. It's also why we've
seen IO performance drop after a big (whole device) discard, as the
device was busy in the background.

However, Nikolay does have a point. IIRC a device is free to ignore discard
requests, and I do not think there is any reliable way to tell whether
the data was really discarded. I can even imagine a situation where the
device is not going to do anything unless it has passed some threshold of
free blocks for wear leveling. If that's the case, our tests are not
going to be very useful unless they stress such corner cases. But
that's just my speculation, so someone with better knowledge of what
vendors are doing might tell us whether it's something to worry about.
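
To illustrate how weak any host-side check is, here is a minimal sketch of
a read-back test, assuming the device happens to return zeros for
discarded ranges (not all devices do). The function name range_reads_zero
and its parameters are made up for this example. A non-zero read only
shows the range is still mapped to data as far as the host can see, and an
all-zero read says nothing about what the flash actually did.

```python
def range_reads_zero(path, offset, length, chunk=1 << 20):
    """Return True if every byte in [offset, offset + length) reads as zero.

    Hypothetical helper: 'path' may be a block device or a regular file.
    Reads go through the page cache here, so on a real device the cache
    would have to be dropped (or O_DIRECT used) before the result means
    anything.
    """
    with open(path, "rb", buffering=0) as f:
        f.seek(offset)
        remaining = length
        while remaining > 0:
            data = f.read(min(chunk, remaining))
            if not data:
                # Hit end of device/file before the range was fully read,
                # so the range cannot be verified.
                return False
            if any(data):
                # At least one non-zero byte came back.
                return False
            remaining -= len(data)
    return True
```

Called after a discard of the same range, this can at best catch a device
that visibly ignored the request. It cannot confirm that blocks were
released internally.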

> 
> So, I think that the mixed workload (IO + discard) is a pretty
> important part of the whole topic and a pure discard test doesn't
> really tell us anything, at least for some drives.

I think both are important, especially since mixed IO tests are going to
be highly workload specific.
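
For the pure discard side, the request patterns from Ric's list (4k,
megabyte, gigabyte, sequential or random) could be generated along these
lines. This is only a sketch: discard_requests and its parameters are
hypothetical names, and the granularity would have to come from the
device, e.g. /sys/block/<dev>/queue/discard_granularity.

```python
import random

def discard_requests(dev_size, req_size, granularity=4096,
                     shuffle=False, seed=0):
    """Return (offset, length) pairs covering the device in req_size steps.

    All values are in bytes. The request size is rounded down to a
    multiple of the granularity (but never below one granule), and any
    unaligned tail of the device is left out.
    """
    if req_size <= 0 or granularity <= 0:
        raise ValueError("req_size and granularity must be positive")
    step = max(granularity, req_size - req_size % granularity)
    usable = dev_size - dev_size % granularity
    reqs = [(off, min(step, usable - off))
            for off in range(0, usable, step)]
    if shuffle:
        # Fixed seed so a "random" run can be repeated on another drive.
        random.Random(seed).shuffle(reqs)
    return reqs
```

Each pair could then be issued with the BLKDISCARD ioctl or blkdiscard(8)
and timed individually. The kernel may still split a request that is
larger than discard_max_bytes, so timings for large sizes include that
splitting.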

-Lukas

> 
> Jan
> 
> 
> 
> -- 
> Jan Tulak