Re: discard feature, mkfs.ext4 and mmc default fallback to normal erase op

On Tue, 8 Dec 2020 at 12:26, Michael Walle <michael@xxxxxxxx> wrote:
>
> Hi Ulf, Hi Ted,
>
> On 2020-12-08 10:49, Ulf Hansson wrote:
> > On Tue, 8 Dec 2020 at 03:41, Theodore Y. Ts'o <tytso@xxxxxxx> wrote:
> >> On Mon, Dec 07, 2020 at 09:39:32PM +0100, Michael Walle wrote:
> >> > > There are three different MMC commands which are defined:
> >> > >
> >> > > 1) DISCARD
> >> > > 2) ERASE
> >> > > 3) SECURE ERASE
> >> > >
> >> > > The first two are expected to be fast, since it only involves clearing
> >> > > some metadata fields in the Flash Translation Layer (FTL), so that the
> >> > > LBA's in the specified range are no longer mapped to a flash page.
> >> >
> >> > Mh, where is it specified that the erase command is fast? According
> >> > to the Physical Layer Simplified Specification Version 8.00:
> >> >
> >> >  The actual erase time may be quite long, and the host may issue CMD7
> >> >  to deselect the card or perform card disconnection, as described in
> >> >  the Block Write section, above.
> >
> > Before I go into some more detail, of course I fully agree that
> > dealing with erase/discard from the eMMC/SD specifications (and other
> > types of devices) point of view isn't entirely easy. :-)
> >
> > But I also think we can do better than currently, at least for eMMC/SD.
> >
> >>
> >> I looked at the eMMC specification from JEDEC (JESD84-A44) and there,
> >> both the "erase" and "trim" are specified that the work is to be
> >> queued to be done at a time which is convenient to the controller
> >> (read: FTL).  This is in contrast to the "secure erase" and "secure
> >> trim" commands, where the erasing has to be done NOW NOW NOW for "high
> >> security applications".
>
> Oh this might also be because I've cited from the wrong place, namely
> the mmc_init_card() function. But what I really meant was the SD card
> equivalent, which should be mmc_read_ssr(). Sorry.
>
>         discard_support = UNSTUFF_BITS(resp, 313 - 288, 1);
>         card->erase_arg = (card->scr.sda_specx && discard_support) ?
>                             SD_DISCARD_ARG : SD_ERASE_ARG;

I assumed you were referring to this, but good that you pointed this
out, for clarity.
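
For readers following along, here is a rough user-space sketch of what
the quoted snippet decides. It is not the kernel code: ssr_bit() and
sd_pick_erase_arg() are hypothetical names, the layout (16 words in CPU
byte order, word 0 holding bits 511..480) is an assumption, and the two
argument values are written as I recall them from the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* Argument values as I recall them from the kernel headers; assumptions. */
#define SD_ERASE_ARG   0x00000000u
#define SD_DISCARD_ARG 0x00000001u

/*
 * Hypothetical helper: read one bit of the 512-bit SD Status register.
 * Assumes 16 words already in CPU byte order, with word 0 holding bits
 * 511..480 and word 15 holding bits 31..0.
 */
static unsigned int ssr_bit(const uint32_t ssr[16], unsigned int bit)
{
    assert(bit < 512);
    return (ssr[15 - bit / 32] >> (bit % 32)) & 1u;
}

/*
 * Mirror of the quoted decision: use the discard argument only when the
 * card reports both an extended spec version and DISCARD_SUPPORT (bit 313).
 */
static uint32_t sd_pick_erase_arg(const uint32_t ssr[16], int sda_specx)
{
    unsigned int discard_support = ssr_bit(ssr, 313);

    return (sda_specx && discard_support) ? SD_DISCARD_ARG : SD_ERASE_ARG;
}
```

With this layout bit 313 lands in word 6 at shift 25, which is the same
"313 - 288" offset the quoted line uses.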

>
> >> The only difference between "erase" and "trim" seems to be that erase
> >> has to be done in units of the "erase groups" which is typically
> >> larger than the "write pages" which is the granularity required by the
> >> trim command.  There is also a comment that when you are erasing the
> >> entire partition, "erase" is preferred over "trim".  (Presumably
> >> because it is more convenient?  The spec is not clear.)
> >>
> >> Unfortunately, the SD Card spec and the eMMC spec both read like they
> >> were written by a standards committee stacked by hardware engineers.
> >> It doesn't look like they had file system engineers in the room,
> >> because the distinctions between "erase" and "trim" are pretty silly,
> >> and not well defined.  Aside from what I wrote, the spec is remarkably
> >> silent about what the host OS can depend upon.
> >
> > Moreover, the specs have evolved over the years. Somehow, we need to
> > map a REQ_OP_DISCARD and   to the best matching
> > operation that the currently inserted eMMC/SD card supports...
>
> Do we really need to map these functions? What if we don't have an
> actual discard, but just a slow erase (I'm now assuming that erase
> will likely be slow on sdcards)? Can't we just tell the user space
> there is no discard? Like on a normal HDD?

I have considered that, but I'm not sure what would be the best option.

> I really don't know the
> implications, seems like mmc_erase() is just there for the linux
> discard feature.

mmc_erase() is used for both REQ_OP_DISCARD and REQ_OP_SECURE_ERASE,
but that's an implementation detail that we can change, of course.
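
To make the open question concrete, a minimal sketch of one possible
mapping policy follows. Everything in it is hypothetical (the enums, the
card_caps struct, pick_cmd() and the allow_slow_fallback knob); it only
illustrates the two options discussed in this thread, namely falling
back to a possibly slow erase or advertising no discard at all. The
order "trim before erase" and the refusal to downgrade a secure erase
are my assumptions, not something the specs or the mmc core mandate.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for REQ_OP_DISCARD / REQ_OP_SECURE_ERASE (hypothetical). */
enum req_op { OP_DISCARD, OP_SECURE_ERASE };

/* Card-side operations; CMD_NONE means "report the op as unsupported". */
enum card_cmd { CMD_NONE, CMD_DISCARD, CMD_TRIM, CMD_ERASE, CMD_SECURE_ERASE };

/* Hypothetical capability flags gathered at card init. */
struct card_caps {
    bool discard;
    bool trim;
    bool erase;
    bool secure_erase;
};

static enum card_cmd pick_cmd(enum req_op op, const struct card_caps *caps,
                              bool allow_slow_fallback)
{
    assert(caps != 0);

    if (op == OP_SECURE_ERASE)
        /* Assumption: never silently downgrade a secure erase. */
        return caps->secure_erase ? CMD_SECURE_ERASE : CMD_NONE;

    /* A real discard is a pure hint, so it is the first choice. */
    if (caps->discard)
        return CMD_DISCARD;

    /* Without it, either behave like a plain HDD ... */
    if (!allow_slow_fallback)
        return CMD_NONE;

    /* ... or accept a potentially slow operation as second best. */
    if (caps->trim)
        return CMD_TRIM;
    return caps->erase ? CMD_ERASE : CMD_NONE;
}
```

Keeping the fallback behind a single flag makes it easy to compare both
behaviours on the same card.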

Honestly, the whole erase/discard support in the mmc core deserves a
cleanup and I am looking at that (occasionally).

>
> Coming from the user space side. Does mkfs.ext4 assume its pre-discard
> is fast? I'd think so, right? I'd presume it was intended to tell the
> FTL of the block device, "hey these blocks are unused, you can do some
> wear leveling with them".

I would assume that too.

On the other hand, I guess there are situations when user space could
live with slow formatting times. In particular if the goal is to let
the card clean up its internal garbage, as a way to improve
"performance" for later I/O writes.
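
For reference, user space normally reaches this path through the
BLKDISCARD ioctl on the block device, which takes a byte offset and a
byte length. A minimal sketch, where discard_range() is a hypothetical
wrapper name; note that running it against a real device throws away
the data in that range.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* BLKDISCARD */

/*
 * Hypothetical wrapper: ask the block layer to discard `length` bytes
 * starting at byte `offset` of the device open on `fd`.
 * Returns 0 on success, -1 on failure with errno set by ioctl().
 */
static int discard_range(int fd, uint64_t offset, uint64_t length)
{
    uint64_t range[2] = { offset, length };

    assert(length > 0);
    return ioctl(fd, BLKDISCARD, range);
}
```

Timing a call like this over a fixed range would be one simple way to
compare cards, since the caller blocks until the request has completed.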

>
> > A long time ago, both the SD and eMMC specs introduced support for
> > real discard commands, as being hints to the card without any
> > guarantees of what will happen to the data from a logical or a
> > physical point of view. If the card supports that, we should use it as
> > the first option for REQ_OP_DISCARD. Although, what should we pick as
> > the second best option, when the card doesn't support discard - that's
> > when it becomes more tricky. And the same applies for
> > REQ_OP_SECURE_ERASE, of course.
> >
> > If you have any suggestions for how we can improve in the above
> > decisions, feel free to suggest something.
> >
> > Another issue that most likely is causing poor performance for
> > REQ_OP_DISCARD/REQ_OP_SECURE_ERASE for eMMC/SD, is that in
> > mmc_queue_setup_discard() we set up the maximum discard sectors
> > allowed per request and the discard granularity.
> >
> > To find performance bottlenecks, I would start looking at what actual
> > eMMC/SD commands/args we end up mapping towards the
> > REQ_OP_DISCARD/REQ_OP_SECURE_ERASE requests. Then definitely, I would
> > also look at the values we end up picking as max discard sectors and
> > the discard granularity.
>
> I'm in the middle of gathering some SD cards to look at how they
> behave timing-wise and what they report they support (i.e. erase or
> discard). Looks like other cards are doing better. But I'd have to
> find out if they support the discard (mine doesn't) and if they are
> slow too if I force them to use the normal erase.

Sounds great, looking forward to hearing more about your findings.
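
On the max discard sectors and discard granularity point quoted above,
a back-of-the-envelope sketch of why those two values matter: they
decide how many separate requests a large discard turns into. This is a
simplification under my own assumptions, not how the block layer
actually splits requests; struct discard_limits and
count_discard_requests() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical limits, both expressed in sectors. */
struct discard_limits {
    uint64_t max_sectors;   /* upper bound per request */
    uint64_t granularity;   /* alignment unit for a discard */
};

/*
 * Count the requests needed for [start, start + nr) after shrinking the
 * range to granularity-aligned boundaries. Unaligned head and tail
 * pieces are simply dropped in this sketch.
 */
static uint64_t count_discard_requests(uint64_t start, uint64_t nr,
                                       const struct discard_limits *lim)
{
    uint64_t g = lim->granularity;
    uint64_t first, last, per_req;

    assert(g > 0 && lim->max_sectors >= g);

    first = (start + g - 1) / g * g;    /* round start up */
    last = (start + nr) / g * g;        /* round end down */
    if (last <= first)
        return 0;                       /* nothing aligned to discard */

    per_req = lim->max_sectors / g * g; /* largest aligned chunk */
    return (last - first + per_req - 1) / per_req;
}
```

A small max_sectors value multiplies the number of commands sent to the
card, which adds up quickly if each command is slow on its own.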

[...]

Kind regards
Uffe


