Hi Ted,
On 2020-12-07 19:35, Theodore Y. Ts'o wrote:
On Mon, Dec 07, 2020 at 04:10:27PM +0100, Michael Walle wrote:
Hi,
The problem I'm having is that I'm trying to install debian on
an embedded system onto an sdcard. During installation it will
format the target filesystem, but the "mkfs.ext4 -F /dev/mmcblk0p2"
takes ages.
What I've found out so far:
- mkfs.ext4 tries to discard all blocks on the target device
- with my target device being an sdcard it seems to fall back
to normal erase [1], with erase_arg being set to what the card
is capable of [2]
Now I'm trying to figure out if this behavior is intended. I guess
one can reduce it to "blkdiscard /dev/mmcblk0p2". Should this
actually fall back to normal erasing or should it return -EOPNOTSUPP?
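For anyone who wants to check what the block layer advertises for their
card, here is a minimal sketch. It assumes the usual sysfs attribute
queue/discard_max_bytes, where 0 means "no discard support". The function
names and the sysfs_root parameter are made up for illustration. Note that
a non-zero value only says that discard requests are accepted; it does not
tell you which MMC command the driver ends up issuing.

```python
from pathlib import Path


def discard_max_bytes(disk: str, sysfs_root: str = "/sys/block") -> int:
    """Return the advertised discard_max_bytes limit for a whole disk.

    `disk` is the whole-disk name (e.g. "mmcblk0"), not a partition.
    `sysfs_root` is a hypothetical parameter so the function can be
    pointed at a fake directory tree for testing.
    """
    path = Path(sysfs_root) / disk / "queue" / "discard_max_bytes"
    return int(path.read_text().strip())


def supports_discard(disk: str, sysfs_root: str = "/sys/block") -> bool:
    """True if the block layer accepts discard requests for `disk`."""
    try:
        return discard_max_bytes(disk, sysfs_root) > 0
    except (FileNotFoundError, ValueError):
        # Unknown disk or unreadable attribute: treat as unsupported.
        return False


if __name__ == "__main__":
    # "mmcblk0" is the disk from this thread; adjust for your system.
    print("discard advertised:", supports_discard("mmcblk0"))
```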
There are three different MMC commands which are defined:
1) DISCARD
2) ERASE
3) SECURE ERASE
The first two are expected to be fast, since they only involve clearing
some metadata fields in the Flash Translation Layer (FTL), so that the
LBAs in the specified range are no longer mapped to a flash page.
Mh, where is it specified that the erase command is fast? According
to the Physical Layer Simplified Specification Version 8.00:
The actual erase time may be quite long, and the host may issue CMD7
to deselect the card or perform card disconnection, as described in
the Block Write section, above.
Honest question. Also, reading "4.14 Erase Timeout Calculation" doesn't
make it sound like it is fast.
Also there is this comment:
https://elixir.bootlin.com/linux/v5.9.12/source/drivers/mmc/core/core.c#L1495
The difference between "discard" and "erase" is that "discard" is a
hint, so the device is allowed to ignore it whenever it wants (in
practice, if it's busy doing a GC, or if it's busy writing back blocks
in its writeback cache). "Erase" is guaranteed to work, in that after
an erase, a read from a specified sector MUST return all zeros, but
that can easily be done by redirecting a pointer in the FTL metadata.
"Secure Erase" is the one which can be slow, since it requires
physically zeroing all of the flash pages (although if the device is
self-encrypting, this in theory could also be fast if you're doing a
secure erase at the granularity of the device's encryption keys, so
all it needs to do is to regenerate the crypto key).
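To make the distinction above concrete, here is a toy model of an FTL.
It is only a sketch of the idea described in the previous paragraph, not
how any real card firmware works; the class ToyFTL, its method names and
the `busy` flag are all invented for illustration.

```python
PAGE_SIZE = 512  # arbitrary page size for the toy model


class ToyFTL:
    """Toy flash translation layer mapping LBAs to flash pages."""

    def __init__(self, num_pages: int) -> None:
        self.free = list(range(num_pages))  # unused physical pages
        self.flash = {}                     # physical page -> data
        self.l2p = {}                       # LBA -> physical page
        self.physical_erases = 0            # counts slow operations

    def write(self, lba: int, data: bytes) -> None:
        if len(data) != PAGE_SIZE:
            raise ValueError("data must be exactly one page")
        if not self.free:
            raise RuntimeError("no free pages (toy model has no GC)")
        page = self.free.pop()
        self.flash[page] = data
        # Any previously mapped page becomes stale; the toy never
        # reclaims it.
        self.l2p[lba] = page

    def read(self, lba: int) -> bytes:
        page = self.l2p.get(lba)
        # Unmapped LBAs read back as zeros.
        return bytes(PAGE_SIZE) if page is None else self.flash[page]

    def erase(self, first: int, last: int) -> None:
        """Guaranteed, but metadata-only: just drop the mappings."""
        for lba in range(first, last + 1):
            self.l2p.pop(lba, None)

    def discard(self, first: int, last: int, busy: bool = False) -> None:
        """A hint: the device may ignore it, e.g. when it is busy."""
        if not busy:
            self.erase(first, last)

    def secure_erase(self, first: int, last: int) -> None:
        """Slow path: physically zero the currently mapped pages.

        Stale copies from earlier overwrites are ignored here, which a
        real secure erase could not do.
        """
        for lba in range(first, last + 1):
            page = self.l2p.pop(lba, None)
            if page is not None:
                self.flash[page] = bytes(PAGE_SIZE)
                self.physical_erases += 1
```

In this model, erase and discard never touch the flash contents, which is
why they are expected to be cheap; only secure_erase does per-page work.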
It sounds like your SD card is implementing the "erase" command in a
particularly non-optimal way. If it's common, perhaps we need some
kind of blacklist for drivers with badly implemented erase commands.
As a workaround, you can run mke2fs with the command-line option "-E
discard=0".
I've already tested that "mkfs.ext4 -E nodiscard" is fast (i.e. it
behaves the same way as it did before the pre-discard feature).
But I wouldn't say it is a cheapo card (Toshiba Exceria). Although I
cannot guarantee that it isn't a Chinese clone, it looks authentic
;)
P.S. If your SD card got "erase" wrong, I'd be a little worried about
what else the FTL implementation may have screwed up. So you may want to
consider simply getting a different SD card, especially if this is
something that you plan to distribute as a product to downstream
customers. In general, low-end flash needs to be very carefully
qualified to make sure it is competently implemented if you plan to
deploy in large quantities. An example of what happens if this
qualification process is not done:
https://insideevs.com/news/376037/tesla-mcu-emmc-memory-issue/
Tesla is currently under investigation by the National Highway Traffic
Safety Administration due to cheaping out on their eMMC flash
(probably just a few pennies per unit). Given that customers are
having to pay $1500 to replace their engine controller out of warranty
(and the NHTSA is considering whether or not to force Tesla to eat the
costs, as opposed to forcing their customers to pay $$$), that's an
example of false economy....
Yeah, I'm aware of the Tesla eMMC wear-out problem. But I'm looking at
this mainly from a user's point of view. Take our product, for example:
the user can freely choose their sdcard, only to then notice that
installing their distribution is painfully slow. So I'm interested in
understanding the implications. In particular, can the erase command
really be assumed to be fast?
-michael