The minor heck of using mmcblk and ext4

Stefan Bader <stefan.bader@xxxxxxxxxxxxx> · Thu, 19 Apr 2012 18:37:45 +0200

This was a bug[1] which I saw when playing around with my Aspire One (which
features an mmc-driven card reader) and ext4.

The problem is inherited from the way the mmc driver handles cards on suspend:
when an mmcblk device is in use, suspending is like ejecting the card.

The problem gets worse when actually using a filesystem on that device. And
there is a difference between having the fs mounted by udisks under /media and
having done the mount manually. I suspect that udisks either adds its own hooks
into udev events or has some sort of watch on the mount point. At least ejecting
the card manually or suspending will usually also trigger an unmount of the fs.
This becomes a bit of a problem when the fs is in use (the simplest form being
to open a shell and change the current directory into the fs on the card). In
that case the entry in /proc/mounts is gone but some fs structures remain.

And here one specialty of ext4 strikes: it creates some entries under
/proc/fs/ext4/mmcblk*, /proc/fs/jbd2/mmcblk* and /sys/fs/ext4/mmcblk* (and some
internal kobj) which remain dangling together with the structures. So on
resume the user is presented with:

  a) a scary dialog box saying that the mount has failed for some fuzzy reason
     (including missing superblocks), which is even more scary because there
     has been a bug in the past which turned this into reality (goodbye data).
  b) three WARN messages (including stack traces) to tell about the duplicate
     objects.
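
For anyone wanting to check for the leftovers from user space, here is a rough
sketch (not something from the bug report) that lists mmcblk* directories under
the three locations above for which /proc/mounts has no matching device. The
function names are made up for illustration. It assumes that devices appear in
/proc/mounts as plain /dev/<name> paths and that jbd2 entries carry a
"-<number>" suffix after the device name; both are simplifications.

```python
import glob
import os


def mounted_devices(mounts_text):
    """Return base names of /dev/... devices listed in /proc/mounts-style text."""
    devices = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        # Only consider real block devices, not pseudo filesystems like proc.
        if len(fields) >= 2 and fields[0].startswith("/dev/"):
            devices.add(os.path.basename(fields[0]))
    return devices


def dangling_entries(mounts_text, roots):
    """List mmcblk* directories under roots whose device is not mounted."""
    mounted = mounted_devices(mounts_text)
    dangling = []
    for root in roots:
        for path in sorted(glob.glob(os.path.join(root, "mmcblk*"))):
            name = os.path.basename(path)
            # Assumption: jbd2 names look like "mmcblk0p1-8"; drop the suffix.
            device = name.split("-")[0]
            if device not in mounted:
                dangling.append(path)
    return dangling


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        text = f.read()
    roots = ["/proc/fs/ext4", "/proc/fs/jbd2", "/sys/fs/ext4"]
    for path in dangling_entries(text, roots):
        print("dangling:", path)
```

An entry showing up here after a suspend/resume cycle would match the state
described above: the mount is gone from /proc/mounts but the ext4 and jbd2
entries are still around.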

Clearly anything that had a file (or directory node) open will be screwed and
receive IO errors when trying any access. So the obvious mount failure just
makes it very clear that this is a bad place to be in. Or is the expectation
that users of the old mount should just be getting IO errors and the new mount
is supposed to succeed?

I am a bit at the limits of my knowledge here (hence the longish email). Should
ext4 be expected to find out its backing device is gone and then at least remove
the sysfs and procfs entries related to that? And even if so, how? Or is it
better the way it is, so at least the fact that the block device went away is
clearly visible? I am pretty sure any filesystem has the same first half of the
issue (maybe dangling open files) but just doesn't have anything that prevents
it from being remounted.

There also is a relatively simple way to handle it (when one is aware of it).
The mmc core can be instructed to treat cards as non-removable
(mmc_core.removable=0), which has its own dangers. I thought that on resume
there still is a check for the card being the same. I just don't know how
reliable that check is (or how unique the identification is). So while this
allows suspending and resuming without headaches, it can potentially cause some
more when the card is replaced without that being detected.
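
To illustrate the "is it still the same card" worry, here is a user-space
sketch. It is not the check the kernel does on resume, just a comparison of the
card identification before and after a suspend. It assumes the parameter is
visible at /sys/module/mmc_core/parameters/removable (the usual sysfs location
for a module parameter, which may not exist on every kernel build) and that
each card under /sys/bus/mmc/devices exposes a "cid" attribute. The function
names are hypothetical.

```python
import os


def read_text(path):
    """Return the stripped contents of a file, or None if it cannot be read."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def card_identities(sys_root="/sys/bus/mmc/devices"):
    """Map each card directory name under sys_root to its cid string."""
    ids = {}
    if not os.path.isdir(sys_root):
        return ids
    for name in sorted(os.listdir(sys_root)):
        cid = read_text(os.path.join(sys_root, name, "cid"))
        if cid is not None:
            ids[name] = cid
    return ids


def changed_cards(before, after):
    """Return names whose cid differs, appeared, or vanished between snapshots."""
    names = set(before) | set(after)
    return sorted(n for n in names if before.get(n) != after.get(n))


if __name__ == "__main__":
    # Assumed path; prints None when the parameter is not available.
    print("removable:", read_text("/sys/module/mmc_core/parameters/removable"))
    print("cards:", card_identities())
```

Taking one snapshot with card_identities() before suspending and one after
resuming, then passing both to changed_cards(), would show whether the
identification changed. How unique that identification is across cards is
exactly the open question above, so an empty result is not proof that the card
was not swapped.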

So long story, sort of short question: is there anything that can/should be done
or is it just something that cannot be done any better way?

Thanks,
Stefan

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/913860
