This started as a bug [1] that I saw while playing around with my Aspire One (which has an mmc-driven card reader) and ext4. The problem is inherited from the way the mmc driver handles cards on suspend: with an mmcblk device, suspending is like ejecting the card.

The problem gets worse when a filesystem on that device is actually in use. There is also a difference between having the fs mounted by udisks under /media and having done the mount manually. I suspect that udisks either adds its own hooks into udev events or has some sort of watch on the mount point. At least, ejecting the card manually or suspending will usually also trigger an unmount of the fs.

This becomes a bit of a problem when the fs is in use (the simplest case being to open a shell and change the current directory into the fs on the card). In that case the entry in /proc/mounts is gone but some fs structures remain. And here one specialty of ext4 strikes: it creates some entries under /proc/fs/ext4/mmcblk*, /proc/fs/jbd2/mmcblk* and /sys/fs/ext4/mmcblk* (and some internal kobj), which remain dangling together with the structures.

So when resuming, the user is presented with:

a) a scary dialog box telling them that the mount has failed because of (fuzzy reason including missing superblocks), which is even more scary because there has been a bug in the past which turned this into reality (good bye data).
b) three WARN messages (including stack trace) about the duplicate objects.

Clearly anything that had a file (or directory node) open will be screwed and will receive IO errors on any access. So the obvious mount failure just makes it very clear that this is a bad place to be in. Or is the expectation that users of the old mount should just get IO errors while the new mount is supposed to succeed?

I have reached the limits of my knowledge here (hence the longish email).
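For anyone wanting to check whether they are in the state described above, here is a minimal sketch (Python, not part of the original report) that looks for mmcblk* entries under the three directories mentioned and reports those without a matching device in /proc/mounts. The helper names are made up for illustration. It assumes that /proc/mounts lists the device as /dev/mmcblk*, and that jbd2 entries carry a "-<number>" suffix after the device name, as I have seen on my systems; neither has been verified across kernel versions.

```python
import os

def mounted_block_names(mounts_text):
    """Return base names of /dev/* devices listed in /proc/mounts-style text."""
    names = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("/dev/"):
            names.add(os.path.basename(fields[0]))
    return names

def dangling_entries(entry_names, mounts_text, prefix="mmcblk"):
    """Return entries starting with prefix that match no mounted device.

    An entry matches a device if it equals the device name or is the
    device name followed by "-<suffix>" (assumed jbd2 naming).
    """
    mounted = mounted_block_names(mounts_text)

    def covered(name):
        return any(name == dev or name.startswith(dev + "-") for dev in mounted)

    return sorted(n for n in entry_names
                  if n.startswith(prefix) and not covered(n))

def scan(dirs=("/proc/fs/ext4", "/proc/fs/jbd2", "/sys/fs/ext4"),
         mounts_path="/proc/mounts"):
    """Map each directory to its apparently dangling mmcblk* entries."""
    with open(mounts_path) as f:
        mounts_text = f.read()
    result = {}
    for d in dirs:
        try:
            names = os.listdir(d)
        except OSError:
            continue  # directory absent (e.g. ext4 not loaded)
        stale = dangling_entries(names, mounts_text)
        if stale:
            result[d] = stale
    return result
```

Calling scan() after a resume and printing the result should list the leftover entries, if any. An empty result only means that nothing was found under these assumptions, not that the filesystem state is clean.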
Should ext4 be expected to find out that its backing device is gone and then at least remove the sysfs and procfs entries related to it? And even if so, how? Or is it better the way it is, so that at least the fact that the block device went away is clearly visible? I am pretty sure any filesystem has the same first half of the issue (possibly dangling open files) but just does not have anything that prevents it from being remounted.

There is also a relatively simple way to handle it (when one is aware of the problem). The mmc core can be instructed to treat cards as non-removable (mmc_core.removable=0), which has its own dangers. I believe there is still a check on resume that the card is the same one. I just don't know how reliable that is (or how unique the identification is). So while this allows suspending and resuming without headaches, it can potentially cause more of them when the card is replaced without that being detected.

So, long story, sort of short question: is there anything that can/should be done, or is this just something that cannot be done any better?

Thanks,
Stefan

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/913860