On May 6, 2016, at 7:00 AM, Theodore Ts'o <tytso@xxxxxxx> wrote: > > On Fri, May 06, 2016 at 06:01:11AM +0000, Daeho Jeong wrote: >>> Hmm, I'm not really comfortable with putting this hack in, since this >>> is papering over the real problem, which is that Android is trying to >>> use the emergency remount read-only sysrq option and this is >>> fundamentally unsafe. I'm not sure what else could break if it is >>> situation normal that there is active processes busily writing to the >>> file system and sysrq-u followed by reboot is the normal way the >>> Android kernel does a reboot. >> >>> A much better solution would be to change the Android userspace to >>> call the FIFREEZE ioctl on each mounted file system, and then call for >>> a reboot. >> >> I agree with you. I know that current Android shutdown procedure is >> not a safe way. But, without this patch, "even not in Android system", >> when we trigger the emergency read-only remount while evicting inodes, >> i_size of the inode becomes zero and the inode is not in orphan list, >> but blocks of the inode are still allocated to the inode, because >> ext4_truncate() will fail while stating the handle which was already started >> by ext4_evict_inode(). This causes the filesystem inconsistency and >> we will encounter an ext4 kernel panic in the next boot by this problem. >> >> I think that this kind of filesystem crash can occur anywhere that >> the same journal handle is repeatly used. During an atomic filesystem >> operation, a part of the operation will succeed and the other one will fail. > > Sure, but if this were a bug, then it could happen under a normal > crash scenario. In this particular case, under what circumstances > could i_size be set to zero *and* the inode is not on the orphan list > *and* ext4_evict_inode() is getting called? > > If that could happen, then we could have problems without emergency > unmount, and we should fix that problem. > > The real problem here is that we want emergency unmount to be > completely safe, but setting MS_RDONLY randomly isn't safe against file > system corruption. The idea is you do this only when you have no > other choice, and the consequences would be worse --- and where you > would be prepared to do a file system consistency check afterwards. > > We can either fix Android userspace, or we could add a per-file system > callback which tries (as much as possible) to make an emergency > unmount safe. In this particular case, it would probably involve > setting the "file system is corrupt flag" to force a file system > consistency check at the next reboot. Because if you use emergency > unmount, all bets are off, and there may be other problems.... The problem is that emergency remount-ro doesn't block in-progress writes, since most operations only check the MS_RDONLY at the start of an operation. It would be possible for do_emergency_remount() call ->freeze_fs() first for all the filesystems, then doing the remount read-only (would need a change to do_remount_ro() to allow this)? That ensures the filesystem is in a (more) consistent state when force remount-ro is called (i.e. which doesn't block or return an error if there are writers on the filesystem). I don't think this is an issue for normal remount-ro, since there can't be any writes in progress. This would also avoid the need for Android to know more about the internals of how to remount read-only, and probably help normal sysadmins too. Cheers, Andreas
Attachment:
signature.asc
Description: Message signed with OpenPGP using GPGMail