On Mon, Mar 13, 2017 at 11:11:35PM +0000, Nick Alcock wrote:
> On 13 Mar 2017, Eric Biggers spake thusly:
>
> > On Wed, Mar 01, 2017 at 11:45:52AM +0000, Nick Alcock wrote:
> >> [Resend, after the first attempt, from my home address, failed with
> >> endless greylisting followed by "4.5.0 Interactive router timed out"
> >> from all but the lowest-priority MX for vger, and "Name server:
> >> bl-ckh-le.kernel.org.: host not found" for the apparently-nonexistent
> >> lowest-priority MX. Maybe it'll work better from here.]
> >>
> >> I first spotted this -- or it spotted me -- back in the v4.7.x days. It
> >> is still present in v4.10.
> >>
> >> Here's a replication recipe, given a reasonable rootfs with a compiler
> >> on it, and assuming a blank virtio disk on /dev/vdb:
> >
> > Hi Nick, thanks for reporting this. I've sent a patch which should fix this,
> > and Cc'ed you. This actually seems to have been a bug for a very long time, maybe
>
> I'll test it. Your timing is supernatural: I was just about to mkfs all
> the filesystems on my new server (a once-in-a-decade operation for me)
> and was bemoaning the fact that I couldn't turn on inline_data at the
> same time. Now I can! (I have good backups so can take suicidally crazy
> risks).

Glad to hear you have backups! I wouldn't turn on inline_data for files,
period. It's not as well tested as it ought to be (clearly). :/

--D

> > even ever since the inline_data feature was introduced. (I was able to
> > reproduce it in a 3.18 kernel, at least.) I'm not sure why it didn't get
> > noticed earlier --- maybe hardly anyone ever writes to small files with mmap...
>
> Yeah, I built my /usr/src with it and ran for weeks without hitting it:
> it wasn't until I rebuilt most of a distro and hit dovecot that anything
> went wrong.
>
> I note that what I saw then was massive filesystem corruption, so
> massive that not even tune2fs recognized it as being an ext4 fs
> afterwards.
> Perhaps the thing wrote badness into the journal (possibly
> including inline data scribbled over the next inode?) and replayed it
> over the fs on the next boot, following which a cascade of increasing
> badness ended up eating the entire fs... ah well, I guess it's hard to
> know now, months after the fact (though if it's of interest, I still
> have an e2image of the corrupted fs lying around!)
>
> --
> NULL && (void)