On Wed, Jun 6, 2018 at 8:18 AM, Chris Mason <clm@xxxxxx> wrote:
>
>
> On 5 Jun 2018, at 16:03, Andrew Morton wrote:
>
>> (switched to email. Please respond via emailed reply-to-all, not via the
>> bugzilla web interface).
>>
>> On Tue, 05 Jun 2018 18:01:36 +0000 bugzilla-daemon@xxxxxxxxxxxxxxxxxxx
>> wrote:
>>
>>> https://bugzilla.kernel.org/show_bug.cgi?id=199931
>>>
>>>             Bug ID: 199931
>>>            Summary: systemd/rtorrent file data corruption when using echo
>>>                     3 >/proc/sys/vm/drop_caches
>>
>> A long tale of woe here. Chris, do you think the pagecache corruption
>> is a general thing, or is it possible that btrfs is contributing?
>>
>> Also, that 4.4 oom-killer regression sounds very serious.
>
> This week I found a bug in btrfs file write with how we handle stable pages.
> Basically it works like this:
>
> write(fd, some bytes less than a page)
> write(fd, some bytes into the same page)
>         btrfs prefaults the userland page
>         lock_and_cleanup_extent_if_need()  <- stable pages
>                 wait for writeback()
>                 clear_page_dirty_for_io()
>
> At this point we have a page that was dirty and is now clean. That's
> normally fine, unless our prefaulted page isn't in RAM anymore.
>
>         iov_iter_copy_from_user_atomic()  <--- uh oh
>
> If the copy_from_user fails, we drop all our locks and retry. But along the
> way, we completely lost the dirty bit on the page. If the page is dropped
> by drop_caches, the writes are lost. We'll just read back the stale
> contents of that page during the retry loop. This won't result in crc
> errors because the bytes we lost were never crc'd.

So we're going to carefully redirty the page under the page lock, right?

> It could result in zeros in the file because we're basically reading a hole,
> and those zeros could move around in the page depending on which part of the
> page was dirty when the writes were lost.

I have a question: while re-reading this page, wouldn't it read the old/stale
on-disk data?

thanks,
liubo

> I spent a morning trying to trigger this with drop_caches and couldn't make
> it happen, even with schedule_timeout()s inserted and other tricks. But I
> was able to get corruptions if I manually invalidated pages in the critical
> section.
>
> I'm working on a patch, and I'll check and see if any of the other recent
> fixes Dave integrated may have a less exotic explanation.
>
> -chris
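
For reference, below is a minimal userspace sketch of the access pattern Chris
describes: two sub-page writes into the same page, a drop_caches, then a
read-back check of the second write. It is only an illustration of the
user-visible pattern, not a reproducer -- the corruption needs the prefaulted
user page to be evicted in the middle of the write path, and Chris notes he
could not trigger it with drop_caches alone. The file name, offsets, and
buffer sizes are arbitrary choices, not taken from the bug report.

#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	char buf[256];
	int fd = open("testfile", O_CREAT | O_RDWR | O_TRUNC, 0644);
	if (fd < 0) { perror("open"); return 1; }

	/* write(fd, some bytes less than a page) */
	memset(buf, 'A', sizeof(buf));
	if (pwrite(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf)) {
		perror("pwrite");
		return 1;
	}

	/* write(fd, some bytes into the same page) */
	memset(buf, 'B', sizeof(buf));
	if (pwrite(fd, buf, sizeof(buf), 256) != (ssize_t)sizeof(buf)) {
		perror("pwrite");
		return 1;
	}

	/* echo 3 > /proc/sys/vm/drop_caches (needs root; skipped otherwise) */
	int dc = open("/proc/sys/vm/drop_caches", O_WRONLY);
	if (dc >= 0) {
		if (write(dc, "3\n", 2) != 2)
			perror("drop_caches write");
		close(dc);
	}

	/* read back the second write and check that it survived */
	if (pread(fd, buf, sizeof(buf), 256) != (ssize_t)sizeof(buf)) {
		perror("pread");
		return 1;
	}
	for (size_t i = 0; i < sizeof(buf); i++) {
		if (buf[i] != 'B') {
			fprintf(stderr, "unexpected byte 0x%02x at offset %zu\n",
				(unsigned char)buf[i], 256 + i);
			return 1;
		}
	}
	printf("data intact\n");
	close(fd);
	return 0;
}

In the good case this always prints "data intact", because a dirty page is
not dropped by drop_caches; the lost-dirty-bit window described above is what
would make the second write disappear.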