On Wed, Jun 6, 2018 at 9:44 PM, Chris Mason <clm@xxxxxx> wrote:
> On 6 Jun 2018, at 9:38, Liu Bo wrote:
>
>> On Wed, Jun 6, 2018 at 8:18 AM, Chris Mason <clm@xxxxxx> wrote:
>>>
>>> On 5 Jun 2018, at 16:03, Andrew Morton wrote:
>>>
>>>> (switched to email. Please respond via emailed reply-to-all, not via
>>>> the bugzilla web interface).
>>>>
>>>> On Tue, 05 Jun 2018 18:01:36 +0000 bugzilla-daemon@xxxxxxxxxxxxxxxxxxx
>>>> wrote:
>>>>
>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=199931
>>>>>
>>>>>             Bug ID: 199931
>>>>>            Summary: systemd/rtorrent file data corruption when using
>>>>>                     echo 3 >/proc/sys/vm/drop_caches
>>>>
>>>> A long tale of woe here. Chris, do you think the pagecache corruption
>>>> is a general thing, or is it possible that btrfs is contributing?
>>>>
>>>> Also, that 4.4 oom-killer regression sounds very serious.
>>>
>>> This week I found a bug in btrfs file write with how we handle stable
>>> pages. Basically it works like this:
>>>
>>> write(fd, some bytes less than a page)
>>> write(fd, some bytes into the same page)
>>>     btrfs prefaults the userland page
>>>     lock_and_cleanup_extent_if_need() <- stable pages
>>>         wait for writeback()
>>>         clear_page_dirty_for_io()
>>>
>>> At this point we have a page that was dirty and is now clean. That's
>>> normally fine, unless our prefaulted page isn't in ram anymore.
>>>
>>>     iov_iter_copy_from_user_atomic() <--- uh oh
>>>
>>> If the copy_from_user fails, we drop all our locks and retry. But along
>>> the way, we completely lost the dirty bit on the page. If the page is
>>> dropped by drop_caches, the writes are lost. We'll just read back the
>>> stale contents of that page during the retry loop. This won't result in
>>> crc errors because the bytes we lost were never crc'd.
>>>
>>
>> So we're going to carefully redirty the page under the page lock, right?
>
> I don't think we actually need to clean it.
> We have the page locked,
> writeback won't start until we unlock.
>

My concern is that the buggy path is similar to the compression path,
where we also do the trick of clear_page_dirty_for_io and redirty_pages
to avoid any faults wandering in and changing pages underneath, but it
seems here we're fine if pages get changed in between.

>>
>>> It could result in zeros in the file because we're basically reading a
>>> hole, and those zeros could move around in the page depending on which
>>> part of the page was dirty when the writes were lost.
>>>
>>
>> I've got a question: while re-reading this page, wouldn't it read
>> old/stale on-disk data?
>
> If it was never written we should be treating it like a hole, but I'll
> double check.
>

Okay, I think this would also happen in the overwrite case, where stale
data lies on disk.

thanks,
liubo
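[Editor's illustration] The ordering Chris describes above can be modeled as a small userspace state machine. This is not kernel code; every name below is a hypothetical stand-in that only mimics the state transitions (dirty bit, page cache residency, on-disk contents), to make it concrete why the first write is silently lost:

```python
# Purely illustrative model of the btrfs stable-pages bug described above.
# All names are hypothetical userspace stand-ins, not real kernel APIs.

class Page:
    def __init__(self):
        self.dirty = False       # page dirty bit
        self.in_cache = False    # resident in the page cache?
        self.contents = b""      # what the page cache holds
        self.on_disk = b""       # what a read-from-disk would return

def write_partial(page, data):
    """Write #1: dirties the page; data never reaches disk."""
    page.in_cache = True
    page.contents = data
    page.dirty = True

def lock_and_cleanup_extent(page):
    """Stable-pages step: wait for writeback, then clear the dirty bit.
    The page is now clean even though its data was never written back."""
    page.dirty = False

def copy_from_user_atomic(page, data, user_page_present):
    """Fails (short copy) if the prefaulted user page was reclaimed."""
    if not user_page_present:
        return False             # caller drops locks and retries
    page.contents = data
    page.dirty = True
    return True

def drop_caches(page):
    """echo 3 > /proc/sys/vm/drop_caches: clean pages are simply tossed."""
    if not page.dirty:
        page.in_cache = False
        page.contents = b""

def read_page(page):
    """On a cache miss, fall back to disk (a hole/zeros, or stale data)."""
    return page.contents if page.in_cache else page.on_disk

page = Page()
write_partial(page, b"hello")        # write #1, still only in the cache
lock_and_cleanup_extent(page)        # dirty bit lost here
ok = copy_from_user_atomic(page, b"hellohello", user_page_present=False)
assert not ok                        # prefaulted user page was reclaimed
drop_caches(page)                    # clean page gets dropped
print(read_page(page))               # b'' -- write #1 is gone
```

With `page.on_disk` left empty this reads back as a hole (zeros); setting it to older data models the overwrite case Liu Bo raises, where stale on-disk contents come back instead.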