On Fri, Jun 05, 2020 at 09:30:53AM +1000, Dave Chinner wrote: > On Thu, Jun 04, 2020 at 04:05:19PM -0700, Matthew Wilcox wrote: > > On Fri, Jun 05, 2020 at 08:57:26AM +1000, Dave Chinner wrote: > > > On Thu, Jun 04, 2020 at 01:23:40PM -0700, Matthew Wilcox wrote: > > > > From: "Matthew Wilcox (Oracle)" <willy@xxxxxxxxxxxxx> > > > > > > > > Test generic/019 often results in: > > > > > > > > WARNING: at fs/iomap/buffered-io.c:1069 iomap_page_mkwrite_actor+0x57/0x70 > > > > > > > > Since this can happen due to a storage error, we should not WARN for it. > > > > Just return -EIO, which will be converted to a SIGBUS for the hapless > > > > task attempting to write to the page that we can't read. > > > > > > Why didn't the "read" part of the fault which had the EIO error fail > > > the page fault? i.e. why are we waiting until deep inside the write > > > fault path to error out on a failed page read? > > > > I have a hypothesis that I don't know how to verify. > > > > First the task does a load from the page and we put a read-only PTE in > > the page tables. Then it writes to the page using write(). The page > > gets written back, but hits an error in iomap_writepage_map() > > which calls ClearPageUptodate(). Then the task with it mapped attempts > > to store to it. > > Sure, but that's not really what I was asking: why isn't this > !uptodate state caught before the page fault code calls > ->page_mkwrite? The page fault code has a reference to the page, > after all, and in a couple of paths it even has the page locked. If there's already a PTE present, then the page fault code doesn't check the uptodate bit. Here's the path I'm looking at: do_wp_page() -> vm_normal_page() -> wp_page_shared() -> do_page_mkwrite() I don't see anything in there that checked Uptodate. > What I'm trying to understand is why this needs to be fixed inside > ->page_mkwrite, because AFAICT if we have to fix this in the iomap > code, we have the same "we got handed a bad page by the page fault" > problem in every single ->page_mkwrite implementation.... I think the iomap code is the only filesystem which clears PageUptodate on errors. I know we've argued about whether that's appropriate or not in the past. > > I haven't dug through what generic/019 does, so I don't know how plausible > > this is. > > # Run fsstress and fio(dio/aio and mmap) and simulate disk failure > # check filesystem consistency at the end. > > I don't think it is mixing buffered writes and mmap writes on the > same file via fio. Maybe fsstress is triggering that, but the > fsstress workload is single threaded so, again, I'm not sure it will > do this. Maybe that's not how we end up with a read-only PTE in the process's page tables. Perhaps it starts out with a store, then on an fsync() we mark it read-only, then try to do another store.