On Thu, Sep 24, 2020 at 02:59:00PM +0100, Matthew Wilcox wrote: > On Thu, Sep 24, 2020 at 09:12:35AM -0400, Brian Foster wrote: > > On Thu, Sep 24, 2020 at 01:56:08PM +0100, Matthew Wilcox (Oracle) wrote: > > > For filesystems with block size < page size, we need to set all the > > > per-block uptodate bits if the page was already uptodate at the time > > > we create the per-block metadata. This can happen if the page is > > > invalidated (eg by a write to drop_caches) but ultimately not removed > > > from the page cache. > > > > > > This is a data corruption issue as page writeback skips blocks which > > > are marked !uptodate. > > > > Thanks. Based on my testing of clearing PageUptodate here I suspect this > > will similarly prevent the problem, but I'll give this a test > > nonetheless. > > > > I am a little curious why we'd prefer to fill the iop here rather than > > just clear the page state if the iop data has been released. If the page > > is partially uptodate, then we end up having to re-read the page > > anyways, right? OTOH, I guess this behavior is more consistent with page > > size == block size filesystems where iop wouldn't exist and we just go > > by page state, so perhaps that makes more sense. > > Well, it's _true_ ... the PageUptodate bit means that every byte in this > page is at least as new as every byte on storage. There's no need to > re-read it, which is what we'll do if we ClearPageUptodate. > Yes, of course. I'm just noting the inconsistent behavior between a full and partially uptodate page. Brian > My original motivation for this was splitting a THP. In that case, > we have, let's say, 16 * 4kB pages, and an iop for 64 blocks. When we > split that 64kB page into 16 4kB pages, we can't afford to allocate 16 > iops for them, so we just drop the iop and copy the uptodate state from > the head page to all subpages. > > So now we have 16 pages, all marked uptodate (and with valid data) but > no iop. So we need to create an iop for each page during the writeback > path, and that has to be created with uptodate bits or we'll skip the > entire page. When I wrote the patch below, I had no idea we could > already get an iop allocated for an uptodate page, or I would have > submitted this patch months ago. > > http://git.infradead.org/users/willy/pagecache.git/commitdiff/bc503912d4a9aad4496a4591e9992f0ada47a9c9 >