Re: corruption of active mmapped files in btrfs snapshots

Chris Mason <chris.mason@xxxxxxxxxxxx> · Thu, 21 Mar 2013 19:06:21 -0400

Quoting Chris Mason (2013-03-21 14:06:14)
> Quoting Alexandre Oliva (2013-03-21 03:14:02)
> > On Mar 19, 2013, Alexandre Oliva <oliva@xxxxxxx> wrote:
> > 
> > > On Mar 19, 2013, Alexandre Oliva <oliva@xxxxxxx> wrote:
> > >>> that is being processed inside the snapshot.
> > 
> > >> This doesn't explain why the master database occasionally gets similarly
> > >> corrupted, does it?
> > 
> > > Actually, scratch this bit for now.  I don't really have proof that the
> > > master database actually gets corrupted while it's in use
> > 
> > Scratch the “scratch this”.  The master database actually gets
> > corrupted, and it's with recently-created files, created after earlier
> > known-good snapshots.  So, it can't really be orphan processing, can it?
> 
> Right, it can't be orphan processing.
> 
> > 
> > Some more info from the errors and instrumentation:
> > 
> > - no data syncing on the affected files is taking place.  it's just
> >   memcpy()ing data in <4KiB-sized chunks onto mmap()ed areas,
> >   munmap()ing it, growing the file with ftruncate and mapping a
> >   subsequent chunk for further output
> > 
> > - the NULs at the end of pages do NOT occur at munmap/mmap boundaries as
> >   I suspected at first, but they do coincide with the end of extents
> >   that are smaller than the maximum compressed extent size.  So,
> >   something's making btrfs flush pages to disk before the pages are
> >   completely written (which is fine in principle), but apparently
> >   failing to pick up subsequent changes to the pages (eek!)
> 
> With mmap the kernel can pick any given time to start writing out dirty
> pages.  The idea is that if the application makes more changes the page
> becomes dirty again and the kernel writes it again.
> 
> So the question is, can you trigger this without snapshots being done
> at all?  I'll try to make an mmap tester here that hammers on the
> related code.  We usually test this with fsx, which catches all kinds of
> horrors.

So my test program creates an 8GB file in chunks of 1MB each, using
truncate to extend the file and then mmap to write into the new hole.
The 1MB chunks are ever so slightly unaligned.  After creating the
whole file, it reads it back to look for errors.
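
For anyone who wants to try something similar, here is a rough sketch
of that kind of tester.  It is not the actual program: the names
(mmap_stress, write_chunk, verify_file, pattern), the 13-byte skew and
the byte pattern are all made up for illustration.  Each chunk is 1MB
plus the skew, so chunk boundaries drift off page boundaries, and every
byte written is non-zero so stray NULs stand out on readback.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK (1024 * 1024) /* 1MB chunks, as described in the message */
#define SKEW  13            /* assumed skew, so chunks are not page aligned */

/* Expected byte at a file offset; always in 1..251, never NUL. */
static unsigned char pattern(off_t off)
{
    return (unsigned char)(off % 251 + 1);
}

/* Grow the file to cover [off, off+len), then fill that range via mmap. */
static int write_chunk(int fd, off_t off, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    off_t map_off = off - (off % page);   /* mmap offsets must be page aligned */
    size_t delta = (size_t)(off - map_off);

    if (ftruncate(fd, off + (off_t)len) != 0)
        return -1;
    unsigned char *p = mmap(NULL, delta + len, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, map_off);
    if (p == MAP_FAILED)
        return -1;
    for (size_t i = 0; i < len; i++)
        p[delta + i] = pattern(off + (off_t)i);
    return munmap(p, delta + len);
}

/* Read the file back with pread; return mismatched byte count, -1 on error. */
static long verify_file(int fd, off_t size)
{
    static unsigned char buf[65536];
    long bad = 0;

    for (off_t off = 0; off < size; ) {
        size_t want = sizeof buf;
        if ((off_t)want > size - off)
            want = (size_t)(size - off);
        ssize_t n = pread(fd, buf, want, off);
        if (n <= 0)
            return -1;
        for (ssize_t i = 0; i < n; i++)
            if (buf[i] != pattern(off + (off_t)i))
                bad++;
        off += n;
    }
    return bad;
}

/* Write nchunks skewed chunks to path, then verify the whole file. */
long mmap_stress(const char *path, int nchunks)
{
    assert(nchunks > 0);
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    off_t off = 0;
    for (int i = 0; i < nchunks; i++, off += CHUNK + SKEW) {
        if (write_chunk(fd, off, CHUNK + SKEW) != 0) {
            close(fd);
            return -1;
        }
    }
    long bad = verify_file(fd, off);
    close(fd);
    return bad;
}
```

Calling mmap_stress() with a path on the filesystem under test and
8192 chunks gives a file of roughly 8GB; a return value of 0 means
every byte read back as expected.  Note that without memory pressure
the readback is likely to be served from the page cache rather than
from disk.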

I'm running this with heavy memory pressure, but no snapshots.  No
corruptions yet, but I'll let it run a while longer.

-chris
