On Thu, Mar 24, 2016 at 02:12:23PM +0100, Jan Kara wrote:
> Hello,
>
> yesterday I have been stress-testing mmap code with my new fault locking
> patches and I have found a data corruption issue when a file is written
> both via mmap and standard write(2). The problem is the following:
>
> CPU1                                  CPU2
> dax_io()                              dax_fault()
>   get_block() - allocates block
>   ...                                   get_block() - finds allocated block
>                                         - zeroes it inside fs
>                                       fault completes
>
>   if (buffer_unwritten(bh) || buffer_new(bh)) -> new buffer
>     dax_new_buf() -> zeroes buffer, which may
>                      overwrite user data

Which filesystem? XFS should be, in both cases, zeroing the block
entirely inside get_block() when it is first allocated. i.e. we should
see:

CPU1                                  CPU2
dax_io()                              dax_fault()
  get_block() - allocates block
  - zeroes it inside fs
  ...                                   get_block() - finds allocated block
                                      fault completes

buffer returned is not new or unwritten.

> In some cases the race can also go the other way around and we lose data
> written by write.

It shouldn't:

CPU1                                  CPU2
dax_io()                              dax_fault()
                                        get_block() - allocates block
                                        - zeroes it inside fs
  ....
  get_block() - finds allocated block
  buffer returned is not new or unwritten.
                                      fault completes

> So either we need to do the zeroing inside fs also for the write(2) path
> (but that would essentially mean we would write the block twice for each
> allocating write)

Yup, that's what XFS is supposed to be doing right now...

> or we would need dax_io() to also use radix tree locking
> to serialize against page faults (in the same way the page cache does this
> with the page lock). Any opinion on what would be better?

Don't care, as long as data is not corrupted ;)

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx