On Thu, Mar 24, 2016 at 02:12:23PM +0100, Jan Kara wrote:
> Hello,
>
> yesterday I have been stress-testing mmap code with my new fault locking
> patches and I have found a data corruption issue when a file is written
> both via mmap and standard write(2). The problem is the following:
>
> CPU1                                  CPU2
> dax_io()                              dax_fault()
>   get_block() - allocates block
>   ...                                   get_block() - finds allocated block
>                                         - zeroes it inside fs
>                                       fault completes
>
>   if (buffer_unwritten(bh) || buffer_new(bh)) -> new buffer
>     dax_new_buf() -> zeroes buffer, which may
>                      overwrite user data

Which filesystem? XFS should be, in both cases, zeroing the block
entirely inside get_block() when it is first allocated. i.e. we should
see:

CPU1                                  CPU2
dax_io()                              dax_fault()
  get_block() - allocates block
  - zeroes it inside fs
  ...                                   get_block() - finds allocated block
                                      fault completes

buffer returned is not new or unwritten.

> In some cases the race can also go the other way around and we lose data
> written by write.

It shouldn't:

CPU1                                  CPU2
dax_io()                              dax_fault()
                                        get_block() - allocates block
                                        - zeroes it inside fs
  ....
  get_block() - finds allocated block
  buffer returned is not new or unwritten.
                                      fault completes

> So either we need to do the zeroing inside fs also for the write(2) path
> (but that would essentially mean we would write the block twice for each
> allocating write)

Yup, that's what XFS is supposed to be doing right now...

> or we would need dax_io() to also use radix tree locking
> to serialize against page faults (in the same way the page cache does this
> with the page lock). Any opinion on what would be better?

Don't care, as long as data is not corrupted ;)

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx