> The obvious fix for this is that block_write_begin() and
> friends should be calling ->setattr to do the truncation and hence
> follow normal convention for truncating blocks off an inode.
> However, even that appears to have thorns. e.g. in XFS we hold the
> iolock exclusively when we call block_write_begin(), but it is not
> held in all cases where ->setattr is currently called. Hence calling
> ->setattr from block_write_begin in this failure case will deadlock
> unless we also pass a "nolock" flag as well. XFS already
> supports this (e.g. see the XFS fallocate implementation) but no other
> filesystem does (some probably don't need to).

This paragraph in particular reminds me of an outstanding bug with
O_DIRECT and ext*. It isn't truncating partial allocations when a dio
fails with ENOSPC. This was noticed by a user who found that fsck
reported blocks beyond i_size in the file that hit ENOSPC when they
unmounted and checked the volume after the failed write.

So, whether we decide that failed writes should call setattr or
vmtruncate, we should also keep the generic O_DIRECT path in mind.
Today it doesn't even try the supposed generic method of calling
vmtruncate().

- z
(Though I'm sure XFS' dio code already handles freeing blocks :))
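To make the O_DIRECT/ENOSPC failure mode concrete, here is a user-space toy model, not kernel code. Every name in it (`struct toy_inode`, `toy_extending_write()`, `toy_trim_past_eof()`, `toy_has_blocks_past_eof()`) is hypothetical, and it does not reflect how ext* or XFS actually map blocks. An extending write allocates blocks one at a time, runs out of space part-way, and returns -ENOSPC without growing i_size. With `trim_on_error` false the partial allocation is left past EOF, which is what the user's fsck run flagged; with it true the blocks are freed, roughly the cleanup that ->setattr or vmtruncate() would be expected to do. The model assumes no legitimate preallocation beyond EOF (e.g. from fallocate with KEEP_SIZE), which a real trim would have to preserve.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not kernel APIs. */
#define TOY_BLOCK_SIZE 4096u
#define TOY_MAX_BLOCKS 16u

struct toy_inode {
    size_t i_size;                 /* logical file size in bytes */
    bool   mapped[TOY_MAX_BLOCKS]; /* true if block i is allocated */
    size_t free_blocks;            /* blocks left on the toy "volume" */
};

/* Number of blocks needed to cover `size` bytes (rounded up). */
static size_t toy_blocks_for_size(size_t size)
{
    return (size + TOY_BLOCK_SIZE - 1) / TOY_BLOCK_SIZE;
}

/* Free every allocated block lying wholly beyond i_size.
 * Assumption: nothing past EOF is legitimately preallocated. */
static void toy_trim_past_eof(struct toy_inode *inode)
{
    for (size_t i = toy_blocks_for_size(inode->i_size); i < TOY_MAX_BLOCKS; i++) {
        if (inode->mapped[i]) {
            inode->mapped[i] = false;
            inode->free_blocks++;
        }
    }
}

/* Extending write of `nblocks` blocks starting at block `first`.
 * If space runs out part-way, return -ENOSPC and leave i_size unchanged.
 * trim_on_error == false models the bug: the partial allocation stays. */
static int toy_extending_write(struct toy_inode *inode, size_t first,
                               size_t nblocks, bool trim_on_error)
{
    for (size_t i = first; i < first + nblocks; i++) {
        if (i >= TOY_MAX_BLOCKS || inode->free_blocks == 0) {
            if (trim_on_error)
                toy_trim_past_eof(inode);
            return -ENOSPC;
        }
        if (!inode->mapped[i]) {
            inode->mapped[i] = true;
            inode->free_blocks--;
        }
    }
    /* Only a successful write grows i_size over the new blocks. */
    size_t end = (first + nblocks) * TOY_BLOCK_SIZE;
    if (end > inode->i_size)
        inode->i_size = end;
    return 0;
}

/* fsck-style check: is any block allocated beyond i_size? */
static bool toy_has_blocks_past_eof(const struct toy_inode *inode)
{
    for (size_t i = toy_blocks_for_size(inode->i_size); i < TOY_MAX_BLOCKS; i++)
        if (inode->mapped[i])
            return true;
    return false;
}
```

Running the same failing write with and without the trim shows the difference: without it, blocks remain allocated past the unchanged i_size; with it, the file and the free-block count return to their pre-write state.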