I suggest a viro cc on this one. On Mon, 23 Jan 2012 10:38:29 +0000 Niels de Vos <ndevos@xxxxxxxxxx> wrote: > Executing an fsync() on a file-descriptor of a partition flushes the > caches for that partition by calling blkdev_issue_flush(). However, it The changelog is stale. > seems that reading data through the parent device will still return the > old cached data. > > The cache for the block-device is not synced if the block-device is kept > open (due to a mounted partition, for example). Only when all users for > the disk have exited, the cache for the disk is made consistent again. > > Calling invalidate_bdev() on the parent block-device in case > blkdev_fsync() was called for a partition, fixes this. > > The problem can be worked around by forcing the caches to be flushed > with either > # blockdev --flushbufs ${dev_disk} > or > # echo 3 > /proc/sys/vm/drop_caches Please include your (useful) problem description in the changelog: : The problem that was noticed is the following: : 1) create two or more partitions on a device : - use fdisk to create /dev/sdb1 and /dev/sdb2 : 2) format and mount one of the partition : - mkfs -t ext3 /dev/sdb1 : 3) read through the main device to have something in the cache : - read /dev/sdb with dd or use something like "parted /dev/sdb print" : 4) now write something to /dev/sdb2, format the partition for example : - mkfs -t ext3 /dev/sdb2 : 5) read the blocks where sdb2 starts, through /dev/sdb : - use dd or do again a "parted /dev/sdb print" : : Without this patch, calling "blockdev --flushbufs" or dropping the : caches, the result in 5) is the same as in 3). Reading the same area : through /dev/sdb2 shows the inconsistancy between the two caches. : : With this patch, or one of the workarounds, the data read through : /dev/sdb and /dev/sdb2 is the same. > > ... > > --- a/fs/block_dev.c > +++ b/fs/block_dev.c > @@ -424,6 +424,10 @@ int blkdev_fsync(struct file *filp, loff_t start, loff_t end, int datasync) > if (error == -EOPNOTSUPP) > error = 0; > > + /* invalidate parent block_device */ > + if (!error && bdev != bdev->bd_contains) > + invalidate_bdev(bdev->bd_contains); > + > return error; > } It doesn't seem terribly logical to do this in blkdev_fsync(). Why not do it right there in blkdev_ioctl()'s "case BLKFLSBUF"? Bear in mind that invalidate_bdev() isn't a very strong function - it won't drop pages which are dirty or under writeback nor pages which others have a reference on. But I can see that this change would be a best-effort user-convenience thing. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html