On 2023/11/30 12:30, Christoph Hellwig wrote: > On Thu, Nov 30, 2023 at 08:54:31AM +0530, Ritesh Harjani wrote: >> So I took a look at this. Here is what I think - >> So this is useful of-course when we have a large folio. Because >> otherwise it's just one block at a time for each folio. This is not a >> problem once FS buffered-io handling code moves to iomap (because we >> can then enable large folio support to it). > > Yes. > >> However, this would still require us to pass a folio to ->map_blocks >> call to determine the size of the folio (which I am not saying can't be >> done but just stating my observations here). > > XFS currently maps based on the underlyig reservation (delalloc extent) > and not the actual map size. This works because on-disk extents are > allocated as unwritten extents, and only the actual written part is > the converted. But if you only want to allocate blocks for the part IIUC, I noticed a side effect of this map method, Let's think about a special case, if we sync a partial range of a delalloc extent, and then the system crashes before the remaining data write back. We could get a file which has allocated blocks(unwritten) beyond EOF. I can reproduce it on xfs. # write 10 blocks, but only sync 3 blocks, i_size becomes 12K after IO complete xfs_io -f -c "pwrite 0 40k" -c "sync_range 0 12k" /mnt/foo # postpone the remaining data writeback echo 20000 > /proc/sys/vm/dirty_writeback_centisecs echo 20000 > /proc/sys/vm/dirty_expire_centisecs # wait 35s to make sure xfs's cil log writeback sleep 35 # triger system crash echo c > /proc/sysrq-trigger After system reboot, the 'foo' file's size is 12K but have 10 allocated blocks. xfs_db> inode 131 core.size = 12288 core.nblocks = 10 ... 0:[0,24,3,0] 1:[3,27,7,1] I'm not expert on xfs, but I don't think this is the right result, is it? Thanks, Yi. > actually written you actually need to pass in the dirty range and not > just use the whole folio. This would be the incremental patch to do > that: > > http://git.infradead.org/users/hch/xfs.git/commitdiff/0007893015796ef2ba16bb8b98c4c9fb6e9e6752 > > But unless your block allocator is very cheap doing what XFS does is > probably going to work much better. > >> ...ok so here is the PoC for seq counter check for ext2. Next let me >> try to see if we can lift this up from the FS side to iomap - > > This looks good to me from a very superficial view. Dave is the expert > on this, though. > > > . >