Jan, On Wed, Aug 30, 2017 at 4:59 PM, Jan Kara <jack@xxxxxxx> wrote: > On Tue 29-08-17 16:29:42, Andreas Gruenbacher wrote: >> From: Christoph Hellwig <hch@xxxxxx> >> >> Switch to the iomap_seek_hole and iomap_seek_data helpers for >> implementing lseek SEEK_HOLE / SEEK_DATA, and remove all the code that >> isn't needed any more. >> >> Note that with this patch, ext4 will now always depend on the iomap code >> instead of only when CONFIG_DAX is enabled, and it requires adding a >> call into the extent status tree for iomap_begin as well to properly >> deal with delalloc extents. >> >> Signed-off-by: Christoph Hellwig <hch@xxxxxx> >> [Minor fixes and cleanups by Andreas] >> Signed-off-by: Andreas Gruenbacher <agruenba@xxxxxxxxxx> > > ... > >> @@ -3425,6 +3425,29 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length, >> >> if (!(flags & IOMAP_WRITE)) { >> ret = ext4_map_blocks(NULL, inode, &map, 0); >> + if (ret < 0) >> + return ret; > > One general question about IOMAP_REPORT: When there is unwritten extent, we > use page_cache_seek_hole_data() to determine what is actually a hole and > what is data. We could use exactly the same function to determine what is a > hole and what is data in an IOMAP_HOLE extent, couldn't we? Now I > understand that a filesystem is supposed to return IOMAP_DELALLOC extent if > there is delayed allocation data covering queried offset so probably it's > not a great idea to use that however it still seems a bit like an > unnecessary duplication... There is no point in scanning the page cache on filesystems that don't support delayed allocation, so it does make sense to distinguish the two types of mappings. >> + if (!ret) { > > Please go to this branch only for IOMAP_REPORT case to avoid unnecessary > overhead for DAX mappings... > >> + struct extent_status es = {}; >> + >> + ext4_es_find_delayed_extent_range(inode, map.m_lblk, >> + map.m_lblk + map.m_len - 1, &es); >> + /* Is delalloc data before next block in extent tree? */ >> + if (es.es_len && es.es_lblk < map.m_lblk + map.m_len) { > > And this is still wrong - I have actually verified that with attached patch > that disables caching of extents (so it is equivalent to *very* aggresive > slab reclaim happening). With that patch applied you'll get: > > kvm0:~ # rm /mnt/file; xfs_io -f -c "pwrite 4096 4096" -c "seek -a 0" > /mnt/file > wrote 4096/4096 bytes at offset 4096 > 4 KiB, 1 ops; 0.0000 sec (16.693 MiB/sec and 4273.5043 ops/sec) > Whence Result > DATA 0 > HOLE 8192 > > Which is obviously wrong - hole should be 0, data should be 4096. > > And the reason why this is wrong is that when we are asked to map things at > first_block and there is a hole at that offset, we need to just truncate > the hole extent returned by ext4_map_blocks() by possibly following > delalloc allocation. But we still need to return that there *is a hole* at > first_block. But your patch seems to try to return the delalloc extent > instead which is wrong. Got it, thanks for the debug patch. I'll send a fixed version of the series. >> + ext4_lblk_t offs = 0; >> + >> + if (es.es_lblk < map.m_lblk) >> + offs = map.m_lblk - es.es_lblk; >> + map.m_lblk = es.es_lblk + offs; >> + map.m_pblk = ext4_es_pblock(&es) + offs; >> + map.m_len = es.es_len - offs; >> + if (ext4_es_is_unwritten(&es)) >> + map.m_flags |= EXT4_MAP_UNWRITTEN; >> + if (ext4_es_is_delayed(&es)) >> + delalloc = true; >> + ret = 1; >> + } >> + } >> } else { >> int dio_credits; >> handle_t *handle; Thanks, Andreas