On 2024/3/19 19:13, Shinichiro Kawasaki wrote:
On Mar 19, 2024 / 10:22, Chao Yu wrote:
On 2024/3/18 13:47, Shinichiro Kawasaki via Linux-f2fs-devel wrote:
I confirmed that the trigger commit is dbf8e63f48af as Yi reported. I took a
look in the commit, but it looks fine to me. So I thought the cause is not
in the commit diff.
I found the WARN is printed when the f2fs is set up with multiple devices,
and read requests are mapped to the very first block of the second device in the
direct read path. In this case, f2fs_map_blocks() and f2fs_map_blocks_cached()
modify map->m_pblk as the physical block address from each block device. It
becomes zero when it is mapped to the first block of the device. However,
f2fs_iomap_begin() assumes that map->m_pblk is the physical block address of the
whole f2fs, across the all block devices. It compares map->m_pblk against
NULL_ADDR == 0, then go into the unexpected branch and sets the invalid
iomap->length. The WARN catches the invalid iomap->length.
This WARN is printed even for non-zoned block devices, by following steps.
- Create two (non-zoned) null_blk devices memory backed with 128MB size each:
nullb0 and nullb1.
# mkfs.f2fs /dev/nullb0 -c /dev/nullb1
# mount -t f2fs /dev/nullb0 "${mount_dir}"
# dd if=/dev/zero of="${mount_dir}/test.dat" bs=1M count=192
# dd if="${mount_dir}/test.dat" of=/dev/null bs=1M count=192 iflag=direct
I created a fix candidate patch [1]. It modifies f2fs_iomap_begin() to handle
map->m_pblk as the physical block address from each device start, not the
address of whole f2fs. I confirmed it avoids the WARN.
But I'm not so sure if the fix is good enough. map->m_pblk has dual meanings.
Sometimes it holds the physical block address of each device, and sometimes
the address of the whole f2fs. I'm not sure what is the condition for
map->m_pblk to have which meaning. I guess F2FS_GET_BLOCK_DIO flag is the
condition, but f2fs_map_blocks_cached() does not ensure it.
Also, I noticed that map->m_pblk is referred to in other functions below, and
not sure if they need the similar change as I did for f2fs_iomap_begin().
f2fs_fiemap()
f2fs_read_single_page()
f2fs_bmap()
check_swap_activate()
I would like to hear advices from f2fs experts for the fix.
[1]
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 26e317696b33..5232223a69e5 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1569,6 +1569,7 @@ static bool f2fs_map_blocks_cached(struct inode *inode,
int bidx = f2fs_target_device_index(sbi, map->m_pblk);
struct f2fs_dev_info *dev = &sbi->devs[bidx];
+ map->m_multidev_dio = true;
map->m_bdev = dev->bdev;
map->m_pblk -= dev->start_blk;
map->m_len = min(map->m_len, dev->end_blk + 1 - map->m_pblk);
@@ -4211,9 +4212,11 @@ static int f2fs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
unsigned int flags, struct iomap *iomap,
struct iomap *srcmap)
{
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
struct f2fs_map_blocks map = {};
pgoff_t next_pgofs = 0;
- int err;
+ block_t pblk;
+ int err, i;
map.m_lblk = bytes_to_blks(inode, offset);
map.m_len = bytes_to_blks(inode, offset + length - 1) - map.m_lblk + 1;
@@ -4239,12 +4242,17 @@ static int f2fs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
* We should never see delalloc or compressed extents here based on
* prior flushing and checks.
*/
- if (WARN_ON_ONCE(map.m_pblk == NEW_ADDR))
+ pblk = map.m_pblk;
+ if (map.m_multidev_dio && map.m_flags & F2FS_MAP_MAPPED)
+ for (i = 0; i < sbi->s_ndevs; i++)
+ if (FDEV(i).bdev == map.m_bdev)
+ pblk += FDEV(i).start_blk;
+ if (WARN_ON_ONCE(pblk == NEW_ADDR))
return -EINVAL;
- if (WARN_ON_ONCE(map.m_pblk == COMPRESS_ADDR))
+ if (WARN_ON_ONCE(pblk == COMPRESS_ADDR))
return -EINVAL;
Shoudn't we check NEW_ADDR and COMPRESS_ADDR before multiple-device
block address conversion?
As far as I understand, NEW_ADDR and COMPRESS_ADDR in map.m_pblk can be
target of "map->m_pblk -= FDEV(bidx).start_blk;" in f2fs_map_blocks(),
so I guessed that the address conversion should come first.
- if (map.m_pblk != NULL_ADDR) {
+ if (pblk != NULL_ADDR) {
How to distinguish NULL_ADDR and valid blkaddr 0? I guess it should
check F2FS_MAP_MAPPED flag first?
I guessed that physical block address for the whole f2fs (pblk) can not be 0, so
the NULL_ADDR can have zero value. As for the physical block address of each
device (map->m_pblk) can be 0. But this is still my *guess*, and I'm not sure.
The comments from you and Daeho made me rethink. It looks problematic for me
that map->m_pblk has two meanings as I had described: "1) physical block address
from each device start", and "2) physical block address of whole f2fs". So how
about to make it have only one meaning "2) physical block address address of
whole f2fs"? I created another patch below [2]. It removes the
map->m_pblk -= FDEV(bidx).start_blk;
lines in f2fs_map_blocks_cached() and f2fs_map_blocks() so that map->m_pblk do
not have the meaning 1). Instead, the subtraction is done in f2fs_iomap_begin().
I confirmed that this patch also avoids the WARN. I can have more confidence in
this patch, and I hope it is easier to review.
P.S. If anyone has better solution idea, feel free to provide patches. I'm
willing to test them :)
[2]
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 26e317696b33..7404b4fbcba3 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1569,8 +1569,8 @@ static bool f2fs_map_blocks_cached(struct inode *inode,
int bidx = f2fs_target_device_index(sbi, map->m_pblk);
struct f2fs_dev_info *dev = &sbi->devs[bidx];
+ map->m_multidev_dio = true;
map->m_bdev = dev->bdev;
- map->m_pblk -= dev->start_blk;
map->m_len = min(map->m_len, dev->end_blk + 1 - map->m_pblk);
} else {
map->m_bdev = inode->i_sb->s_bdev;
@@ -1793,11 +1793,8 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int flag)
if (map->m_multidev_dio) {
block_t blk_addr = map->m_pblk;
-
bidx = f2fs_target_device_index(sbi, map->m_pblk);
-
map->m_bdev = FDEV(bidx).bdev;
- map->m_pblk -= FDEV(bidx).start_blk;
if (map->m_may_create)
f2fs_update_device_state(sbi, inode->i_ino,
@@ -4211,9 +4208,11 @@ static int f2fs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
unsigned int flags, struct iomap *iomap,
struct iomap *srcmap)
{
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
struct f2fs_map_blocks map = {};
pgoff_t next_pgofs = 0;
- int err;
+ block_t pblk;
+ int err, bidx;
map.m_lblk = bytes_to_blks(inode, offset);
map.m_len = bytes_to_blks(inode, offset + length - 1) - map.m_lblk + 1;
@@ -4249,7 +4248,12 @@ static int f2fs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
iomap->type = IOMAP_MAPPED;
iomap->flags |= IOMAP_F_MERGED;
iomap->bdev = map.m_bdev;
- iomap->addr = blks_to_bytes(inode, map.m_pblk);
+ pblk = map.m_pblk;
+ if (map.m_multidev_dio && map.m_flags & F2FS_MAP_MAPPED) {
+ bidx = f2fs_target_device_index(sbi, map.m_pblk);
+ pblk -= FDEV(bidx).start_blk;
+ }
+ iomap->addr = blks_to_bytes(inode, pblk);
} else {
if (flags & IOMAP_WRITE)
return -ENOTBLK;
Hi Shinichiro,
Can you please check below diff? IIUC, for the case: f2fs_map_blocks()
returns zero blkaddr in non-primary device, which is a verified valid
block address, we'd better to check m_flags & F2FS_MAP_MAPPED instead
of map.m_pblk != NULL_ADDR to decide whether tagging IOMAP_MAPPED flag
or not.
---
fs/f2fs/data.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 6f66e3e4221a..41a56d4298c8 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -4203,7 +4203,7 @@ static int f2fs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
if (WARN_ON_ONCE(map.m_pblk == COMPRESS_ADDR))
return -EINVAL;
- if (map.m_pblk != NULL_ADDR) {
+ if (map.m_flags & F2FS_MAP_MAPPED) {
iomap->length = blks_to_bytes(inode, map.m_len);
iomap->type = IOMAP_MAPPED;
iomap->flags |= IOMAP_F_MERGED;