On Tue, Jun 06, 2023 at 05:21:32AM +0530, Ritesh Harjani wrote:
> "Darrick J. Wong" <djwong@xxxxxxxxxx> writes:
>
> > On Mon, Jun 05, 2023 at 04:25:03PM +0530, Ritesh Harjani (IBM) wrote:
> >> The rest of the buffered-io iomap code will eventually use the
> >> iomap_iop_** function naming. This patch updates the function
> >> arguments and renames iomap_set_range_uptodate() ->
> >> iomap_iop_set_range_uptodate(). iop_set_range_uptodate() then becomes
> >> an accessor function used by iomap_iop_** functions.
> >>
> >> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@xxxxxxxxx>
> >> ---
> >>  fs/iomap/buffered-io.c | 111 +++++++++++++++++++++++------------------
> >>  1 file changed, 63 insertions(+), 48 deletions(-)
> >>
> >> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> >> index 6fffda355c45..136f57ccd0be 100644
> >> --- a/fs/iomap/buffered-io.c
> >> +++ b/fs/iomap/buffered-io.c
> >> @@ -24,14 +24,14 @@
> >>  #define IOEND_BATCH_SIZE	4096
> >>
> >>  /*
> >> - * Structure allocated for each folio when block size < folio size
> >> - * to track sub-folio uptodate status and I/O completions.
> >> + * Structure allocated for each folio to track per-block uptodate state
> >> + * and I/O completions.
> >>   */
> >>  struct iomap_page {
> >>  	atomic_t		read_bytes_pending;
> >>  	atomic_t		write_bytes_pending;
> >> -	spinlock_t		uptodate_lock;
> >> -	unsigned long		uptodate[];
> >> +	spinlock_t		state_lock;
> >> +	unsigned long		state[];
> >>  };
> >>
> >>  static inline struct iomap_page *to_iomap_page(struct folio *folio)
> >> @@ -43,6 +43,48 @@ static inline struct iomap_page *to_iomap_page(struct folio *folio)
> >>
> >>  static struct bio_set iomap_ioend_bioset;
> >>
> >> +static bool iop_test_full_uptodate(struct folio *folio)
> >
> > Same comment as Andreas, I think this works better with 'fully', e.g.
> >
> > iop_test_fully_uptodate()
> >
> > Why don't you pass the iomap_page directly into this function?  Doesn't
> > that eliminate the need for iomap_iop_free to keep folio->private set
> > until the very end?
> >
> > static inline bool
> > iomap_iop_is_fully_uptodate(const struct iomap_page *iop,
> > 			    const struct folio *folio)
> >
> > Same sort of thing for the second function -- we already extracted
> > folio->private and checked it wasn't null, so we don't need to do that
> > again.
> >
> > static inline bool
> > iomap_iop_is_block_uptodate(const struct iomap_page *iop,
> > 			    const struct folio *folio,
> > 			    unsigned int block)
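> >
> > Filled in, that would look something like this (completely untested
> > sketch; the bodies are just your patch's helpers, with the iop passed
> > in instead of re-derived from folio->private):
> >
> > static inline bool
> > iomap_iop_is_fully_uptodate(const struct iomap_page *iop,
> > 			    const struct folio *folio)
> > {
> > 	struct inode *inode = folio->mapping->host;
> >
> > 	return bitmap_full(iop->state, i_blocks_per_folio(inode, folio));
> > }
> >
> > static inline bool
> > iomap_iop_is_block_uptodate(const struct iomap_page *iop,
> > 			    const struct folio *folio,
> > 			    unsigned int block)
> > {
> > 	/* folio is unused here; kept for symmetry with the helper above */
> > 	return test_bit(block, iop->state);
> > }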
> >
> >> +{
> >> +	struct iomap_page *iop = to_iomap_page(folio);
> >> +	struct inode *inode = folio->mapping->host;
> >> +
> >> +	return bitmap_full(iop->state, i_blocks_per_folio(inode, folio));
> >> +}
> >> +
> >> +static bool iop_test_block_uptodate(struct folio *folio, unsigned int block)
> >> +{
> >> +	struct iomap_page *iop = to_iomap_page(folio);
> >> +
> >> +	return test_bit(block, iop->state);
> >> +}
> >> +
> >> +static void iop_set_range_uptodate(struct inode *inode, struct folio *folio,
> >> +		size_t off, size_t len)
> >> +{
> >> +	struct iomap_page *iop = to_iomap_page(folio);
> >> +	unsigned int first_blk = off >> inode->i_blkbits;
> >> +	unsigned int last_blk = (off + len - 1) >> inode->i_blkbits;
> >> +	unsigned int nr_blks = last_blk - first_blk + 1;
> >> +	unsigned long flags;
> >> +
> >> +	spin_lock_irqsave(&iop->state_lock, flags);
> >> +	bitmap_set(iop->state, first_blk, nr_blks);
> >> +	if (iop_test_full_uptodate(folio))
> >> +		folio_mark_uptodate(folio);
> >> +	spin_unlock_irqrestore(&iop->state_lock, flags);
> >> +}
> >> +
> >> +static void iomap_iop_set_range_uptodate(struct inode *inode,
> >
> > I don't understand why iomap_set_range_uptodate is now
> > iomap_iop_set_range_uptodate; it doesn't take an iomap_page object as
> > an argument...?
> >
> > I thought I understood that iomap_FOO operates on a folio and a range,
> > whereas iomap_iop_FOO operates on sub-blocks within a folio?  And that
> > you were renaming the iop_* functions to iomap_iop_*?
> >
> > I'm also not sure why iop_set_range_uptodate needs to be passed the
> > struct inode; can't it extract that from folio->mapping->host, like
> > current upstream does?
>
> So, I do have some confusion about the __folio_mark_dirty() function...
>
> i.e. __folio_mark_dirty() checks whether folio->mapping is not NULL.
> That means for marking a range of blocks dirty within the iop from
> ->dirty_folio(), we can't use folio->mapping->host, can we?
> We have to use the inode from mapping->host (mapping is passed as a
> parameter to ->dirty_folio).

Ah, yeah.  folio->mapping can become NULL if truncate races with us in
removing the folio from the page cache.  For regular reads and writes
this is a nonissue because those paths all take i_rwsem and will block
truncate.

However, for page_mkwrite, xfs doesn't take mmap_invalidate_lock until
after the vm_fault has been given a folio to play with.  I think that
means we can't rely on folio->mapping to be non-null, and hence can't
dereference folio->mapping->host.

That said, if the folio's locked and folio->mapping is null, the page
has been truncated, so we could just return VM_FAULT_NOPAGE, perhaps?
(willy might know better than I do...)

> I tried looking into the history of this, but I couldn't find
> anything.
>
> This is also the reason I thought we would need to pass an inode to
> the iop_set_range_dirty() function (because it is also called from
> ->dirty_folio -> iomap_dirty_folio()), and to keep it consistent, I
> kept the inode argument for the iop_**_uptodate functions as well.

That seems ok to me, now that you've brought that ^^ to my attention.
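
IOWs, something like this for the dirty path -- a rough sketch only,
since iop_set_range_dirty() and the ->dirty_folio wiring come from
later patches in this series, so treat the names as illustrative:

static bool iomap_dirty_folio(struct address_space *mapping,
		struct folio *folio)
{
	/*
	 * ->dirty_folio is handed the mapping directly, so use
	 * mapping->host here; folio->mapping may already be NULL if
	 * truncate is racing with us.
	 */
	struct inode *inode = mapping->host;
	size_t len = folio_size(folio);

	iomap_iop_alloc(inode, folio, 0);
	iop_set_range_dirty(inode, folio, 0, len);
	return filemap_dirty_folio(mapping, folio);
}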
--D

> Thoughts please?
>
> <some code snippet>
> ====================
>
> iomap_dirty_folio -> filemap_dirty_folio -> __folio_mark_dirty()
>
> void __folio_mark_dirty(struct folio *folio, struct address_space *mapping,
> 			int warn)
> {
> 	unsigned long flags;
>
> 	xa_lock_irqsave(&mapping->i_pages, flags);
> 	if (folio->mapping) {	/* Race with truncate? */
> 		WARN_ON_ONCE(warn && !folio_test_uptodate(folio));
> 		folio_account_dirtied(folio, mapping);
> 		__xa_set_mark(&mapping->i_pages, folio_index(folio),
> 				PAGECACHE_TAG_DIRTY);
> 	}
> 	xa_unlock_irqrestore(&mapping->i_pages, flags);
> }
>
> -ritesh
>
> >
> > Generally I don't understand why this part of the patch is needed at
> > all.  Wasn't the point merely to rename uptodate_* to state_* and
> > introduce the iomap_iop_test_*_uptodate helpers?
> >
> > --D
> >
> >> +		struct folio *folio, size_t off, size_t len)
> >> +{
> >> +	struct iomap_page *iop = to_iomap_page(folio);
> >> +
> >> +	if (iop)
> >> +		iop_set_range_uptodate(inode, folio, off, len);
> >> +	else
> >> +		folio_mark_uptodate(folio);
> >> +}
> >> +
> >>  static struct iomap_page *iomap_iop_alloc(struct inode *inode,
> >>  		struct folio *folio, unsigned int flags)
> >>  {
> >> @@ -58,12 +100,12 @@ static struct iomap_page *iomap_iop_alloc(struct inode *inode,
> >>  	else
> >>  		gfp = GFP_NOFS | __GFP_NOFAIL;
> >>
> >> -	iop = kzalloc(struct_size(iop, uptodate, BITS_TO_LONGS(nr_blocks)),
> >> +	iop = kzalloc(struct_size(iop, state, BITS_TO_LONGS(nr_blocks)),
> >>  		      gfp);
> >>  	if (iop) {
> >> -		spin_lock_init(&iop->uptodate_lock);
> >> +		spin_lock_init(&iop->state_lock);
> >>  		if (folio_test_uptodate(folio))
> >> -			bitmap_fill(iop->uptodate, nr_blocks);
> >> +			bitmap_fill(iop->state, nr_blocks);
> >>  		folio_attach_private(folio, iop);
> >>  	}
> >>  	return iop;
> >> @@ -72,14 +114,12 @@ static struct iomap_page *iomap_iop_alloc(struct inode *inode,
> >>  static void iomap_iop_free(struct folio *folio)
> >>  {
> >>  	struct iomap_page *iop = to_iomap_page(folio);
> >> -	struct inode *inode = folio->mapping->host;
> >> -	unsigned int nr_blocks = i_blocks_per_folio(inode, folio);
> >>
> >>  	if (!iop)
> >>  		return;
> >>  	WARN_ON_ONCE(atomic_read(&iop->read_bytes_pending));
> >>  	WARN_ON_ONCE(atomic_read(&iop->write_bytes_pending));
> >> -	WARN_ON_ONCE(bitmap_full(iop->uptodate, nr_blocks) !=
> >> +	WARN_ON_ONCE(iop_test_full_uptodate(folio) !=
> >>  		     folio_test_uptodate(folio));
> >>  	folio_detach_private(folio);
> >>  	kfree(iop);
> >> @@ -111,7 +151,7 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
> >>
> >>  	/* move forward for each leading block marked uptodate */
> >>  	for (i = first; i <= last; i++) {
> >> -		if (!test_bit(i, iop->uptodate))
> >> +		if (!iop_test_block_uptodate(folio, i))
> >>  			break;
> >>  		*pos += block_size;
> >>  		poff += block_size;
> >> @@ -121,7 +161,7 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
> >>
> >>  	/* truncate len if we find any trailing uptodate block(s) */
> >>  	for ( ; i <= last; i++) {
> >> -		if (test_bit(i, iop->uptodate)) {
> >> +		if (iop_test_block_uptodate(folio, i)) {
> >>  			plen -= (last - i + 1) * block_size;
> >>  			last = i - 1;
> >>  			break;
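
(An aside for anyone following along: a worked example of the two loops
above, with made-up numbers.  Assume a 16k folio with 4k blocks, where
blocks 0 and 3 are already uptodate:

	loff_t pos = folio_pos(folio);	/* start of the folio */
	size_t poff, plen;

	iomap_adjust_read_range(inode, folio, &pos, folio_size(folio),
			&poff, &plen);

The first loop skips the leading uptodate block 0, advancing pos and
poff by 4096; the second loop trims the trailing uptodate block 3 off
plen.  Result: poff == 4096 and plen == 8192, so only blocks 1 and 2
are actually read.)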
> >> @@ -145,30 +185,6 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
> >>  	*lenp = plen;
> >>  }
> >>
> >> -static void iomap_iop_set_range_uptodate(struct folio *folio,
> >> -		struct iomap_page *iop, size_t off, size_t len)
> >> -{
> >> -	struct inode *inode = folio->mapping->host;
> >> -	unsigned	first = off >> inode->i_blkbits;
> >> -	unsigned	last = (off + len - 1) >> inode->i_blkbits;
> >> -	unsigned long flags;
> >> -
> >> -	spin_lock_irqsave(&iop->uptodate_lock, flags);
> >> -	bitmap_set(iop->uptodate, first, last - first + 1);
> >> -	if (bitmap_full(iop->uptodate, i_blocks_per_folio(inode, folio)))
> >> -		folio_mark_uptodate(folio);
> >> -	spin_unlock_irqrestore(&iop->uptodate_lock, flags);
> >> -}
> >> -
> >> -static void iomap_set_range_uptodate(struct folio *folio,
> >> -		struct iomap_page *iop, size_t off, size_t len)
> >> -{
> >> -	if (iop)
> >> -		iomap_iop_set_range_uptodate(folio, iop, off, len);
> >> -	else
> >> -		folio_mark_uptodate(folio);
> >> -}
> >> -
> >>  static void iomap_finish_folio_read(struct folio *folio, size_t offset,
> >>  		size_t len, int error)
> >>  {
> >> @@ -178,7 +194,8 @@ static void iomap_finish_folio_read(struct folio *folio, size_t offset,
> >>  		folio_clear_uptodate(folio);
> >>  		folio_set_error(folio);
> >>  	} else {
> >> -		iomap_set_range_uptodate(folio, iop, offset, len);
> >> +		iomap_iop_set_range_uptodate(folio->mapping->host, folio,
> >> +					     offset, len);
> >>  	}
> >>
> >>  	if (!iop || atomic_sub_and_test(len, &iop->read_bytes_pending))
> >> @@ -214,7 +231,6 @@ struct iomap_readpage_ctx {
> >>  static int iomap_read_inline_data(const struct iomap_iter *iter,
> >>  		struct folio *folio)
> >>  {
> >> -	struct iomap_page *iop;
> >>  	const struct iomap *iomap = iomap_iter_srcmap(iter);
> >>  	size_t size = i_size_read(iter->inode) - iomap->offset;
> >>  	size_t poff = offset_in_page(iomap->offset);
> >> @@ -232,15 +248,14 @@ static int iomap_read_inline_data(const struct iomap_iter *iter,
> >>  	if (WARN_ON_ONCE(size > iomap->length))
> >>  		return -EIO;
> >>  	if (offset > 0)
> >> -		iop = iomap_iop_alloc(iter->inode, folio, iter->flags);
> >> -	else
> >> -		iop = to_iomap_page(folio);
> >> +		iomap_iop_alloc(iter->inode, folio, iter->flags);
> >>
> >>  	addr = kmap_local_folio(folio, offset);
> >>  	memcpy(addr, iomap->inline_data, size);
> >>  	memset(addr + size, 0, PAGE_SIZE - poff - size);
> >>  	kunmap_local(addr);
> >> -	iomap_set_range_uptodate(folio, iop, offset, PAGE_SIZE - poff);
> >> +	iomap_iop_set_range_uptodate(iter->inode, folio, offset,
> >> +				     PAGE_SIZE - poff);
> >>  	return 0;
> >>  }
> >>
> >> @@ -277,7 +292,7 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
> >>
> >>  	if (iomap_block_needs_zeroing(iter, pos)) {
> >>  		folio_zero_range(folio, poff, plen);
> >> -		iomap_set_range_uptodate(folio, iop, poff, plen);
> >> +		iomap_iop_set_range_uptodate(iter->inode, folio, poff, plen);
> >>  		goto done;
> >>  	}
> >>
> >> @@ -452,7 +467,7 @@ bool iomap_is_partially_uptodate(struct folio *folio, size_t from, size_t count)
> >>  	last = (from + count - 1) >> inode->i_blkbits;
> >>
> >>  	for (i = first; i <= last; i++)
> >> -		if (!test_bit(i, iop->uptodate))
> >> +		if (!iop_test_block_uptodate(folio, i))
> >>  			return false;
> >>  	return true;
> >>  }
> >> @@ -591,7 +606,7 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
> >>  			if (status)
> >>  				return status;
> >>  		}
> >> -		iomap_set_range_uptodate(folio, iop, poff, plen);
> >> +		iomap_iop_set_range_uptodate(iter->inode, folio, poff, plen);
> >>  	} while ((block_start += plen) < block_end);
> >>
> >>  	return 0;
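
(Similarly, a quick sanity check of the first_blk/last_blk rounding in
iop_set_range_uptodate() further up, with made-up numbers: 1k blocks,
i.e. i_blkbits == 10, and a range of off = 1536, len = 1024:

	first_blk = 1536 >> 10              = 1
	last_blk  = (1536 + 1024 - 1) >> 10 = 2
	nr_blks   = 2 - 1 + 1               = 2

so blocks 1 and 2, i.e. bytes 1024..3071, are marked uptodate even
though neither end of the byte range is block aligned.)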
> >> @@ -698,7 +713,6 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
> >>  static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
> >>  		size_t copied, struct folio *folio)
> >>  {
> >> -	struct iomap_page *iop = to_iomap_page(folio);
> >>  	flush_dcache_folio(folio);
> >>
> >>  	/*
> >> @@ -714,7 +728,8 @@ static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
> >>  	 */
> >>  	if (unlikely(copied < len && !folio_test_uptodate(folio)))
> >>  		return 0;
> >> -	iomap_set_range_uptodate(folio, iop, offset_in_folio(folio, pos), len);
> >> +	iomap_iop_set_range_uptodate(inode, folio, offset_in_folio(folio, pos),
> >> +				     len);
> >>  	filemap_dirty_folio(inode->i_mapping, folio);
> >>  	return copied;
> >>  }
> >> @@ -1630,7 +1645,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
> >>  	 * invalid, grab a new one.
> >>  	 */
> >>  	for (i = 0; i < nblocks && pos < end_pos; i++, pos += len) {
> >> -		if (iop && !test_bit(i, iop->uptodate))
> >> +		if (iop && !iop_test_block_uptodate(folio, i))
> >>  			continue;
> >>
> >>  		error = wpc->ops->map_blocks(wpc, inode, pos);
> >> --
> >> 2.40.1
> >>