On Mon 26-10-20 04:14:00, Matthew Wilcox (Oracle) wrote:
> Rewrite shmem_seek_hole_data() and move it to filemap.c.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
> Reviewed-by: William Kucharski <william.kucharski@xxxxxxxxxx>
> ---
>  include/linux/pagemap.h |  2 ++
>  mm/filemap.c            | 76 +++++++++++++++++++++++++++++++++++++++++
>  mm/shmem.c              | 72 +++-----------------------------------
>  3 files changed, 82 insertions(+), 68 deletions(-)
> 
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index c77b7c31b2e4..5f3e829c91fd 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -760,6 +760,8 @@ extern void __delete_from_page_cache(struct page *page, void *shadow);
>  int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask);
>  void delete_from_page_cache_batch(struct address_space *mapping,
>  				  struct pagevec *pvec);
> +loff_t mapping_seek_hole_data(struct address_space *, loff_t start, loff_t end,
> +		int whence);
>  
>  /*
>   * Like add_to_page_cache_locked, but used to add newly allocated pages:
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 00eaed59e797..3a55d258d9f2 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2526,6 +2526,82 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
>  }
>  EXPORT_SYMBOL(generic_file_read_iter);
>  
> +static inline loff_t page_seek_hole_data(struct page *page,
> +		loff_t start, loff_t end, bool seek_data)
> +{
> +	if (xa_is_value(page) || PageUptodate(page))

Please add a comment here noting that this is currently tmpfs-specific,
treating exceptional entries as swapped-out pages and thus as data. It took
me quite a while to figure this out. You can remove the comment later when
it is no longer true...

> +		return seek_data ? start : end;
> +	return seek_data ? end : start;
> +}
> +
> +static inline
> +unsigned int seek_page_size(struct xa_state *xas, struct page *page)
> +{
> +	if (xa_is_value(page))
> +		return PAGE_SIZE << xa_get_order(xas->xa, xas->xa_index);
> +	return thp_size(page);
> +}
> +
> +/**
> + * mapping_seek_hole_data - Seek for SEEK_DATA / SEEK_HOLE in the page cache.
> + * @mapping: Address space to search.
> + * @start: First byte to consider.
> + * @end: Limit of search (exclusive).
> + * @whence: Either SEEK_HOLE or SEEK_DATA.
> + *
> + * If the page cache knows which blocks contain holes and which blocks
> + * contain data, your filesystem can use this function to implement
> + * SEEK_HOLE and SEEK_DATA. This is useful for filesystems which are
> + * entirely memory-based such as tmpfs, and filesystems which support
> + * unwritten extents.
> + *
> + * Return: The requested offset on successs, or -ENXIO if @whence specifies
> + * SEEK_DATA and there is no data after @start. There is an implicit hole
> + * after @end - 1, so SEEK_HOLE returns @end if all the bytes between @start
> + * and @end contain data.
> + */
> +loff_t mapping_seek_hole_data(struct address_space *mapping, loff_t start,
> +		loff_t end, int whence)
> +{
> +	XA_STATE(xas, &mapping->i_pages, start >> PAGE_SHIFT);
> +	pgoff_t max = (end - 1) / PAGE_SIZE;
> +	bool seek_data = (whence == SEEK_DATA);
> +	struct page *page;
> +
> +	if (end <= start)
> +		return -ENXIO;
> +
> +	rcu_read_lock();
> +	while ((page = xas_find_get_entry(&xas, max, XA_PRESENT))) {
> +		loff_t pos = xas.xa_index * PAGE_SIZE;
> +
> +		if (start < pos) {
> +			if (!seek_data)
> +				goto unlock;
> +			start = pos;
> +		}
> +
> +		pos += seek_page_size(&xas, page);
> +		start = page_seek_hole_data(page, start, pos, seek_data);
> +		if (start < pos)
> +			goto unlock;

Uh, I was staring at this function for half an hour but I still couldn't
convince myself that it is correct in all the corner cases. Maybe I'm dumb,
but I wish this were more intuitive (and I have to say that the original
tmpfs function is much more obviously correct to me). It would be more
understandable to me if we had code like:

		if (page_seek_match(page, seek_data))
			goto unlock;

which would be just the condition in page_seek_hole_data(). Honestly, at the
moment I fail to see why you bother with 'pos' in the above four lines at
all.
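
To make that concrete, I mean roughly something like this (completely
untested, just to illustrate the idea):

	/* Does this page cache entry match what we are seeking for? */
	static inline bool page_seek_match(struct page *page, bool seek_data)
	{
		/* Value entries are swapped-out tmpfs pages, i.e. data */
		if (xa_is_value(page) || PageUptodate(page))
			return seek_data;
		return !seek_data;
	}

and the loop body could then end with:

		if (page_seek_match(page, seek_data))
			goto unlock;
		start = pos + seek_page_size(&xas, page);

so 'start' only gets advanced when we skip over a page that does not match.
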
BTW, I suspect this loop forgets to release the page reference it has
acquired when doing SEEK_HOLE.

> +	}
> +	rcu_read_unlock();
> +
> +	if (seek_data)
> +		return -ENXIO;
> +	goto out;
> +
> +unlock:
> +	rcu_read_unlock();
> +	if (!xa_is_value(page))
> +		put_page(page);
> +out:
> +	if (start > end)
> +		return end;
> +	return start;
> +}

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR