Re: [PATCH 1/7] xfs: create a big array data structure

On Fri, Jul 28, 2023 at 04:10:51AM +0100, Matthew Wilcox wrote:
> On Thu, Jul 27, 2023 at 03:25:35PM -0700, Darrick J. Wong wrote:
> > diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
> > index 7418d6c60056a..0b9e781840f37 100644
> > --- a/fs/xfs/scrub/trace.h
> > +++ b/fs/xfs/scrub/trace.h
> > @@ -16,6 +16,9 @@
> >  #include <linux/tracepoint.h>
> >  #include "xfs_bit.h"
> >  
> > +struct xfile;
> > +struct xfarray;
> 
> You dereference both a struct xfile and a struct xfarray.  Seems like
> you don't need these declarations?

I suppose not.

> > +/* Compute array index given an xfile offset. */
> > +static xfarray_idx_t
> > +xfarray_idx(
> > +	struct xfarray	*array,
> > +	loff_t		pos)
> > +{
> > +	if (array->obj_size_log >= 0)
> > +		return (xfarray_idx_t)pos >> array->obj_size_log;
> > +
> > +	return div_u64((xfarray_idx_t)pos, array->obj_size);
> 
> If xfarray_idx_t is smaller than an loff_t, this will truncate pos,
> which isn't what you want.

typedef uint64_t              xfarray_idx_t;

This won't be smaller than loff_t until you port Linux to 128-bit
integers in 2028.
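To make the width argument concrete, here is a userspace sketch of the two conversions. Only xfarray_idx_t matches the patch. The struct, the model_* functions and model_loff_t (a stand-in for the kernel's signed 64-bit loff_t) are hypothetical, and plain 64-bit division replaces div_u64(). Because the index type is 64 bits wide, an offset well past 4 GiB round-trips without truncation.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t xfarray_idx_t;  /* as in the patch */
typedef int64_t  model_loff_t;   /* hypothetical stand-in for loff_t */

/* Hypothetical reduced descriptor: only the fields the helpers use. */
struct model_array {
	uint32_t obj_size;      /* record size in bytes */
	int      obj_size_log;  /* log2(obj_size) if a power of two, else -1 */
};

/* Enable the shift fast path only for power-of-two record sizes. */
static int model_size_log(uint32_t size)
{
	int log = 0;

	if (size == 0 || (size & (size - 1)) != 0)
		return -1;
	while ((UINT32_C(1) << log) != size)
		log++;
	return log;
}

/* xfile offset -> array index, computed in 64 bits. */
static xfarray_idx_t model_idx(const struct model_array *a, model_loff_t pos)
{
	assert(a->obj_size != 0 && pos >= 0);
	if (a->obj_size_log >= 0)
		return (xfarray_idx_t)pos >> a->obj_size_log;
	return (xfarray_idx_t)pos / a->obj_size;  /* kernel code uses div_u64() */
}

/* Array index -> xfile offset, also computed in 64 bits. */
static model_loff_t model_pos(const struct model_array *a, xfarray_idx_t idx)
{
	if (a->obj_size_log >= 0)
		return (model_loff_t)(idx << a->obj_size_log);
	return (model_loff_t)(idx * a->obj_size);
}
```

For example, 24-byte records take the division path (offset 48 is index 2), while 16-byte records take the shift path (index 3 is offset 48).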

> > +/* Compute xfile offset of array element. */
> > +static inline loff_t xfarray_pos(struct xfarray *array, xfarray_idx_t idx)
> > +{
> > +	if (array->obj_size_log >= 0)
> > +		return idx << array->obj_size_log;
> > +
> > +	return idx * array->obj_size;
> 
> Likewise, you need to promote idx to loff_t before shifting/multiplying.
> 
> > +static inline bool
> > +xfarray_is_unset(
> > +	struct xfarray	*array,
> > +	loff_t		pos)
> > +{
> > +	void		*temp = xfarray_scratch(array);
> > +	int		error;
> > +
> > +	if (array->unset_slots == 0)
> > +		return false;
> > +
> > +	error = xfile_obj_load(array->xfile, temp, array->obj_size, pos);
> > +	if (!error && xfarray_element_is_null(array, temp))
> > +		return true;
> > +
> > +	return false;
> 
> Wouldn't this be clearer as:
> 
> 	return !error && xfarray_element_is_null(array, temp);

<shrug> I don't care either way, though I might write it as:

	if (error)
		return false;
	return xfarray_element_is_null(...);
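All three spellings return the same value. In the real function, the null check also runs only when the load succeeded, through && short-circuiting or the early return. The sketch below is a hypothetical model: it passes the null-check result in as a plain bool, so it compares return values only.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helpers modelling only the tail of xfarray_is_unset():
 * 'error' is the xfile_obj_load() result and 'is_null' stands in for
 * xfarray_element_is_null(array, temp).
 */
static bool tail_as_posted(int error, bool is_null)
{
	if (!error && is_null)
		return true;
	return false;
}

static bool tail_single_return(int error, bool is_null)
{
	return !error && is_null;
}

static bool tail_early_return(int error, bool is_null)
{
	if (error)
		return false;
	return is_null;
}

/* Compare the three forms for success/failure x null/non-null. */
static void check_tails_agree(void)
{
	const int errors[] = { 0, -12 /* e.g. -ENOMEM */ };

	for (int e = 0; e < 2; e++) {
		for (int n = 0; n < 2; n++) {
			bool want = tail_as_posted(errors[e], n);

			assert(tail_single_return(errors[e], n) == want);
			assert(tail_early_return(errors[e], n) == want);
		}
	}
}
```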

> > +int
> > +xfarray_store_anywhere(
> > +	struct xfarray	*array,
> > +	const void	*ptr)
> > +{
> > +	void		*temp = xfarray_scratch(array);
> > +	loff_t		endpos = xfarray_pos(array, array->nr);
> > +	loff_t		pos;
> > +	int		error;
> > +
> > +	/* Find an unset slot to put it in. */
> > +	for (pos = 0;
> > +	     pos < endpos && array->unset_slots > 0;
> > +	     pos += array->obj_size) {
> > +		error = xfile_obj_load(array->xfile, temp, array->obj_size,
> > +				pos);
> > +		if (error || !xfarray_element_is_null(array, temp))
> > +			continue;
> > +
> > +		error = xfile_obj_store(array->xfile, ptr, array->obj_size,
> > +				pos);
> > +		if (error)
> > +			return error;
> > +
> > +		array->unset_slots--;
> > +		return 0;
> > +	}
> 
> ... how often is this called?  This seems like it might be slow.

It's used by the refcount btree rebuilder patch, which stacks rmaps so
that it can compute the refcount of a given extent from the number of
rmaps it has collected for that extent.

(Eventually I replace the xfarray with an indexed btree to eliminate the
linear searching, but that won't happen until part 2 of part 1,
because I decided to send only the first 51 of 209 patches.)
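A userspace model of the quoted loop shows where the cost comes from. Every call rescans from slot 0 until it finds an all-zero record, so each store can cost O(nr) record loads while holes remain. Everything here is hypothetical: a fixed in-memory buffer replaces the xfile, and "null" means all-zero bytes. The append fallback after the loop is also an assumption, because the quoted hunk stops at the end of the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define OBJ_SIZE 8  /* hypothetical record size */
#define MAX_NR   4  /* hypothetical capacity of the model array */

struct model_xfarray {
	unsigned char data[MAX_NR * OBJ_SIZE];  /* stands in for the xfile */
	size_t nr;           /* slots in use, like array->nr */
	size_t unset_slots;  /* all-zero slots below nr, like array->unset_slots */
};

/* A record counts as unset if every byte is zero. */
static bool model_is_null(const unsigned char *rec)
{
	for (size_t i = 0; i < OBJ_SIZE; i++)
		if (rec[i] != 0)
			return false;
	return true;
}

/* First fit: linear scan for an unset slot, else append (assumed). */
static int model_store_anywhere(struct model_xfarray *a, const void *ptr)
{
	size_t endpos = a->nr * OBJ_SIZE;

	assert(a->unset_slots <= a->nr);
	for (size_t pos = 0; pos < endpos && a->unset_slots > 0;
	     pos += OBJ_SIZE) {
		if (!model_is_null(&a->data[pos]))
			continue;
		memcpy(&a->data[pos], ptr, OBJ_SIZE);
		a->unset_slots--;
		return 0;
	}

	/* Assumed fallback: append at the end if there is room. */
	if (a->nr >= MAX_NR)
		return -1;
	memcpy(&a->data[a->nr * OBJ_SIZE], ptr, OBJ_SIZE);
	a->nr++;
	return 0;
}
```

An indexed structure of free slots would turn the scan into a lookup. The btree mentioned above is one way to do that.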

> > +	/*
> > +	 * Call SEEK_DATA on the last byte in the record we're about to read.
> > +	 * If the record ends at (or crosses) the end of a page then we know
> > +	 * that the first byte of the record is backed by pages and don't need
> > +	 * to query it.  If instead the record begins at the start of the page
> > +	 * then we know that querying the last byte is just as good as querying
> > +	 * the first byte, since records cannot be larger than a page.
> > +	 *
> > +	 * If the call returns the same file offset, we know this record is
> > +	 * backed by real pages.  We do not need to move the cursor.
> > +	 */
> 
> Clever.
> 
> > +ssize_t
> > +xfile_pread(
> > +	struct xfile		*xf,
> > +	void			*buf,
> > +	size_t			count,
> > +	loff_t			pos)
> > +{
> > +	struct inode		*inode = file_inode(xf->file);
> > +	struct address_space	*mapping = inode->i_mapping;
> > +	struct page		*page = NULL;
> > +	ssize_t			read = 0;
> > +	unsigned int		pflags;
> > +	int			error = 0;
> > +
> > +	if (count > MAX_RW_COUNT)
> > +		return -E2BIG;
> > +	if (inode->i_sb->s_maxbytes - pos < count)
> > +		return -EFBIG;
> > +
> > +	trace_xfile_pread(xf, pos, count);
> > +
> > +	pflags = memalloc_nofs_save();
> 
> Should we be calling this here, or should this be done by the caller?
> Presumably it's the current caller that can't stand reclaim starting?

Well... here's the thing -- scrub already does this by attaching a
(sometimes empty) transaction to the scrub context, since allocating a
transaction puts the task in NOFS context.  In the context of "xfile as
an xscrub infrastructure", it's unnecessary.

OTOH, in the context of "xfile as something that may some day end up as a
general kernel tool", I don't think we want an xfile access to recurse
into filesystems via memory reclaim.
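Keeping the call in xfile_pread() costs nothing even when the caller already holds a transaction, because the NOFS scope nests. memalloc_nofs_save() returns the previous state of PF_MEMALLOC_NOFS, and memalloc_nofs_restore() puts that state back. The model below is a sketch of that behaviour. The flag variable, the bit value and the model_* names are hypothetical stand-ins for current->flags and the kernel helpers.

```c
#include <assert.h>

#define MODEL_PF_MEMALLOC_NOFS 0x1u  /* hypothetical flag bit */

static unsigned int task_flags;  /* stand-in for current->flags */

/* Like memalloc_nofs_save(): remember the old NOFS bit, then set it. */
static unsigned int model_nofs_save(void)
{
	unsigned int old = task_flags & MODEL_PF_MEMALLOC_NOFS;

	task_flags |= MODEL_PF_MEMALLOC_NOFS;
	return old;
}

/* Like memalloc_nofs_restore(): put back exactly the saved NOFS bit. */
static void model_nofs_restore(unsigned int old)
{
	assert((old & ~MODEL_PF_MEMALLOC_NOFS) == 0);
	task_flags = (task_flags & ~MODEL_PF_MEMALLOC_NOFS) | old;
}

/* Stand-in for an xfile access that scopes its own NOFS section and
 * reports whether NOFS was in effect while it would allocate. */
static int model_xfile_access(void)
{
	unsigned int pflags = model_nofs_save();
	int nofs_inside = (task_flags & MODEL_PF_MEMALLOC_NOFS) != 0;

	model_nofs_restore(pflags);
	return nofs_inside;
}
```

Called bare, the access runs with NOFS set and clears it afterwards. Called inside an outer NOFS scope, as under a scrub transaction, it leaves the outer scope intact.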

> > +	while (count > 0) {
> > +		void		*p, *kaddr;
> > +		unsigned int	len;
> > +
> > +		len = min_t(ssize_t, count, PAGE_SIZE - offset_in_page(pos));
> > +
> > +		/*
> > +		 * In-kernel reads of a shmem file cause it to allocate a page
> > +		 * if the mapping shows a hole.  Therefore, if we hit ENOMEM
> > +		 * we can continue by zeroing the caller's buffer.
> > +		 */
> > +		page = shmem_read_mapping_page_gfp(mapping, pos >> PAGE_SHIFT,
> > +				__GFP_NOWARN);
> 
> I'm kind of hoping to transition to shmem_read_folio_gfp(), but that
> doesn't have to happen before this gets merged.

<nod> I haven't figured out if we care about large folios for xfiles
yet.  Scrub data is supposed to be ephemeral so it likely won't care,
but I can imagine longer-term uses for xfiles that might actually have
an opinion.
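The quoted xfile_pread() loop splits each request at page boundaries with len = min(count, PAGE_SIZE - offset_in_page(pos)), so each iteration touches exactly one page. The sketch below reproduces only that splitting. It assumes 4096-byte pages, and the constant and names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u  /* assumed page size */

struct chunk {
	uint64_t pos;  /* file offset of this piece */
	size_t   len;  /* bytes in this piece, never crossing a page */
};

/* Split [pos, pos + count) into per-page pieces; returns how many were
 * written to out (at most max). */
static size_t split_by_page(uint64_t pos, size_t count,
			    struct chunk *out, size_t max)
{
	size_t n = 0;

	assert(out != NULL || max == 0);
	while (count > 0 && n < max) {
		size_t in_page = (size_t)(pos % MODEL_PAGE_SIZE);
		size_t len = MODEL_PAGE_SIZE - in_page;

		if (len > count)
			len = count;
		out[n].pos = pos;
		out[n].len = len;
		n++;
		pos += len;
		count -= len;
	}
	return n;
}
```

For example, a 200-byte read at offset 4000 becomes two pieces. The first covers 96 bytes up to the page boundary at 4096, and the second covers the remaining 104 bytes.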

> > +ssize_t
> > +xfile_pwrite(
> > +	struct xfile		*xf,
> > +	const void		*buf,
> > +	size_t			count,
> > +	loff_t			pos)
> > +{
> > +	struct inode		*inode = file_inode(xf->file);
> > +	struct address_space	*mapping = inode->i_mapping;
> 
> I wonder if this shouldn't be xf->file->f_mapping?

<shrug> What's the difference for a tmpfs file?

--D


