Re: [PATCH 02/27] xfs: query the per-AG reservation counters

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Sep 21, 2017 at 10:36:31AM -0400, Brian Foster wrote:
> On Wed, Sep 20, 2017 at 05:17:43PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > 
> > Establish an ioctl for userspace to query the original and current
> > per-AG reservation counts.  This will be used by xfs_scrub to
> > check that the vfs counters are at least somewhat sane.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > ---
> 
> The code seems fine, but I'm wondering how much of this would remain as
> is based on Dave's potential switch of the perag res stuff to be a
> hidden reservation. That itself still needs to be posted/reviewed/etc.,
> so might it be smarter to defer defining an interface to export this
> stuff until the core mechanism is worked out?

I think Dave's vision is that we should permanently steal the AG
reservations from the max fs size:

(device size - static metadata - static per-ag reservations)

In which case the ioctl isn't needed at all, since the statfs counters
would simply never report the per-ag reservations at all.

> That aside, I'm also wondering how/what userspace scrub does with the
> overall total values as opposed to individual AG reservation values.

To a rough approximation, xfs_scrub (phase 7) compares the fs geometry
data (used blocks, used inodes) against what it found via getfsmap and
bulkstat and warns about a potentially insufficient scrub if the counts
are off by more than 10% from what the fs reports.  The per-AG summary
counter ar_current_resv is added to the statvfs.f_bfree counter since
reserved blocks don't show up in getfsmap.

Since per-AG reservations never use more than 9% of the fs space worst
case (1k blocks) and usually 2% (4k blocks) the current 10% tolerance
built into xfs_scrub is probably good enough for now.

Anyway, I plan to withdraw this patch.

> Are the current "ask" values expected to be uniform across each AG..?

scrub doesn't care.

> For example, what happens if one AG has a bogus "current" value for
> whatever reason (perhaps this is more relevant for both values if we
> do actually end up with on-disk reservations)?

If it exceeds scrub's tolerances it'll warn about the discrepancy.
There's probably no way to fix the summary counters without an
xfs_repair run, unless it's acceptable to freeze the whole fs.

Not going to try to do that. :)

--D

> 
> Brian
> 
> >  fs/xfs/libxfs/xfs_fs.h |   12 ++++++++++++
> >  fs/xfs/xfs_fsops.c     |   26 ++++++++++++++++++++++++++
> >  fs/xfs/xfs_fsops.h     |    2 ++
> >  fs/xfs/xfs_ioctl.c     |   24 ++++++++++++++++++++++++
> >  fs/xfs/xfs_ioctl32.c   |    1 +
> >  5 files changed, 65 insertions(+)
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> > index 8c61f21..2c26c38 100644
> > --- a/fs/xfs/libxfs/xfs_fs.h
> > +++ b/fs/xfs/libxfs/xfs_fs.h
> > @@ -469,6 +469,17 @@ typedef struct xfs_swapext
> >  #define XFS_FSOP_GOING_FLAGS_NOLOGFLUSH		0x2	/* don't flush log nor data */
> >  
> >  /*
> > + * AG reserved block counters
> > + */
> > +struct xfs_fsop_ag_resblks {
> > +	__u32 ar_flags;			/* output flags, none defined now */
> > +	__u32 ar_reserved;		/* zero */
> > +	__u64 ar_current_resv;		/* blocks reserved now */
> > +	__u64 ar_mount_resv;		/* blocks reserved at mount time */
> > +	__u64 ar_reserved2[5];		/* zero */
> > +};
> > +
> > +/*
> >   * ioctl limits
> >   */
> >  #ifdef XATTR_LIST_MAX
> > @@ -543,6 +554,7 @@ typedef struct xfs_swapext
> >  #define XFS_IOC_ATTRMULTI_BY_HANDLE  _IOW ('X', 123, struct xfs_fsop_attrmulti_handlereq)
> >  #define XFS_IOC_FSGEOMETRY	     _IOR ('X', 124, struct xfs_fsop_geom)
> >  #define XFS_IOC_GOINGDOWN	     _IOR ('X', 125, uint32_t)
> > +#define XFS_IOC_GET_AG_RESBLKS	     _IOR ('X', 126, struct xfs_fsop_ag_resblks)
> >  /*	XFS_IOC_GETFSUUID ---------- deprecated 140	 */
> >  
> >  
> > diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> > index 8f22fc5..50fb3a2 100644
> > --- a/fs/xfs/xfs_fsops.c
> > +++ b/fs/xfs/xfs_fsops.c
> > @@ -44,6 +44,7 @@
> >  #include "xfs_filestream.h"
> >  #include "xfs_rmap.h"
> >  #include "xfs_ag_resv.h"
> > +#include "xfs_fs.h"
> >  
> >  /*
> >   * File system operations
> > @@ -1046,3 +1047,28 @@ xfs_fs_unreserve_ag_blocks(
> >  
> >  	return error;
> >  }
> > +
> > +/* Query the per-AG reservations to see how many blocks we have reserved. */
> > +int
> > +xfs_fs_get_ag_reserve_blocks(
> > +	struct xfs_mount		*mp,
> > +	struct xfs_fsop_ag_resblks	*out)
> > +{
> > +	struct xfs_ag_resv		*r;
> > +	struct xfs_perag		*pag;
> > +	xfs_agnumber_t			agno;
> > +
> > +	memset(out, 0, sizeof(struct xfs_fsop_ag_resblks));
> > +	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
> > +		pag = xfs_perag_get(mp, agno);
> > +		r = xfs_perag_resv(pag, XFS_AG_RESV_METADATA);
> > +		out->ar_current_resv += r->ar_reserved;
> > +		out->ar_mount_resv += r->ar_asked;
> > +		r = xfs_perag_resv(pag, XFS_AG_RESV_AGFL);
> > +		out->ar_current_resv += r->ar_reserved;
> > +		out->ar_mount_resv += r->ar_asked;
> > +		xfs_perag_put(pag);
> > +	}
> > +
> > +	return 0;
> > +}
> > diff --git a/fs/xfs/xfs_fsops.h b/fs/xfs/xfs_fsops.h
> > index 2954c13..c8f5e26 100644
> > --- a/fs/xfs/xfs_fsops.h
> > +++ b/fs/xfs/xfs_fsops.h
> > @@ -25,6 +25,8 @@ extern int xfs_fs_counts(xfs_mount_t *mp, xfs_fsop_counts_t *cnt);
> >  extern int xfs_reserve_blocks(xfs_mount_t *mp, uint64_t *inval,
> >  				xfs_fsop_resblks_t *outval);
> >  extern int xfs_fs_goingdown(xfs_mount_t *mp, uint32_t inflags);
> > +extern int xfs_fs_get_ag_reserve_blocks(struct xfs_mount *mp,
> > +		struct xfs_fsop_ag_resblks *out);
> >  
> >  extern int xfs_fs_reserve_ag_blocks(struct xfs_mount *mp);
> >  extern int xfs_fs_unreserve_ag_blocks(struct xfs_mount *mp);
> > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > index 5049e8a..44dc178 100644
> > --- a/fs/xfs/xfs_ioctl.c
> > +++ b/fs/xfs/xfs_ioctl.c
> > @@ -1782,6 +1782,27 @@ xfs_ioc_swapext(
> >  	return error;
> >  }
> >  
> > +static int
> > +xfs_ioc_get_ag_reserve_blocks(
> > +	struct xfs_mount		*mp,
> > +	void __user			*arg)
> > +{
> > +	struct xfs_fsop_ag_resblks	out;
> > +	int				error;
> > +
> > +	if (!capable(CAP_SYS_ADMIN))
> > +		return -EPERM;
> > +
> > +	error = xfs_fs_get_ag_reserve_blocks(mp, &out);
> > +	if (error)
> > +		return error;
> > +
> > +	if (copy_to_user(arg, &out, sizeof(out)))
> > +		return -EFAULT;
> > +
> > +	return 0;
> > +}
> > +
> >  /*
> >   * Note: some of the ioctl's return positive numbers as a
> >   * byte count indicating success, such as readlink_by_handle.
> > @@ -1987,6 +2008,9 @@ xfs_file_ioctl(
> >  		return 0;
> >  	}
> >  
> > +	case XFS_IOC_GET_AG_RESBLKS:
> > +		return xfs_ioc_get_ag_reserve_blocks(mp, arg);
> > +
> >  	case XFS_IOC_FSGROWFSDATA: {
> >  		xfs_growfs_data_t in;
> >  
> > diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
> > index fa0bc4d..e8b4de3 100644
> > --- a/fs/xfs/xfs_ioctl32.c
> > +++ b/fs/xfs/xfs_ioctl32.c
> > @@ -556,6 +556,7 @@ xfs_file_compat_ioctl(
> >  	case XFS_IOC_ERROR_INJECTION:
> >  	case XFS_IOC_ERROR_CLEARALL:
> >  	case FS_IOC_GETFSMAP:
> > +	case XFS_IOC_GET_AG_RESBLKS:
> >  		return xfs_file_ioctl(filp, cmd, p);
> >  #ifndef BROKEN_X86_ALIGNMENT
> >  	/* These are handled fine if no alignment issues */
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux