On Thu, Apr 04, 2019 at 07:48:57AM -0400, Brian Foster wrote: > On Wed, Apr 03, 2019 at 09:11:06AM -0700, Darrick J. Wong wrote: > > On Wed, Apr 03, 2019 at 10:30:05AM -0400, Brian Foster wrote: > > > On Mon, Apr 01, 2019 at 10:10:52AM -0700, Darrick J. Wong wrote: > > > > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > > > > > > > Use the AG geometry info ioctl to report health status too. > > > > > > > > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > > > --- > > > > fs/xfs/libxfs/xfs_fs.h | 12 +++++++++++- > > > > fs/xfs/libxfs/xfs_health.h | 2 ++ > > > > fs/xfs/xfs_health.c | 40 ++++++++++++++++++++++++++++++++++++++++ > > > > fs/xfs/xfs_ioctl.c | 2 ++ > > > > 4 files changed, 55 insertions(+), 1 deletion(-) > > > > > > > > > > > ... > > > > diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c > > > > index 151c98693bef..5ca471bd41ad 100644 > > > > --- a/fs/xfs/xfs_health.c > > > > +++ b/fs/xfs/xfs_health.c > > > > @@ -276,3 +276,43 @@ xfs_fsop_geom_health( > > > > if (sick & XFS_HEALTH_RT_SUMMARY) > > > > geo->health |= XFS_FSOP_GEOM_HEALTH_RT_SUMMARY; > > > > } > > > > + > > > > +/* Fill out ag geometry health info. */ > > > > +void > > > > +xfs_ag_geom_health( > > > > + struct xfs_mount *mp, > > > > + xfs_agnumber_t agno, > > > > + struct xfs_ag_geometry *ageo) > > > > +{ > > > > + struct xfs_perag *pag; > > > > + unsigned int sick; > > > > + > > > > + if (agno >= mp->m_sb.sb_agcount) > > > > + return; > > > > > > The call to xfs_ag_get_geometry() would have already returned an error > > > in the ioctl path for the above scenario. It might still make sense to > > > check here, but perhaps we could let this function also return an int > > > and return an error for consistency? > > > > Or maybe just ASSERT on the agno and add a note that the caller is > > required to pass in a valid ag number. > > > > > > + > > > > + ageo->ag_health = 0; > > > > + > > > > + pag = xfs_perag_get(mp, agno); > > > > + sick = xfs_ag_measure_sickness(pag); > > > > + if (sick & XFS_HEALTH_AG_SB) > > > > + ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_SB; > > > > > > I'm starting to wonder whether "health" is the best term to use for the > > > interface bits just because it reads a little weird to measure > > > "sickness" and then apply all the sick state to something called > > > "health." I don't have a better suggestion off the top of my head, > > > though. Just something to think about a bit more from an API > > > standpoint.. > > > > I had the same conundrum. I guess we could start the bitset with -1 and > > clear bits when scrub says they've gone bad? That would be much clearer > > with regards to the names, but technically we don't know the health of a > > structure until we scan it, so I wouldn't want to represent the fs as > > being "healthy" having not actually looked for problems. > > > > What we /really/ need is a tri-state bitset[1]: > > > > enum Bool > > { > > True, > > False, > > FileNotFound > > }; > > > > But maybe I will try renaming all this to "sick" again. > > > > if (sick & XFS_SICK_AG_AGF) > > ageo->ag_sick |= XFS_AG_GEOM_SICK_AG_AGF; > > > > Gosh. That second name is gross. XFS_AG_GEOM_SICK_AGF. > > > > Sick sick sick sick sick. Ok, I've convinced myself of the name change. :P > > > > Heh. I suppose we could either invert the logic or perhaps try to come > up with a better keyword than "health" for the exported bits (at least). > If I see ag_health in a data structure, for example, I'm assuming it's > telling me what is healthy. Of course we'll have documentation and > whatnot to clear that up.. > > Another term that came to mind is "fault" or "faulted" as it has > precedent in storage contexts wrt to raid. I.e., ag_faults and > XFS_AG_GEOM_FAULT_AGF, etc. etc. To me it also kind of covers the angle > that we aren't necessarily stating a subset of the filesystem is healthy > due to lack of faults if we just haven't scrubbed/found anything. Hm? I > guess it could be confused with reporting underlying storage problems. I > dunno... it's more clear to me, but maybe others have ideas.. I have a (not very strong) preference for 'sick' over 'fault' because there are other parts of xfs where we deal with (page) faults and I don't really want to get "file metadata faults" and "file page faults" confused. (I'm not sure anyone is really going to confuse them, though...) --D > Brian > > > --D > > > > [1] https://thedailywtf.com/articles/What_Is_Truth_0x3f_ > > > > > Brian > > > > > > > + if (sick & XFS_HEALTH_AG_AGF) > > > > + ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_AGF; > > > > + if (sick & XFS_HEALTH_AG_AGFL) > > > > + ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_AGFL; > > > > + if (sick & XFS_HEALTH_AG_AGI) > > > > + ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_AGI; > > > > + if (sick & XFS_HEALTH_AG_BNOBT) > > > > + ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_BNOBT; > > > > + if (sick & XFS_HEALTH_AG_CNTBT) > > > > + ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_CNTBT; > > > > + if (sick & XFS_HEALTH_AG_INOBT) > > > > + ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_INOBT; > > > > + if (sick & XFS_HEALTH_AG_FINOBT) > > > > + ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_FINOBT; > > > > + if (sick & XFS_HEALTH_AG_RMAPBT) > > > > + ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_RMAPBT; > > > > + if (sick & XFS_HEALTH_AG_REFCNTBT) > > > > + ageo->ag_health |= XFS_AG_GEOM_HEALTH_AG_REFCNTBT; > > > > + xfs_perag_put(pag); > > > > +} > > > > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c > > > > index f9bf11b6a055..f1fc5e53cfc1 100644 > > > > --- a/fs/xfs/xfs_ioctl.c > > > > +++ b/fs/xfs/xfs_ioctl.c > > > > @@ -853,6 +853,8 @@ xfs_ioc_ag_geometry( > > > > if (error) > > > > return error; > > > > > > > > + xfs_ag_geom_health(mp, ageo.ag_number, &ageo); > > > > + > > > > if (copy_to_user(arg, &ageo, sizeof(ageo))) > > > > return -EFAULT; > > > > return 0; > > > >