Re: bluestore fsck behavior with legacy stats etc

On Thu, 8 Aug 2019, Anthony D'Atri wrote:
> How expensive are both of these conversions?  If execution takes, say, a 
> few seconds per OSD, conversion during upgrade is more feasible than if 
> they take a few minutes (or longer!), in which case clusters may need to 
> run indefinitely without updating OSDs.

The legacy statfs update is pretty quick.  Igor has a PR open to do 
the update on mount here

	https://github.com/ceph/ceph/pull/30264

which took ~20s for ~5M objects in his test.
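
For anyone who wants to convert an OSD before that merges, the offline 
route is a repair against a stopped OSD.  A rough sketch (the OSD id and 
path are illustrative):

	systemctl stop ceph-osd@0
	ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-0   # recalculates the per-pool stats
	systemctl start ceph-osd@0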

The newer change to per-pool omap is slower because it requires rewriting 
all omap data.  How slow depends on how many omap objects there are: 
maybe almost none for an RBD pool, but potentially lots for an RGW or 
CephFS metadata pool.
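
If you want a rough sense of how much omap data a given OSD is carrying 
before kicking off the conversion, the OMAP column in 'ceph osd df' is a 
reasonable proxy (present since nautilus, if I remember right):

	ceph osd df    # per-OSD OMAP (and META) columns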

sage


> 
> 
> 
> > Background: In nautilus, bluestore started maintaining usage stats on a 
> > per-pool basis.  BlueStore OSDs created before nautilus lack these stats. 
> > Running a ceph-bluestore-tool repair can calculate the per-pool stats so 
> > that the OSD can maintain and report them going forward.
> > 
> > There are two options:
> > 
> > - bluestore_warn_on_legacy_statfs (bool, default: true), which makes the 
> > cluster issue a health warning when there are OSDs that have legacy stats.
> > 
> > - bluestore_no_per_pool_stats_tolerance (enum enforce, until_fsck, 
> > until_repair, default: until_repair).
> > 
> >  'until_fsck' will tolerate the legacy stats, but fsck will fail
> >  'until_repair' will tolerate the legacy stats, but fsck will pass
> >  'enforce' will tolerate the legacy stats and also disable the warning
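> > 
> > Restated as they would look in ceph.conf today (values are just the 
> > defaults listed above):
> > 
> > 	[osd]
> > 	bluestore_warn_on_legacy_statfs = true
> > 	bluestore_no_per_pool_stats_tolerance = until_repair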
> > 
> > 
> > The octopus addition of per-pool omap usage tracking presents an identical 
> > problem: a new tracking ability in bluestore that requires a conversion to 
> > enable it after upgrade.
> > 
> > I think that we can simplify these settings and make them less confusing, 
> > still with two options:
> > 
> > - bluestore_fsck_error_on_no_per_pool_omap (bool, default: false). During 
> > fsck, we can either generate a 'warning' about non-per-pool omap, or an 
> > error.  Generate a warning by default, which means that the fsck return 
> > code can indicate success.
> > 
> > - bluestore_warn_on_no_per_pool_omap (bool, default: true). At runtime, we 
> > can generate a health warning if the OSD is using the legacy non-per-pool 
> > omap.
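> > 
> > With these, a strict fsck would look something like this (the path is 
> > illustrative, and I'm assuming the tool picks up config overrides the 
> > usual way):
> > 
> > 	ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-0 \
> > 		--bluestore_fsck_error_on_no_per_pool_omap=true   # proposed option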
> > 
> > The overall default behavior is the same as we have with the 
> > legacy_statfs: OSDs still work, fsck passes, and we generate a health 
> > warning.
> > 
> > Setting bluestore_warn_on_no_per_pool_omap=false is the same, AFAICS, as 
> > setting bluestore_no_per_pool_stats_tolerance=enforce.  (Except maybe 
> > repair won't do the conversion? I don't see why we'd ever not want to 
> > do the conversion, though.)
> > 
> > Setting bluestore_fsck_error_on_no_per_pool_omap=true is the same, AFAICS, 
> > as bluestore_no_per_pool_stats_tolerance=until_fsck.
> > 
> > Overall, this seems simpler and easier for a user to understand.  
> > Realistically, the only option I expect a user will ever change is 
> > bluestore_warn_on_no_per_pool_omap=false to make the health warning go 
> > away after an upgrade.
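> > 
> > i.e., something like (option name as proposed above, so just a sketch 
> > until it actually lands):
> > 
> > 	ceph config set global bluestore_warn_on_no_per_pool_omap false   # proposed option name
> > 
> > or the equivalent line in ceph.conf.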
> > 
> > What do you think?  Should I convert the legacy_statfs to behave the same 
> > way?
> > 
> > sage
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx


