Re: PSA: sqlite3 databases now available for ceph-mgr modules

Hi Kefu,

On Thu, Jun 17, 2021 at 9:24 PM kefu chai <tchaikov@xxxxxxxxx> wrote:
>
> On Wed, Jun 16, 2021 at 10:23 PM Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
> >
> > Introduced by [1] for the Quincy release. This builds on work in [2]
> > to add RADOS-backed sqlite3 support to Ceph (available in Pacific).
> >
> > The MgrModule API for accessing your module's database is introduced
> > in [3]. An example of a module ("devicehealth") using the API can be
> > seen in [4].
> >
> > Please let me know if you have any questions or feedback.
>
>
> Hi Patrick,
>
> my concern is that, without careful planning of the segmentation
> between the pool storing the health data and the pools being
> monitored, we could interfere with the system being monitored by
> mutating its state.
>
> for instance, if a cluster is experiencing large-scale slow ops and
> is emitting lots of warning messages and/or structured
> performance-related metrics, some mgr module might want to collect
> this information from the health monitoring subsystem and persist it
> into the sqlite3 database. but that database is in turn backed by the
> same cluster. without careful planning, the objects stored in the
> .mgr pool could be mapped to the same set of OSDs and monitors that
> are suffering from the performance issue. in the worst case, this
> could make the situation even worse. but allocating dedicated OSDs
> and creating a CRUSH rule that picks them just for the .mgr pool
> might be difficult, or overkill from a maintainability point of view.
>
> we actually had the same issue when adding the cluster log back to
> the OSD for recording slow requests. the large amount of clog puts
> more burden on the monitors. if the slow requests are caused by a
> monitor, these clog entries in turn slow down the monitors further.
>
> shall we switch to a (local) backup sqlite backend if we identify a
> performance issue, and restore / backfill the records once the issue
> is resolved?

Thanks for bringing this up. I think it would be reasonable to decide
this depending on what the mgr module is doing. For example, I think
devicehealth and snap_schedule are innocuous enough that we don't need
to give special consideration to the system potentially being under
load. Also, these modules' mutations of the database do not depend on
the cluster state, healthy or degraded. OTOH, a module that is
collecting large streams of data might first ingest that data into a
local in-memory database and only back up [1] that in-memory database
to RADOS when the cluster is healthy. If the dataset is very large, a
full backup would not be desirable because the in-memory database
itself would grow too large. In that case I would suggest streaming
batched updates in large transactions instead.
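
Roughly, and assuming the module gets at its RADOS-backed connection
through the new MgrModule database handle (called db below; the table
name and helper functions are made up for illustration), the staging
approach could look something like this untested sketch:

    import sqlite3

    # In-memory staging database used while the cluster may be under
    # stress; nothing here touches RADOS.
    staging = sqlite3.connect(":memory:")
    staging.execute(
        "CREATE TABLE IF NOT EXISTS metrics (ts INTEGER, source TEXT, value REAL)")

    def ingest(rows):
        # Batch high-volume inserts into a single transaction.
        with staging:
            staging.executemany(
                "INSERT INTO metrics (ts, source, value) VALUES (?, ?, ?)",
                rows)

    def flush_to_rados(db):
        # db is the module's RADOS-backed sqlite3 connection. The
        # sqlite3 backup API copies the whole staging database into it;
        # note that backup replaces the destination's contents, so this
        # assumes the staging schema is the module's entire database.
        # Only call this once the cluster is healthy again.
        staging.backup(db)

For the case where the dataset is too big to stage in memory, the same
executemany-in-one-transaction pattern in ingest() could instead be
pointed directly at the RADOS-backed connection, trading memory for
batched writes against the cluster.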

What do you think?

[1] https://www.sqlite.org/backup.html

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx


