Re: PSA: sqlite3 databases now available for ceph-mgr modules

On Sun, Jun 20, 2021 at 6:06 PM Anthony D'Atri <anthony.datri@xxxxxxxxx> wrote:
>
>
> Not that anyone asked for my zwei pfennig ...
>
> >> but it is in turn backed by the same cluster. Without careful
> >> planning, the objects stored in the .mgr pool could be mapped to
> >> the same set of OSDs and monitors that are suffering from the
> >> performance issue.
>
> In other words, a circular dependency of sorts?

Negative feedback.

> >> In the worst case, this could in turn even worsen
> >> the situation. But allocating dedicated OSDs and creating a CRUSH map
> >> that picks them just for the .mgr pool might be difficult or overkill
> >> from a maintainability point of view.
>
> Certainly there are wrinkles.
>
> The first time I interacted with a devicehealth pool was because it was causing HEALTH_ERR.  I found a pool with 1 PG, with an empty acting set.  Couldn’t figure out how the heck it got that way, so I just removed the pool.
>
> Dedicated OSDs could be problematic if they decrease cluster capacity and uniformity by co-opting drive bays that would otherwise hold user data.  With LVM, maybe a small slice of each drive?  But then how would we size that slice?  Would that complicate operations for operators for whom one drive == one OSD is deeply ingrained?
>
> Beyond maintainability and zero-sum drive bays, though, is media suitability.  With an HDD cluster, would .mgr pragmatically need to be on faster storage, a la RGW index?  If so that feeds into the above drive bay quandary.  If the only flash available in the systems is something like Optane, is there enough capacity?  I’m not super familiar with sqlite, but I wonder if the access pattern would be problematic from a drive durability standpoint too.

I would not go so far as to suggest that the .mgr pool be given dedicated OSDs.

libcephsqlite does repeatedly overwrite the same objects, but I don't
believe that would cause undue wear on the drives. There is no use of
omap.
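
For illustration, here is a rough Python sketch of how a client could
open a libcephsqlite-backed database. The extension path, pool name,
and namespace below are placeholders rather than any module's actual
configuration, and it assumes a ceph.conf and client keyring are
available to the process; the URI format follows the libcephsqlite
documentation:

    import sqlite3

    # Register the "ceph" VFS by loading the libcephsqlite extension
    # once on a scratch in-memory connection.
    scratch = sqlite3.connect(':memory:')
    scratch.enable_load_extension(True)
    scratch.load_extension('libcephsqlite.so')  # path may differ per install
    scratch.close()

    # Open a database whose pages are stored as RADOS objects in the
    # .mgr pool, under a per-module namespace (placeholder name here).
    db = sqlite3.connect('file:///.mgr:devicehealth/main.db?vfs=ceph',
                         uri=True)
    db.execute('CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)')

The database pages are striped over ordinary RADOS objects, so writes
land as plain object writes rather than omap updates, which is why the
repeated-overwrite pattern is the relevant thing to think about here.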

> > OTOH, a module that is
> > collecting large streams of data into the database might first ingest
> > that data into a local in-memory database and only back up [1] that
> > in-memory database to RADOS when the cluster is healthy. If the
> > database is very large, then a backup would not be desirable as the
> > in-memory database would be too large. In that case I would suggest
> > streaming batch updates in large transactions.
>
> For at least some of these purposes, might it be feasible to just use memstore and memstore alone?  Staging to persistent storage seems fraught with corner cases and atomicity concerns.

What corner cases / atomicity concerns? Transactions are ACID in
sqlite even when backed by RADOS.
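
To make the in-memory-ingest-plus-backup idea concrete, a minimal
sketch (the pool/namespace, table, and helper names are made up, and it
assumes the ceph VFS was already registered as in the earlier snippet):

    import sqlite3

    def open_rados_db():
        # Placeholder pool/namespace; same URI scheme as above.
        return sqlite3.connect('file:///.mgr:mymodule/stats.db?vfs=ceph',
                               uri=True)

    # Ingest large streams into a local in-memory database first ...
    mem = sqlite3.connect(':memory:')
    mem.execute('CREATE TABLE samples (ts INTEGER, metric TEXT, value REAL)')

    def ingest(rows):
        # ... batching many inserts into one large transaction ...
        with mem:
            mem.executemany('INSERT INTO samples VALUES (?, ?, ?)', rows)

    def flush_to_rados():
        # ... and only copy the whole database out to RADOS (sqlite's
        # online backup API) when the cluster is healthy.
        dst = open_rados_db()
        try:
            mem.backup(dst)
        finally:
            dst.close()

Since transactions remain ACID on the ceph VFS, as noted above, the
streaming large-batch alternative needs no extra atomicity handling
either: each batch either commits entirely or not at all.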

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx



