Re: mon upgrades and leveldb->rocksdb conversion

On Thu, Oct 11, 2018 at 2:48 PM Sage Weil <sweil@xxxxxxxxxx> wrote:
>
> On Thu, 11 Oct 2018, Joao Eduardo Luis wrote:
> > On 10/11/2018 07:00 PM, Gregory Farnum wrote:
> > > On Thu, Oct 11, 2018 at 7:11 AM Joao Eduardo Luis <joao@xxxxxxx> wrote:
> > >>
> > >> On 10/11/2018 02:58 PM, Sage Weil wrote:
> > >>>
> > >>> I'm worried the above is a lot of complexity and opportunity for bugs
> > >>> (and work to implement) for not a lot of gain.  What if we instead make
> > >>> ceph-monstore-tool have a 'convert' function that will do a conversion
> > >>> offline?  The admin can take each mon down in turn, convert it, and bring
> > >>> it back up.  Provisioning tools could automate this process.
> > >>>
> > >>> This will require ~2x the disk space for the conversion.  OTOH, if space
> > >>> is tight, the user can also just blow away the mon entirely and recreate
> > >>> it, and let the normal sync bring it back into quorum...
> > >>
> > >> The problem with both approaches is that, during this period, the quorum
> > >> is degraded.
> > >>
> > >> We can argue that the way to prevent that is to add a new monitor, let
> > >> it sync, and then remove an old mon, but we may not have spare hardware
> > >> to make this work.
> > >>
> > >> I do agree that this would be a complex solution for something that
> > >> would be used 3, maybe 5 times in the lifespan of a cluster, but this is
> > >> also the sort of thing that shouldn't make the user jump through hoops
> > >> to accomplish.
> > >
> > > Seems to me that if your cluster is in that much danger from a
> > > degraded mon cluster, you've designed your mon cluster failure
> > > tolerances badly?
> >
> > Well, I don't think it's that uncommon for clusters to be running with 3
> > monitors. Drop one for offline conversion, and we can't tolerate a
> > single failure without loss of quorum. It only takes chance for
> > something this trivial to become a bad day for someone.
>
> FWIW the offline conversion should only take a minute or two, even for
> large clusters and fat mons... much less in most cases.  Even if they do
> have a second mon failure I don't think it would lead to a significant
> outage.
>
> My vote is for simple!

I'm certainly in favor of simple as well, assuming it's fast enough!
Trying to test the possible issues of an online conversion for a
one-time operation just seems like a no-go when offline is so fast and
reasonable a thing.
-Greg
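
For reference, the quorum arithmetic behind the 3-monitor concern raised in the thread can be sketched as below. This is only an illustration, assuming the usual strict-majority rule for monitor quorum; the function names are hypothetical and are not part of any Ceph tool.

```python
def majority(n_mons: int) -> int:
    """Smallest number of monitors that forms a strict majority."""
    return n_mons // 2 + 1


def failures_tolerated(n_mons: int, n_down: int = 0) -> int:
    """Further monitor failures survivable while n_down mons are already offline.

    Returns 0 when any further failure (or the current state) means no quorum.
    """
    return max(n_mons - n_down - majority(n_mons), 0)


if __name__ == "__main__":
    # Compare a healthy cluster with one where a single mon is down for conversion.
    for n in (3, 5):
        print(
            f"{n} mons: majority={majority(n)}, "
            f"tolerated (all up)={failures_tolerated(n)}, "
            f"tolerated (one converting)={failures_tolerated(n, n_down=1)}"
        )
```

With 3 monitors and one taken down for conversion, no further failure can be absorbed until it rejoins; with 5 monitors, one further failure still can.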


