On Thu, Oct 11, 2018 at 2:48 PM Sage Weil <sweil@xxxxxxxxxx> wrote:
>
> On Thu, 11 Oct 2018, Joao Eduardo Luis wrote:
> > On 10/11/2018 07:00 PM, Gregory Farnum wrote:
> > > On Thu, Oct 11, 2018 at 7:11 AM Joao Eduardo Luis <joao@xxxxxxx> wrote:
> > >>
> > >> On 10/11/2018 02:58 PM, Sage Weil wrote:
> > >>>
> > >>> I'm worried the above is a lot of complexity and opportunity for bugs
> > >>> (and work to implement) for not a lot of gain. What if we instead make
> > >>> ceph-monstore-tool have a 'convert' function that will do a conversion
> > >>> offline? The admin can take each mon down in turn, convert it, and bring
> > >>> it back up. Provisioning tools could automate this process.
> > >>>
> > >>> This will require ~2x the disk space for the conversion. OTOH, if space
> > >>> is tight, the user can also just blow away the mon entirely and create
> > >>> it, and let the normal sync bring it back into quorum...
> > >>
> > >> The problem with both approaches is that, during this period, the quorum
> > >> is degraded.
> > >>
> > >> We can argue that the way to prevent that is to add a new monitor, let
> > >> it sync, and then remove an old mon, but we may not have spare hardware
> > >> to make this work.
> > >>
> > >> I do agree that this would be a complex solution for something that
> > >> would be used 3, maybe 5 times in the lifespan of a cluster, but this is
> > >> also the sort of thing that shouldn't make the user jump through hoops
> > >> to accomplish.
> > >
> > > Seems to me that if your cluster is in that much danger from a
> > > degraded mon cluster, you've designed your mon cluster failure
> > > tolerances badly?
> >
> > Well, I don't think it's that uncommon for clusters to be running with 3
> > monitors. Drop one for offline conversion, and we can't tolerate a
> > single failure without loss of quorum. It only takes chance for
> > something this trivial to become a bad day for someone.
>
> FWIW the offline conversion should only take a minute or two, even for
> large clusters and fat mons... much less in most cases. Even if they do
> have a second mon failure I don't think it would lead to a significant
> outage.
>
> My vote is for simple!

I'm certainly in favor of simple as well, assuming it's fast enough!
Trying to test the possible issues of an online conversion for a
one-time operation just seems like a no-go when offline is so fast and
reasonable a thing.
-Greg
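
For reference, the quorum arithmetic behind the 3-monitor concern quoted
above can be sketched roughly as follows. This assumes quorum means a
strict majority of the configured monitors; the function names are
illustrative only and are not part of ceph-monstore-tool or any other
Ceph tool.

```python
def quorum_size(total_mons: int) -> int:
    """Smallest strict majority of the configured monitors (assumed quorum rule)."""
    return total_mons // 2 + 1


def extra_failures_tolerated(total_mons: int, mons_down: int) -> int:
    """Further monitor failures the cluster can absorb while mons_down
    monitors are already offline (e.g. one being converted).
    A negative result means quorum is already lost."""
    mons_up = total_mons - mons_down
    return mons_up - quorum_size(total_mons)


if __name__ == "__main__":
    # One monitor taken down for offline conversion in each case.
    for total in (3, 5):
        spare = extra_failures_tolerated(total, mons_down=1)
        print(f"{total} mons, 1 down for conversion: "
              f"quorum needs {quorum_size(total)}, "
              f"can absorb {spare} more failure(s)")
```

With 3 monitors and one down for conversion, the remaining two are exactly
a majority, so any further failure loses quorum until a monitor comes
back. With 5 monitors there is still room for one more failure during the
conversion window.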