On Thu, Oct 11, 2018 at 7:11 AM Joao Eduardo Luis <joao@xxxxxxx> wrote:
>
> On 10/11/2018 02:58 PM, Sage Weil wrote:
> >
> > I'm worried the above is a lot of complexity and opportunity for bugs
> > (and work to implement) for not a lot of gain. What if we instead make
> > ceph-monstore-tool have a 'convert' function that will do a conversion
> > offline? The admin can take each mon down in turn, convert it, and bring
> > it back up. Provisioning tools could automate this process.
> >
> > This will require ~2x the disk space for the conversion. OTOH, if space
> > is tight, the user can also just blow away the mon entirely, recreate
> > it, and let the normal sync bring it back into quorum...
>
> The problem with both approaches is that, during this period, the quorum
> is degraded.
>
> We can argue that the way to prevent that is to add a new monitor, let
> it sync, and then remove an old mon, but we may not have spare hardware
> to make this work.
>
> I do agree that this would be a complex solution for something that
> would be used 3, maybe 5 times in the lifespan of a cluster, but this is
> also the sort of thing that shouldn't make the user jump through hoops
> to accomplish.

It seems to me that if your cluster is in that much danger from a
degraded mon cluster, you've designed your mon cluster's failure
tolerances badly?

I guess we should be more explicit about goals here, though. I've just
realized that, while rocksdb conversions are not mandatory so far, we may
want to drop support for leveldb in a future release, in which case some
kind of automated conversion is definitely a higher priority. In the
past, the only users of functionality like this have been those with
clusters old enough to be on leveldb and large enough that rocksdb is
necessary to handle the compaction stress.

-Greg