Re: mon switch from leveldb to rocksdb

On Tue, 3 May 2016, Shinobu Kinjo wrote:
> If possible, it would be much better to make it pluggable so that we
> select what we want.

Yeah, that is the plan.  The mon_keyvaluedb option will select leveldb or
rocksdb.  We'd just switch the default over at some point, once we're
satisfied with stability.
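
As a rough illustration of that kind of selection (a sketch only, not the
actual Ceph code: the class names, the BACKENDS registry, and create_store()
are hypothetical, and the in-memory dicts merely stand in for real stores;
only the mon_keyvaluedb name comes from this thread):

```python
from typing import Dict, Mapping, Optional, Type


class KeyValueDB:
    """Hypothetical minimal interface for a monitor store backend."""

    name = "abstract"

    def __init__(self) -> None:
        # In-memory stand-in for the on-disk store.
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)


class LevelDBStore(KeyValueDB):
    name = "leveldb"


class RocksDBStore(KeyValueDB):
    name = "rocksdb"


# Registry of available backends, keyed by the configured name.
BACKENDS: Dict[str, Type[KeyValueDB]] = {
    "leveldb": LevelDBStore,
    "rocksdb": RocksDBStore,
}

# Switching the default later is a one-line change here.
DEFAULT_BACKEND = "leveldb"


def create_store(config: Mapping[str, str]) -> KeyValueDB:
    """Instantiate the backend named by mon_keyvaluedb, or the default."""
    backend = config.get("mon_keyvaluedb", DEFAULT_BACKEND)
    if backend not in BACKENDS:
        raise ValueError(f"unknown mon_keyvaluedb backend: {backend!r}")
    return BACKENDS[backend]()
```

Existing mons that never set the option keep getting the default, so changing
DEFAULT_BACKEND would only affect stores created afterwards.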

After thinking about this some more I agree with Wido that the conversion 
isn't useful enough to bother with.  We can just make new mons use 
rocksdb, and if someone wants to convert, they can add/remove/replace mons 
in their cluster to get there.

sage



> 
> On Tue, May 3, 2016 at 6:25 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
> >
> >> On 2 May 2016 at 20:49, Sage Weil <sweil@xxxxxxxxxx> wrote:
> >>
> >>
> >> We're thinking about switching the default backend on the mon from leveldb
> >> to rocksdb.  Rocksdb is better maintained, has a stronger feature set, is
> >> generally faster, and is linked statically, which means we won't be
> >> vulnerable to buggy distro packages.
> >>
> >> There is one blocker, though.  Some distro leveldbs name the sst files
> >> with the .ldb suffix.  (Some don't; very annoying.)  There is a unit test
> >> in rocksdb that tries to verify that ldb is silently renamed to sst,
> >> and it passes, but the test is incomplete: the test fails to verify
> >> that ldb/sst files can actually be read, and it turns out only the 'check'
> >> path (not the normal open-and-read path) handles ldb properly.
> >>
> >> Anyway, once that works, rocksdb will magically upgrade from leveldb to
> >> rocksdb.  Note that once that happens you can't switch from rocksdb back
> >> to leveldb without recreating the mon.
> >>
> >> Alternatively, we could not worry about upgrading existing leveldb
> >> instances and just make newly created mons default to rocksdb.
> >>
> >> 1) Thoughts on moving to rocksdb in general?
> >>
> >> 2) Importance of leveldb->rocksdb conversion?
> >>
> >
> > I would not touch this auto conversion at first. I know there are things to gain, but is the gain large enough to be worth the risk of corrupting monitors?
> >
> > Is it that LevelDB doesn't handle the load of large clusters, for example? Imho the majority of Ceph clusters are still far below 500 OSDs.
> >
> > Personally I always try to stay away from touching the MONs datastore. Always feels a bit scary.
> >
> > Wido
> >
> >> 3) Anyone want to fix the ldb handling in rocksdb?
> >>
> >> Thanks!
> >> sage
> >>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >> the body of a message to majordomo@xxxxxxxxxxxxxxx
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 
> -- 
> Email:
> shinobu@xxxxxxxxx
> GitHub:
> shinobu-x
> Blog:
> Life with Distributed Computational System based on OpenSource