> On 26 September 2016 at 19:51 Sam Yaple <samuel@xxxxxxxxx> wrote:
>
>
> On Mon, Sep 26, 2016 at 5:44 PM, Wido den Hollander <wido@xxxxxxxx> wrote:
>
> >
> > > On 26 September 2016 at 17:48 Sam Yaple <samuel@xxxxxxxxx> wrote:
> > >
> > >
> > > On Mon, Sep 26, 2016 at 9:31 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
> > >
> > > > Hi,
> > > >
> > > > This has been discussed on the ML before [0], but I would like to bring
> > > > this up again with the outlook towards BlueStore.
> > > >
> > > > Bcache [1] allows for block device level caching in Linux. This can be
> > > > read/write(back) and vastly improves read and write performance to a
> > > > block device.
> > > >
> > > > With the current layout of Ceph with FileStore you can already use
> > > > bcache, but not with ceph-disk.
> > > >
> > > > The reason is that bcache currently does not support creating
> > > > partitions on those devices. There are patches [2] out there, but they
> > > > are not upstream.
> > > >
> > > > I haven't tested it yet, but it looks like BlueStore can still benefit
> > > > quite well from bcache, and it would be a lot easier if the patches [2]
> > > > were merged upstream.
> > > >
> > > > This way you would have:
> > > >
> > > > - bcache0p1: XFS/EXT4 OSD metadata
> > > > - bcache0p2: RocksDB
> > > > - bcache0p3: RocksDB WAL
> > > > - bcache0p4: BlueStore DATA
> > > >
> > > > With bcache you could create multiple bcache devices by creating
> > > > partitions on the backing disk and creating bcache devices for all of
> > > > them, but that's a lot of work and not easy to automate with ceph-disk.
> > > >
> > > > So what I'm trying to find is the best route to get this upstream in
> > > > the Linux kernel. That way, next year when BlueStore becomes the
> > > > default in L (Luminous), users can easily use bcache underneath
> > > > BlueStore.
> > > >
> > > > Does anybody know the proper route we need to take to get this fixed
> > > > upstream?
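
The two layouts described above can be compared with a minimal, hypothetical Python sketch. It is not ceph-disk or bcache code: `ROLES`, `partitioned_bcache()` and `bcache_per_partition()` are illustrative names, and it assumes bcache devices are numbered bcache0, bcache1, ... in the order the backing partitions are registered. The one real rule it encodes is the kernel's partition naming: a disk whose name ends in a digit gets a "p" before the partition number, which is why the partitions show up as bcache0p1 and not bcache01.

```python
"""Hypothetical sketch: device names for the two BlueStore-on-bcache layouts.

Assumptions (not taken from ceph-disk or bcache tooling): the single bcache
device is called bcache0, and per-partition bcache devices are numbered in the
order their backing partitions are registered.
"""
from typing import Dict, List, Tuple

# Partition roles, in the order listed in the thread (p1..p4).
ROLES = ["osd-metadata", "rocksdb", "rocksdb-wal", "bluestore-data"]


def partition_name(disk: str, number: int) -> str:
    """Kernel naming rule: a disk name ending in a digit gets a 'p' separator."""
    sep = "p" if disk[-1].isdigit() else ""
    return f"{disk}{sep}{number}"


def partitioned_bcache(bcache_dev: str = "bcache0") -> Dict[str, str]:
    """Layout 1: one bcache device carrying all four partitions (needs patches [2])."""
    return {role: "/dev/" + partition_name(bcache_dev, i)
            for i, role in enumerate(ROLES, start=1)}


def bcache_per_partition(backing_disk: str) -> List[Tuple[str, str, str]]:
    """Layout 2: partition the backing disk, then one bcache device per partition.

    Each tuple is (role, backing partition, hypothetical bcache device). Every
    entry is a separate bcache device that must be created, registered and
    attached to the cache set, which is what makes this hard to automate.
    """
    return [(role,
             "/dev/" + partition_name(backing_disk, i),
             f"/dev/bcache{i - 1}")
            for i, role in enumerate(ROLES, start=1)]


if __name__ == "__main__":
    for role, dev in partitioned_bcache().items():
        print(f"{dev}: {role}")
    for role, part, dev in bcache_per_partition("sdb"):
        print(f"{part} -> {dev}: {role}")
```

With layout 1 the OSD tooling only has to deal with one cached device; with layout 2 it has to manage four.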
> > > > Does anyone have contacts with the bcache developers?
> > > >
> > >
> > > Kent is pretty heavy into developing bcachefs at the moment, but you can
> > > hit him up on IRC at OFTC #bcache. I've talked to him about this before
> > > and he is 100% willing to accept any patch that solves this issue in the
> > > standard way the kernel typically allocates major/minor numbers for
> > > disks. The blog post you listed from me does _not_ solve this in an
> > > upstream way, though the final result is pretty accurate from my
> > > understanding.
> > >
> >
> > No, I understood that the blog indeed doesn't solve that.
> >
> > > I will look into a better way to patch this upstream, since there is
> > > renewed interest in this.
> > >
> >
> > That would be great! My kernel knowledge is too limited to look into
> > this, but if you could help with this it would be nice.
> >
> > If this hits the kernel somewhere in Nov/Dec we should be good for a
> > kernel release somewhere together with L for Ceph.
> >
> > > Also, check out bcachefs if you like bcache. It's up and coming, but it
> > > is pretty sweet. My goal is to use bcachefs with BlueStore in the future.
> > >
> >
> > bcachefs with BlueStore? The OSD doesn't require a filesystem with
> > BlueStore, just a raw block device :)
> >
>
> Well, there are parts of the OSD that still use a file system that can
> benefit from the caching (RocksDB and its WAL). This is what I meant. There
> is a tiering system in bcachefs which currently only supports 2 tiers, but
> will eventually allow for 15 tiers, so you could have a small and fast PCI
> caching tier, followed by SSD, followed by spinning disk, and control what
> data can exist on which tier (potentially with writeback/writethrough too).
> Lots of room for configurations to improve performance.
>

Interesting! Although RocksDB and its WAL can also each be a partition,
which would again sit on bcache.

However, I sent a message to the linux-bcache mailing list [0]; I hope we
can get a proper patch into the kernel soon.
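
The "standard way" of allocating major/minor numbers mentioned above can be illustrated with a minimal sketch. It assumes a driver that reserves a fixed block of consecutive minors per disk, where minor 0 of the block is the whole disk and the rest are partitions; this is how the sd driver numbers /dev/sdX (major 8, 16 minors per disk, so sdb3 is 8:19). `dev_numbers()` and `max_partitions()` are hypothetical helpers, not kernel or bcache code, and how many minors a bcache device should reserve is for the upstream patch to decide.

```python
"""Hypothetical sketch of fixed-block minor-number allocation for a block driver.

Assumption: each disk reserves `minors_per_disk` consecutive minors; minor 0
of the block is the whole disk, the remaining minors are its partitions. This
mirrors the sd driver's numbering, not bcache's actual implementation.
"""
from typing import Tuple


def dev_numbers(major: int, disk_index: int, partition: int,
                minors_per_disk: int) -> Tuple[int, int]:
    """Return (major, minor) for a partition of a disk; partition 0 is the whole disk."""
    if minors_per_disk < 1:
        raise ValueError("a disk needs at least one minor")
    if not 0 <= partition < minors_per_disk:
        raise ValueError(
            f"partition {partition} does not fit in a block of {minors_per_disk} minors")
    return major, disk_index * minors_per_disk + partition


def max_partitions(minors_per_disk: int) -> int:
    """Partitions addressable in one block (one minor is used by the whole disk)."""
    return minors_per_disk - 1


if __name__ == "__main__":
    SD_MAJOR = 8  # major number of SCSI/SATA disks (/dev/sdX)
    print(dev_numbers(SD_MAJOR, 1, 3, 16))  # sdb3
    print(max_partitions(16))
    print(max_partitions(1))  # a one-minor device leaves no room for partitions
```

Under this scheme a device that reserves a single minor cannot expose any partitions, which is the gap the out-of-tree patches [2] work around.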
Any input, help or suggestions there would be nice!

Wido

[0]: https://marc.info/?l=linux-bcache&m=147507062812270

> SamYaple
>
> >
> > Wido
> >
> > > >
> > > > Thanks!
> > > >
> > > > Wido
> > > >
> > > > [0]: http://www.spinics.net/lists/ceph-devel/msg29550.html
> > > > [1]: https://bcache.evilpiepirate.org/
> > > > [2]: https://yaple.net/2016/03/31/bcache-partitions-and-dkms/
> > > >
> > >
> > > SamYaple
> >
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com