Re: Bcache, partitions and BlueStore

On Mon, Sep 26, 2016 at 5:44 PM, Wido den Hollander <wido@xxxxxxxx> wrote:

> On 26 September 2016 at 17:48, Sam Yaple <samuel@xxxxxxxxx> wrote:
>
>
> On Mon, Sep 26, 2016 at 9:31 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
>
> > Hi,
> >
> > This has been discussed on the ML before [0], but I would like to bring
> > this up again with the outlook towards BlueStore.
> >
> > Bcache [1] allows for block device level caching in Linux. This can be
> > read/write(back) and vastly improves read and write performance to a block
> > device.
> >
> > With the current layout of Ceph with FileStore you can already use bcache,
> > but not with ceph-disk.
> >
> > The reason is that bcache currently does not support creating partitions
> > on those devices. There are patches [2] out there, but they are not
> > upstream.
> >
> > I haven't tested it yet, but it looks like BlueStore could still benefit
> > quite a bit from bcache, and it would be a lot easier if the patches [2] were
> > merged upstream.
> >
> > This way you would have:
> >
> > - bcache0p1: XFS/EXT4 OSD metadata
> > - bcache0p2: RocksDB
> > - bcache0p3: RocksDB WAL
> > - bcache0p4: BlueStore DATA
> >
> > With bcache you could create multiple bcache devices by creating
> > partitions on the backing disk and creating bcache devices for all of them,
> > but that's a lot of work and not easy to automate with ceph-disk.
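
The per-partition workaround described above can be sketched as a dry-run command planner. This is only an illustration under stated assumptions: the partition sizes, the `Role` and `plan` names, the `<cset-uuid>` placeholder, sd-style partition naming (`/dev/sdb1`), and the idea that the kernel hands out `bcacheN` numbers in registration order are all hypothetical and not taken from this thread. It only builds command strings and runs nothing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    name: str  # partition label, used only for readability
    size: str  # sgdisk end spec: "+10G" is a relative size, "0" means rest of disk


# Hypothetical sizes; the thread names the four roles but gives no sizes.
LAYOUT = [
    Role("osd-metadata", "+100M"),
    Role("rocksdb", "+10G"),
    Role("rocksdb-wal", "+1G"),
    Role("bluestore-data", "0"),
]


def plan(backing_disk, cset_uuid, layout=LAYOUT, first_bcache=0):
    """Return the shell commands for one bcache device per partition.

    Assumes sd-style partition names (/dev/sdb -> /dev/sdb1) and that
    bcacheN numbers follow registration order, which is not guaranteed.
    """
    cmds = []
    for idx, role in enumerate(layout, start=1):
        part = f"{backing_disk}{idx}"
        bcache = f"bcache{first_bcache + idx - 1}"
        # Create the partition on the backing disk (start 0 = next free sector).
        cmds.append(
            f"sgdisk --new={idx}:0:{role.size} "
            f"--change-name={idx}:{role.name} {backing_disk}"
        )
        # Format the partition as a bcache backing device.
        cmds.append(f"make-bcache -B {part}")
        # Attach the new bcache device to an existing cache set.
        cmds.append(f"echo {cset_uuid} > /sys/block/{bcache}/bcache/attach")
    return cmds


if __name__ == "__main__":
    for cmd in plan("/dev/sdb", "<cset-uuid>"):
        print(cmd)
```

Four roles turn into twelve commands per OSD disk, which gives a feel for why this is tedious to automate compared with partitioning a single bcache device.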
> >
> > So what I'm trying to find is the best route to get this upstream in the
> > Linux kernel. That way, next year when BlueStore becomes the default in L
> > (Luminous), users can easily use bcache underneath BlueStore.
> >
> > Does anybody know the proper route we need to take to get this fixed
> > upstream? Does anyone have contacts with the bcache developers?
> >
>
> Kent is pretty heavily into developing bcachefs at the moment, but you can
> hit him up on IRC at OFTC #bcache. I've talked to him about this before
> and he is 100% willing to accept any patch that solves this issue in the
> standard way the kernel typically allocates major/minors for disks. The blog
> post you listed from me does _not_ solve this in an upstream way, though
> the final result is pretty accurate from my understanding.
>

No, I understood that the blog indeed doesn't solve that.

> I will look into a better way to patch this upstream since there is
> renewed interest in this.
>

That would be great! My kernel knowledge is too limited to look into this, but if you could help with this it would be nice.

If this lands in the kernel around Nov/Dec, it should ship in a kernel release at about the same time as L for Ceph.

> Also, check out bcachefs if you like bcache. It's up and coming, but it is
> pretty sweet. My goal is to use bcachefs with BlueStore in the future.
>

bcachefs with bluestore? The OSD doesn't require a filesystem with BlueStore, just a raw block device :)

Well, there are parts of the OSD that still use a file system and can benefit from the caching (RocksDB and the WAL). This is what I meant. bcachefs has a tiering system which currently only supports 2 tiers, but will eventually allow for 15 tiers, so you could have a small and fast PCI caching tier, followed by SSD, followed by spinning disk. You could control what data can exist on what tier (and potentially with writeback/writethrough). There is lots of room for configurations that improve performance.

SamYaple
 
Wido

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
