Re: ceph-disk improvements

Wido, I discussed that with Sam just last week: it seems that bcache
allocates only a single minor number when creating the device, which is
why it cannot be partitioned. Sam ended up writing this up:
https://yaple.net/2016/03/31/bcache-partitions-and-dkms/
The fix is not complex; I'm not sure why it is not part of bcache yet...
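
A quick way to see the limitation (just a sketch; exact values can differ per
kernel and device name):

$ cat /sys/block/bcache0/range   # minors reserved for the device; 1 is why partitions fail
$ cat /sys/block/sda/range       # a regular SCSI disk reserves 16, leaving room for sda1..sda15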

I'm not sure it's ceph-disk's job to do all of this with bcache though...
We should check with the bcache developers what their plans are here.
If the fix lands upstream at some point we might just wait; if not, we
could implement the partition trick in ceph-disk.

In addition to that, ceph-disk could have a more general cache option
where we could add "plugins" such as bcache, dm-cache, etc.
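
Purely hypothetical syntax, just to illustrate the idea (nothing like a
--cache-plugin flag exists in ceph-disk today):

$ ceph-disk prepare --bluestore --cache-plugin bcache   --cache-dev /dev/sdb /dev/sdc
$ ceph-disk prepare --filestore --cache-plugin dm-cache --cache-dev /dev/sdb /dev/sdc
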
Thanks!


On Mon, Apr 4, 2016 at 3:04 PM, Wido den Hollander <wido@xxxxxxxx> wrote:
>
>> Op 3 april 2016 om 14:59 schreef Sage Weil <sweil@xxxxxxxxxx>:
>>
>>
>> On Sat, 2 Apr 2016, Wido den Hollander wrote:
>> > > Op 1 april 2016 om 17:36 schreef Sage Weil <sweil@xxxxxxxxxx>:
>> > > Hi all,
>> > >
>> > > There are a couple of looming features for ceph-disk:
>> > >
>> > > 1- Support for additional devices when using BlueStore.  There can be up
>> > > to three: the main device, a WAL/journal device (small, ~128MB, ideally
>> > > NVRAM), and a fast metadata device (as big as you have available; will be
>> > > used for internal metadata).
>> > >
>> > > 2- Support for setting up dm-cache, bcache, and/or FlashCache underneath
>> > > filestore or bluestore.
>> > >
>> >
>> > Keep in mind that you can't create a partition on a bcache device. So when
>> > using bcache, the journal has to be file-based and not a partition.
>>
>> Can you create a bcache device out of a partition, though?
>>
>
> Yes. If you have /dev/sdb, which is an SSD, and /dev/sdc, which is a spinning
> disk, you can do the following.
>
> /dev/sdb (the SSD) can be used as the caching device:
>
> $ make-bcache -C /dev/sdb
>
> Now, you can partition /dev/sdc (the HDD):
>
> $ parted /dev/sdc mklabel gpt
> $ parted /dev/sdc mkpart primary 2048s 10G   # small partition, later holds the XFS filesystem
> $ parted /dev/sdc mkpart primary 10G 100%    # the rest becomes the bcache backing device
> $ make-bcache -B /dev/sdc2
>
> Now you still have to attach /dev/bcache0 (which is /dev/sdc2) to /dev/sdb by
> echoing the cache set's UUID to /sys/block/bcache0/bcache/attach
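>
> Roughly (a sketch, assuming the cache set was created on /dev/sdb as above):
>
> $ bcache-super-show /dev/sdb | grep cset.uuid          # UUID of the cache set on the SSD
> $ echo <cset-uuid> > /sys/block/bcache0/bcache/attach  # attach the backing device to it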
>
> This is explained in a quick howto here:
> https://wiki.archlinux.org/index.php/Bcache
>
> So for BlueStore this would work: a small, non-bcache partition for XFS and
> /dev/bcache0 for BlueStore directly.
>
> The question is whether you want ceph-disk to prepare bcache completely, or to
> ask the user to provide an already configured device.
>
> $ ceph-disk prepare --bluestore --bcache /dev/sdc1:/dev/bcache0
>
> The first device will be the XFS partition holding the metadata and the second
> will be the data device.
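>
> Illustratively, with the layout from the example above:
>
>   /dev/sdc1                   small XFS partition holding the OSD metadata
>   /dev/sdc2 (= /dev/bcache0)  bcache backing device, used by BlueStore as the data device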
>
> Wido
>
>> sage
>>
>> > If we add a flag like --file-based-journal or --no-partitions, we can create
>> > OSDs on both bcache and dm-cache.
>> >
>> > With BlueStore this becomes a problem since it requires the small (XFS)
>> > filesystem for its metadata.
>> >
>> > Wido
>> >
>> > > The current syntax of
>> > >
>> > >  ceph-disk prepare [--dmcrypt] [--bluestore] DATADEV [JOURNALDEV]
>> > >
>> > > isn't terribly expressive.  For example, the journal device size is set
>> > > via a config option, not on the command line.  For bluestore, the metadata
>> > > device will probably want/need explicit user input so they can ensure it's
>> > > 1/Nth of their SSD (if they have N HDDs to each SSD).
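>> > >
>> > > Concretely, today that means something like the following in ceph.conf,
>> > > with the value in MB, instead of an argument to ceph-disk prepare:
>> > >
>> > >   [osd]
>> > >   osd journal size = 10240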
>> > >
>> > > And if we put dmcache in there, that partition will need to be sized too.
>> > >
>> > > Another consideration is that right now we don't play nice with LVM at
>> > > all.  Should we?  dm-cache is usually used in conjunction with LVM
>> > > (although it doesn't have to be).  Does LVM provide value?  Like, the
>> > > ability for users to add a second SSD to a box and migrate cache, wal, or
>> > > journal partitions around?
>> > >
>> > > I'm interested in hearing feedback on requirements, approaches, and
>> > > interfaces before we go too far down the road...
>> > >
>> > > Thanks!
>> > > sage



-- 
Cheers

––––––
Sébastien Han
Senior Cloud Architect

"Always give 100%. Unless you're giving blood."

Mail: seb@xxxxxxxxxx
Address: 11 bis, rue Roquépine - 75008 Paris


