Re: some newbie questions...


From: Wolfgang Hennerbichler [mailto:wogri@xxxxxxxxx]
Sent: Tuesday, August 20, 2013 10:51
To: Johannes Klarenbeek
CC: ceph-users@xxxxxxxxxxxxxx
Subject: Re: some newbie questions...

 

On Aug 20, 2013, at 09:54, Johannes Klarenbeek <Johannes.Klarenbeek@xxxxxxx> wrote:

> Dear ceph-users,

> Although heavily active in the past, I didn't touch Linux for years, so I'm pretty new to Ceph and I have a few questions which I hope someone can answer for me.

> 1) I read somewhere that it is recommended to have one OSD per disk in a production environment.
>    Is this also the maximum number of disks per OSD, or could I use multiple disks per OSD? And why?

You could use multiple disks for one OSD if you used some striping layer and abstracted the disks away (LVM, MD RAID, etc.), but it wouldn't make sense. One OSD writes into one filesystem, which in a production environment is usually one disk. Putting RAID underneath wouldn't drastically increase either reliability or performance.
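
To make the one-OSD-per-disk layout concrete, here is a minimal sketch using the ceph-disk helper that ships with Cuttlefish/Dumpling-era releases; the device names are just examples:

  # One OSD per raw disk: each disk gets its own filesystem and its own
  # ceph-osd daemon, with no LVM or MD RAID layer underneath.
  ceph-disk prepare /dev/sdb       # partitions the disk and creates an XFS filesystem
  ceph-disk activate /dev/sdb1     # registers the OSD with the cluster and starts the daemon
  # Repeat for /dev/sdc, /dev/sdd, ... one OSD per disk.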

OK, that's cleared up! Are you also saying that on a pure Ceph machine I should not install LVM either, since that is unnecessary overhead?


> 2) I've read about some use cases where the cluster consisted of a monitor and some OSDs but no MDS.
>    Is that possible? I believe the MDS is used as a floating file system for inode storage on the OSD cluster nodes.
>    So without it, could it be used for object storage of some kind?

RADOS is an object store, which only needs MONs and OSDs to work. If you want to 'connect' those objects so they behave like a disk drive, you use RBD; that still only needs MONs and OSDs.
If you want to expose the object store through HTTP, you additionally need radosgw.
If you want a distributed filesystem on top of the object store, you configure and run the MDS, too.
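
As a quick illustration of the bare object-store case (MONs and OSDs only), the rados command line tool can store and fetch objects directly; the pool and object names below are made up:

  # Write, list and read raw objects without radosgw, MDS or CephFS.
  rados -p data put myobject ./somefile        # "data" is an example pool name
  rados -p data ls                             # list objects in the pool
  rados -p data get myobject ./somefile.copy   # fetch the object back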

Yes, I understand, but if I only want to run CephFS, do I need radosgw? My guess is no.


> 3) In a SAN configuration where I only expose iSCSI targets to the "outside" world, do I need radosgw, MDS or CephFS?

No.

Great!

>    I prefer some sort of plug-in that exposes iSCSI targets directly on the RBD interface. But then again, how would you
>    manage these virtual disks without CephFS…

FS = filesystem.
RBD = block device. You manage these virtual disks either with iscsitgtd (I hope the name is correct), which has RBD support built in, or you map the RBD so it is exposed to the system like a disk drive (e.g. /dev/rbd/mypool/mydisk).
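
For the second option (mapping the RBD through the kernel driver so it shows up as an ordinary block device), a minimal sketch with made-up pool and image names would be:

  # Create an image and map it via the kernel RBD driver.
  rbd create mypool/mydisk --size 10240   # size is in MB
  rbd map mypool/mydisk                   # appears as /dev/rbd/mypool/mydisk
  mkfs.xfs /dev/rbd/mypool/mydisk         # or hand the device to an iSCSI target instead
  rbd unmap /dev/rbd/mypool/mydisk        # when finished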

Will look it up, thank you :) Is that load balanced by any chance? I mean, if I set up two client machines, both with iscsitgtd and nothing else, can I configure them both to expose the same target (for shared storage purposes)?


> 4) Since we like to think green too, is it possible to shut down nodes?
>    Or at least put them in some sort of standby mode after office hours?

You could turn off the whole cluster. Turning off single nodes would result in Ceph rebalancing the data, and this would not be wise.

Hmmm, so RBD doesn't have a sleep command or something to let the network know it is bedtime. Thinking about that, I'm not so sure it's possible to turn off a whole cluster. In the fraction of a second that the other nodes are still up, the cluster could already trigger rebalancing. I guess there is no plausible way (yet) for a green mode then.
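
One knob that does exist for planned downtime (it isn't mentioned in this thread) is the 'noout' flag, which keeps the cluster from rebalancing while OSDs are down; a minimal sketch:

  # Suppress rebalancing during planned maintenance, then power nodes off.
  ceph osd set noout     # down OSDs are not marked "out", so no data movement starts
  # ... shut down / power up the nodes ...
  ceph osd unset noout   # restore normal behaviour afterwards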


> 5) Many of the production deployments out there use XFS as their base file system; however, without journaling most of these
>    systems use extra SSDs to emulate copy-on-write journaling. So from this I gather that it's somehow possible to assign this
>    journaling role to some dedicated machines. How?

No. You can assign this journaling to dedicated disks, not dedicated machines. So if you have 12 OSDs in a machine, you could use 3 additional SSDs to hold the journals, 4 OSD journals per SSD.
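
A minimal sketch of how that looks in practice, with made-up device names: each OSD's journal is placed on its own partition of one of the SSDs, either via ceph-disk or through ceph.conf:

  # /dev/sdm is an SSD holding journals for several OSDs; ceph-disk carves out
  # a new journal partition on it for every OSD it prepares.
  ceph-disk prepare /dev/sdb /dev/sdm   # data disk first, journal device second
  ceph-disk prepare /dev/sdc /dev/sdm
  # The journal size is controlled in ceph.conf, e.g.:
  #   [osd]
  #   osd journal size = 10240          # in MB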

What is the journal/storage ratio for this?


> 6) In a picture-perfect world I would use btrfs, but I hear some complaints about it not being stable enough. How is that in the current version
>    of Ubuntu 13.04, for example? I'm on a very short time schedule and have to come up with a solution. For example, I could as well
>    install the first few nodes with XFS and then later on add some other machines with btrfs, but that is not my preferred scenario.
>    If a node with btrfs corrupts, does that mean all the other nodes are likely to corrupt as well?

I would not recommend using btrfs for now. You can migrate at any time later (bring an OSD down, reformat it with btrfs, bring it up again; the data will be moved 'back' automatically). I use XFS in a production cluster and I am happy with it.
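
A rough sketch of that migration path for a single OSD; the ID, device and service commands below are placeholders, and the exact steps (e.g. re-registering the cephx key) depend on the distribution and release:

  # Drain one OSD, reformat it with btrfs, and let the data flow back.
  ceph osd out 12                    # data migrates off osd.12; watch progress with "ceph -w"
  service ceph stop osd.12           # stop the daemon once the cluster is healthy again
  mkfs.btrfs /dev/sdd                # reformat the underlying disk
  mount /dev/sdd /var/lib/ceph/osd/ceph-12
  ceph-osd -i 12 --mkfs --mkkey      # recreate the OSD's on-disk structures
  service ceph start osd.12
  ceph osd in 12                     # data rebalances back onto the OSD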

However, you then need additional SSDs for journaling, as you mentioned before, in order to use all the other disks for storage.


> 7) I followed some example setup from your website, and when installing Ceph I see a lot of packages being installed ending in -dev.
>    This is probably only needed when you want to build from git yourself. However, is that really necessary, or can I just grab the latest binaries somewhere?

I don't know how you installed Ceph or on which distribution. The -dev packages usually contain the header files, so that you can compile, for example, qemu, which depends on rbd.
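
To make the distinction concrete on Ubuntu (the package names below are the usual Debian/Ubuntu ones and should be treated as an assumption, not something stated in this thread): the runtime libraries and the header packages are separate, and only the latter matter when building software against librados/librbd:

  # Runtime only: what a node or client actually needs to run Ceph.
  apt-get install ceph librados2 librbd1

  # Header files: only needed to build software (e.g. qemu) against librbd.
  apt-get install librados-dev librbd-dev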

Ubuntu 13.04. I picked the newest one in the hope that btrfs was working. I read something on the ceph.com website claiming many bugs have been fixed in the last release. However, since I subscribed to the ceph-users mailing list I understand that there are still a lot of problems with Dumpling; should I focus on Cuttlefish first? But then again, if I want to upgrade, I probably need the header files as well?!


> many thanks in advance for anyone helping me out with these questions.

hope this helped;

It did actually :)


> regards,
> johannes

wolfgang

