Re: Ceph newbee questions


 



> I have manually configured a ceph cluster with ceph fs on debian bookworm.

Bookworm support is very, very recent I think.

> What is the difference between installing with cephadm and a manual install?
> Are there any benefits that you miss with a manual install?

A manual install is dramatically more work and much easier to get wrong.  There's also Rook if you run k8s.
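
If you want to see what the orchestrated route looks like, a minimal cephadm sketch is roughly the following (the IP and hostnames are placeholders for your environment):

  # on the first node; 10.0.0.1 stands in for that node's IP
  cephadm bootstrap --mon-ip 10.0.0.1
  # distribute /etc/ceph/ceph.pub to root@ on the other hosts, then:
  ceph orch host add node2
  ceph orch host add node3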

> There are also another couple of things that I can not figure out
> reading the documentation.
> 
> Most of our files are small and, from my understanding, replication is then recommended, right?

How small is "small"? https://docs.google.com/spreadsheets/d/1rpGfScgG-GLoIGMJWDixEkqs-On9w8nAUToPQjN8bDI/edit#gid=358760253

If your files are super small, say <256 KB, you may consume measurably more underlying storage space than you expect.

CephFS isn't my strong suit, but my understanding is that it's designed for reasonably large files.  As with RGW, if you store zillions of 1KB files you may not have the ideal experience.
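
As a rough illustration, assuming BlueStore's default 4 KB allocation unit: a 1 KB file still consumes a full 4 KB allocation per copy, so with 3x replication that is about 12 KB of raw capacity for 1 KB of payload, i.e. roughly 12x overhead before you count any metadata.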

> 
> The plan is to set ceph up like this:
> 1 x "admin node"

MDS AIUI is single-threaded and so will benefit from a high-frequency CPU more than a high-core-count CPU.

> 2 x "storage nodes"

You can do that for a PoC, but that's a bad idea for any production workload.  You'd want at least three nodes with OSDs to use the default RF=3 replication.  You can do RF=2, but at the peril of your mortal data.

> This works well to set up, but what I can not get my head around is
> how things are replicated over nodes and disks.
> In ceph.conf I set the following:
> osd pool default size = 2
> osd pool default min size = 1
> So the idea is that we always have 2 copies of the data.

Those only act as defaults for pools you create without specifying values; I suggest always setting the replication parameters explicitly when creating a pool.
min_size = 1 is a trap for any data you care about.
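
Something along these lines, with the pool name and PG count as placeholders:

  ceph osd pool create cephfs_data 64 64 replicated
  ceph osd pool set cephfs_data size 3
  ceph osd pool set cephfs_data min_size 2
  ceph osd pool get cephfs_data size       # verify what actually took effect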


> I do not seem to be able to figure out the replication
> when things start to fail.
> If the admin node goes down, one of the data nodes will
> run the mon, mgr and mds. This will slow things down but
> will be fine until we have a new admin node in place again.
> (or if there is something I am missing here?)

If you have 3 mons, that's mostly true.  The MDS situation is more nuanced.
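
You can see exactly where you stand with something like:

  ceph quorum_status -f json-pretty   # which mons are in quorum
  ceph fs status                      # active vs. standby MDS daemons

With a single MDS and no standby, losing the node that runs it means no CephFS access at all until an MDS comes up somewhere else.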

> If just one data node goes down we will still not lose any
> data and that is fine until we have a new server.

... unless one of the drives in the surviving node fails.  

> But what if one data node goes down and one disk of the other
> data node breaks, will I lose data then?

The data will most likely be at least unavailable until you get the first node back up with all of its OSDs.  This is one reason why RF=2 is okay for a sandbox but a bad idea for any data you care about.  There are legitimate situations where one doesn't care much about losing data, but they are infrequent.

> Or how many disks can I lose before I lose data?
> This is what I can not get my head around: how to think
> when disaster strikes, how much hardware can I lose before
> I lose data?
> Or have I got it all wrong?
> Is it a bad idea with just 2 fileservers, or are more servers required?

Ceph is a scale-out solution, not meant for a very small number of servers.  For replication, you really want at least 3 nodes with OSDs and size=3,min_size=2.  More nodes is better.  If you need a smaller-scale solution, DRBD or ZFS might be better choices.
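
How many disks you can lose also depends on the CRUSH failure domain of the rule the pool uses.  With size=3 and a host-level failure domain, each PG keeps its three copies on three different hosts, so any single host (and every disk in it) can fail without losing data.  You can check the rule with something like this, pool and rule names being placeholders:

  ceph osd pool get cephfs_data crush_rule
  ceph osd crush rule dump replicated_rule   # look for "type": "host" in the chooseleaf step
  ceph osd tree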


> 
> The second thing I have a problem with is snapshots.
> I manage to create snapshot in root with command:
> ceph fs subvolume snapshot create <vol_name> / <snap_name>
> But it fails if I try to create a snapshot in any
> other directory than the root.
> Second of all if I try to create a snapshot from the
> client with:
> mkdir /mnt-ceph/.snap/my_snapshot
> I get the same error in all directories:
> Permission denied.
> I have not found any solution to this,
> am I missing something here as well?
> Any config missing?
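
(On the snapshot "Permission denied": most often the client's MDS caps simply lack the 's' flag that permits snapshot creation.  Something like the following, with the fs and client names as placeholders, grants it; for an already-existing client you may need to adjust its caps with 'ceph auth caps' instead:

  ceph fs authorize cephfs client.cephfs-user / rws

then remount with the updated keyring.)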
> 
> Many thanks for your support!!
> 
> Best regards
> Marcus
> 
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


