On 4/4/23 07:59, David Cunningham wrote:
Hello,

We are considering CephFS as an alternative to GlusterFS, and have some questions about performance. Is anyone able to advise us please?

This would be for file systems between 100GB and 2TB in size, average file size around 5MB, and a mixture of reads and writes.

I may not be using the correct terminology in the Ceph world, but in my parlance a node is a Linux server running the Ceph storage software. Multiple nodes make up the whole Ceph storage solution. Someone correct me if I should be using different terms!

In our normal scenario the nodes in the replicated filesystem would be around 0.3ms apart, but we're also interested in geographically remote nodes which would be say 20ms away. We are using third party software which relies on a traditional Linux filesystem, so we can't use an object storage solution directly.

So my specific questions are:

1. When reading a file from CephFS, does it read from just one node, or from all nodes?
Different objects can live on different nodes. A CephFS file is striped across multiple RADOS objects, so if your read covers enough of the file, the CRUSH placement will spread those object reads across different nodes.
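To make that more concrete, here is a rough sketch (plain Python, assuming the stock layout with a 4 MiB object size and no custom striping) of how one file maps onto RADOS objects. Each object is placed independently by CRUSH, so a read spanning several objects can already touch several primary OSDs:

```python
# Sketch: how a CephFS file maps onto RADOS objects under the default
# 4 MiB layout (assumption: stock file layout, no custom striping).
OBJECT_SIZE = 4 * 1024 * 1024  # default CephFS object size, in bytes

def objects_for_file(inode_hex: str, file_size: int, object_size: int = OBJECT_SIZE):
    """Return the RADOS object names a file of `file_size` bytes occupies.

    CephFS data objects are named "<inode-in-hex>.<object-index>", e.g.
    "10000000000.00000000" -- each one is placed independently by CRUSH,
    so consecutive objects usually land on different primary OSDs.
    """
    count = max(1, -(-file_size // object_size))  # ceiling division
    return ["%s.%08x" % (inode_hex, i) for i in range(count)]

# A 5 MB file (the average size mentioned above) spans two objects,
# so even a full-file read can be served by two different nodes.
print(objects_for_file("10000000000", 5 * 1024 * 1024))
# ['10000000000.00000000', '10000000000.00000001']
```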
2. If reads are from one node then does it choose the node with the fastest response to optimise performance, or if from all nodes then will reads be no faster than latency to the furthest node?
The client normally issues reads to the primary OSD for each object, but in some cases it will read from a replica OSD instead, for example to balance load or when the primary OSD is not up.
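If you want to see which OSDs hold a given object and which one is the primary, the `ceph osd map` command shows the placement. Below is a rough sketch using the Python librados binding; the pool name `cephfs_data` and the object name are placeholders for your own setup, and you will need a working ceph.conf and client keyring:

```python
# Sketch: ask the monitors which PG and which OSDs a given object maps to.
# Assumptions: the Python "rados" binding is installed, /etc/ceph/ceph.conf
# and a client keyring are available, and the data pool is "cephfs_data".
import json
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    cmd = json.dumps({
        "prefix": "osd map",               # same as: ceph osd map <pool> <object>
        "pool": "cephfs_data",             # placeholder pool name
        "object": "10000000000.00000000",  # placeholder object name
        "format": "json",
    })
    ret, outbuf, errs = cluster.mon_command(cmd, b'')
    info = json.loads(outbuf)
    # "acting_primary" is the OSD that normally serves reads for this object;
    # "acting" is the full replica set.
    print("primary OSD:", info.get("acting_primary"))
    print("acting set :", info.get("acting"))
finally:
    cluster.shutdown()
```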
3. When writing to CephFS, are all nodes written to synchronously, or are writes to one node which then replicates that to other nodes asynchronously?
My understanding is that RADOS replies to the client only after all the replica OSDs have successfully committed the write to cache or disk.
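Because the acknowledgement waits for every replica, the slowest replica sets a floor on write latency. A back-of-the-envelope sketch using the round-trip times from your scenario (and ignoring disk commit time, which adds on top):

```python
# Back-of-the-envelope sketch: client-visible write latency with synchronous
# replication. The client sends the write to the primary OSD, the primary
# fans it out to the replicas, and the ack only comes back once every replica
# has committed -- so the slowest replica sets the floor.

def write_latency_floor(client_to_primary_rtt, primary_to_replica_rtts):
    """Lower bound on one synchronous replicated write, in milliseconds."""
    return client_to_primary_rtt + max(primary_to_replica_rtts, default=0)

# Local 3x pool: everything ~0.3 ms apart.
print(write_latency_floor(0.3, [0.3, 0.3]))   # ~0.6 ms floor

# Same pool with one geographically remote replica ~20 ms away:
print(write_latency_floor(0.3, [0.3, 20.0]))  # ~20.3 ms floor per write
```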
4. Can anyone give a recommendation on maximum latency between nodes to have decent performance?

5. How does CephFS handle a node which suddenly becomes unavailable on the network? Is the block time configurable, and how good is the healing process after the lost node rejoins the network?
When a node goes down or comes back up, the MONs publish an updated osdmap to clients, and clients rely on that map to decide where to issue requests. As I recall there are options controlling this behaviour, but I would need to confirm the details.
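The options I believe are involved are listed in the sketch below, with the defaults I remember from recent releases; please verify them against the documentation for your version. `conf_get()` only shows the value as seen by that client's configuration, while `ceph config get osd <option>` shows the cluster-wide setting:

```python
# Sketch: options that (to my recollection) control how quickly a dead node
# is noticed and when its data is re-replicated. Defaults are approximate --
# please verify against the documentation for your release.
import rados

OPTIONS = {
    "osd_heartbeat_interval":    "how often OSDs ping their peers (default ~6 s)",
    "osd_heartbeat_grace":       "missed-heartbeat window before an OSD is reported down (default ~20 s)",
    "mon_osd_down_out_interval": "how long a down OSD may stay 'down' before it is marked 'out' and recovery starts (default ~600 s)",
}

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    for name, meaning in OPTIONS.items():
        # conf_get() returns the value from this client's configuration.
        print(f"{name} = {cluster.conf_get(name)}  # {meaning}")
finally:
    cluster.shutdown()
```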
Thanks, - Xiubo
6. I have read that CephFS is more complicated to administer than GlusterFS. What does everyone think? Are things like healing after a net split difficult for administrators new to Ceph to handle? Thanks very much in advance.