On 4/4/23 07:59, David Cunningham wrote:
Hello,

We are considering CephFS as an alternative to GlusterFS, and have some questions about performance. Is anyone able to advise us please?

This would be for file systems between 100GB and 2TB in size, average file size around 5MB, and a mixture of reads and writes.

I may not be using the correct terminology in the Ceph world, but in my parlance a node is a Linux server running the Ceph storage software. Multiple nodes make up the whole Ceph storage solution. Someone correct me if I should be using different terms!

In our normal scenario the nodes in the replicated filesystem would be around 0.3ms apart, but we're also interested in geographically remote nodes which would be say 20ms away. We are using third party software which relies on a traditional Linux filesystem, so we can't use an object storage solution directly.

So my specific questions are:

1. When reading a file from CephFS, does it read from just one node, or from all nodes?
Different objects can live on different nodes. A CephFS file is striped across multiple RADOS objects, so if your read covers enough of the file, the CRUSH placement will spread those object reads across different nodes.
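To make that more concrete, here is a rough sketch (plain Python, assuming the stock layout with a 4 MiB object size and no custom striping) of how one file maps onto RADOS objects. Each object is placed independently by CRUSH, so a read spanning several objects can already touch several primary OSDs:

```python
# Sketch: how a CephFS file maps onto RADOS objects under the default
# 4 MiB layout (assumption: stock file layout, no custom striping).
OBJECT_SIZE = 4 * 1024 * 1024  # default CephFS object size, in bytes

def objects_for_file(inode_hex: str, file_size: int, object_size: int = OBJECT_SIZE):
    """Return the RADOS object names a file of `file_size` bytes occupies.

    CephFS data objects are named "<inode-in-hex>.<object-index>", e.g.
    "10000000000.00000000" -- each one is placed independently by CRUSH,
    so consecutive objects usually land on different primary OSDs.
    """
    count = max(1, -(-file_size // object_size))  # ceiling division
    return ["%s.%08x" % (inode_hex, i) for i in range(count)]

# A 5 MB file (the average size mentioned above) spans two objects,
# so even a full-file read can be served by two different nodes.
print(objects_for_file("10000000000", 5 * 1024 * 1024))
# ['10000000000.00000000', '10000000000.00000001']
```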
2. If reads are from one node then does it choose the node with the fastest response to optimise performance, or if from all nodes then will reads be no faster than latency to the furthest node?
The client normally issues reads to the primary OSD for each object, but in some cases it will read from a replica OSD instead, for example to balance load or when the primary OSD is not up.
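If you want to see which OSDs hold a given object and which one is the primary, the `ceph osd map` command shows the placement. Below is a rough sketch using the Python librados binding; the pool name `cephfs_data` and the object name are placeholders for your own setup, and you will need a working ceph.conf and client keyring:

```python
# Sketch: ask the monitors which PG and which OSDs a given object maps to.
# Assumptions: the Python "rados" binding is installed, /etc/ceph/ceph.conf
# and a client keyring are available, and the data pool is "cephfs_data".
import json
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    cmd = json.dumps({
        "prefix": "osd map",               # same as: ceph osd map <pool> <object>
        "pool": "cephfs_data",             # placeholder pool name
        "object": "10000000000.00000000",  # placeholder object name
        "format": "json",
    })
    ret, outbuf, errs = cluster.mon_command(cmd, b'')
    info = json.loads(outbuf)
    # "acting_primary" is the OSD that normally serves reads for this object;
    # "acting" is the full replica set.
    print("primary OSD:", info.get("acting_primary"))
    print("acting set :", info.get("acting"))
finally:
    cluster.shutdown()
```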
3. When writing to CephFS, are all nodes written to synchronously, or are writes to one node which then replicates that to other nodes asynchronously?
My understanding is that RADOS replies to the client only after all the replica OSDs have successfully committed the write to cache or disk.
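Because the acknowledgement waits for every replica, the slowest replica sets a floor on write latency. A back-of-the-envelope sketch using the round-trip times from your scenario (and ignoring disk commit time, which adds on top):

```python
# Back-of-the-envelope sketch: client-visible write latency with synchronous
# replication. The client sends the write to the primary OSD, the primary
# fans it out to the replicas, and the ack only comes back once every replica
# has committed -- so the slowest replica sets the floor.

def write_latency_floor(client_to_primary_rtt, primary_to_replica_rtts):
    """Lower bound on one synchronous replicated write, in milliseconds."""
    return client_to_primary_rtt + max(primary_to_replica_rtts, default=0)

# Local 3x pool: everything ~0.3 ms apart.
print(write_latency_floor(0.3, [0.3, 0.3]))   # ~0.6 ms floor

# Same pool with one geographically remote replica ~20 ms away:
print(write_latency_floor(0.3, [0.3, 20.0]))  # ~20.3 ms floor per write
```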
4. Can anyone give a recommendation on maximum latency between nodes to have decent performance?

5. How does CephFS handle a node which suddenly becomes unavailable on the network? Is the block time configurable, and how good is the healing process after the lost node rejoins the network?
When a node goes down or comes back up, the MONs publish an updated osdmap to clients, and clients rely on that map to decide where to issue requests. As I recall there are options controlling this behaviour, but I would need to confirm the details.
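The options I believe are involved are listed in the sketch below, with the defaults I remember from recent releases; please verify them against the documentation for your version. `conf_get()` only shows the value as seen by that client's configuration, while `ceph config get osd <option>` shows the cluster-wide setting:

```python
# Sketch: options that (to my recollection) control how quickly a dead node
# is noticed and when its data is re-replicated. Defaults are approximate --
# please verify against the documentation for your release.
import rados

OPTIONS = {
    "osd_heartbeat_interval":    "how often OSDs ping their peers (default ~6 s)",
    "osd_heartbeat_grace":       "missed-heartbeat window before an OSD is reported down (default ~20 s)",
    "mon_osd_down_out_interval": "how long a down OSD may stay 'down' before it is marked 'out' and recovery starts (default ~600 s)",
}

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    for name, meaning in OPTIONS.items():
        # conf_get() returns the value from this client's configuration.
        print(f"{name} = {cluster.conf_get(name)}  # {meaning}")
finally:
    cluster.shutdown()
```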
Thanks, - Xiubo
6. I have read that CephFS is more complicated to administer than GlusterFS. What does everyone think? Are things like healing after a net split difficult for administrators new to Ceph to handle? Thanks very much in advance.