On 02/02/2024 16:41, Ruben Vestergaard wrote:
Hi group,
Today I conducted a small experiment to test an assumption of mine,
namely that Ceph incurs a substantial network overhead when doing many
small files.
One RBD was created, and on top of that an XFS containing 1.6 M files,
each with size 10 kiB:
# rbd info libvirt/bobtest
rbd image 'bobtest':
size 20 GiB in 5120 objects
order 22 (4 MiB objects)
[...]
# df -h /space
Filesystem      Size  Used Avail Use% Mounted on
/dev/rbd0        20G   20G  181M 100% /space
# ls -lh /space |head
total 19G
-rw-r--r--. 1 root root 10K Feb 2 14:13 xaa
-rw-r--r--. 1 root root 10K Feb 2 14:13 xab
-rw-r--r--. 1 root root 10K Feb 2 14:13 xac
-rw-r--r--. 1 root root 10K Feb 2 14:13 xad
-rw-r--r--. 1 root root 10K Feb 2 14:13 xae
-rw-r--r--. 1 root root 10K Feb 2 14:13 xaf
-rw-r--r--. 1 root root 10K Feb 2 14:13 xag
-rw-r--r--. 1 root root 10K Feb 2 14:13 xah
-rw-r--r--. 1 root root 10K Feb 2 14:13 xai
# ls /space |wc -l
1638400
All files contain pseudorandom (i.e. incompressible) junk.
My assumption was that, as the backend RBD object size is 4 MiB, it
would be necessary for the client machine to download at least that 4
MiB worth of data on any given request, even if the file in the XFS is
only 10 kiB.
I.e. I cat(1) a small file, the RBD client grabs the relevant 4 MiB
block from Ceph, from this the small amount of requested data is
extracted and presented to userspace.
That's not what I see, however. My testing procedure is as follows:
I have a list of all the files on the RBD, order randomized, stored in
root's home folder -- this to make sure that I can pick file names at
random by going through the list from the top, and not causing network
traffic by listing files directly in the target FS. I then reboot the
node to ensure that all caches are empty and start an iftop(1) to
monitor network usage.
Mapping the RBD and mounting the XFS results in 5.29 MB worth of data
read from the network.
Reading one file at random from the XFS results in approx. 200 kB of
network read.
Reading 100 files at random results in approx. 3.83 MB of network read.
Reading 1000 files at random results in approx. 36.2 MB of network read.
Bottom line is that reading any 10 kiB of actual data results in
approximately 37 kiB data being transferred over the network.
Overhead, sure, but nowhere near what I expected, which was 4 MiB per
block of data "hit" in the backend.
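For reference, the arithmetic behind that bottom line can be sketched as
below. This is only a rough check: it assumes iftop reports decimal units
(1 MB = 1000 kB), and the helper names are made up for illustration.

```python
# Rough per-file overhead from the iftop figures quoted above.
# Assumption: iftop's kB/MB are decimal; the real unit base may differ.
KB = 1000
MB = 1000 * KB
KIB = 1024

FILE_SIZE = 10 * KIB            # payload of one file in the XFS
OBJECT_SIZE = 4 * 1024 * KIB    # 4 MiB RBD object ("order 22")

# number of files read -> approximate bytes read from the network
measurements = {1: 200 * KB, 100: 3.83 * MB, 1000: 36.2 * MB}


def per_file_bytes(total_bytes, n_files):
    """Average network bytes per file read."""
    return total_bytes / n_files


def amplification(total_bytes, n_files, payload=FILE_SIZE):
    """Network bytes transferred per byte of file payload."""
    return per_file_bytes(total_bytes, n_files) / payload


if __name__ == "__main__":
    for n, total in measurements.items():
        print(f"{n:5d} files: {per_file_bytes(total, n) / KIB:7.1f} KiB/file, "
              f"amplification x{amplification(total, n):.1f}")
    # What a full 4 MiB object fetch per file would have cost instead.
    print(f"full-object fetch would be x{OBJECT_SIZE / FILE_SIZE:.1f}")
```

The single-file read is dominated by one-off costs (presumably filesystem
metadata), so the 100 and 1000 file runs are the more representative ones.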
Is the RBD client performing partial object reads? Is that even a thing?
Cheers,
Ruben Vestergaard
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
The OSD/RADOS API allows you to read partial data within an object: you
specify the length and the logical offset within the object, so there is
no need to read the entire object if you do not need it. This is not
specific to RBD. The small network overhead is, I guess, overhead in the
network protocol layers, including Ceph messenger overhead.
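
As a rough illustration of the offset/length idea, the sketch below maps
a byte range in the image to per-object extents. It assumes the default
layout (no custom striping, 4 MiB objects as in the rbd info output
above); the function name is hypothetical and this is not the actual
client code.

```python
# Map an image byte range to (object_no, offset_in_object, length) extents.
# Assumption: default RBD layout, i.e. the image is simply cut into
# consecutive objects of 2**order bytes with no custom striping.
ORDER = 22
OBJECT_SIZE = 1 << ORDER  # 4 MiB


def extents(image_offset, length, object_size=OBJECT_SIZE):
    """Split [image_offset, image_offset + length) at object boundaries."""
    result = []
    pos = image_offset
    end = image_offset + length
    while pos < end:
        object_no = pos // object_size
        offset_in_object = pos % object_size
        # Read up to the end of this object or the end of the request.
        chunk = min(object_size - offset_in_object, end - pos)
        result.append((object_no, offset_in_object, chunk))
        pos += chunk
    return result


if __name__ == "__main__":
    # A 10 KiB read well inside one object touches only 10 KiB of it.
    print(extents(123 * OBJECT_SIZE + 8192, 10 * 1024))
    # A read that straddles an object boundary becomes two partial reads.
    print(extents(OBJECT_SIZE - 4096, 10 * 1024))
```

Each extent corresponds to one partial object read with its own offset
and length, so a 10 kiB file costs roughly 10 kiB of object data plus
whatever metadata and readahead the filesystem and block layer add.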
/maged