Hi group,
Today I conducted a small experiment to test an assumption of mine,
namely that Ceph incurs substantial network overhead when dealing with
many small files.
One RBD was created, and on top of that an XFS containing 1.6 M files,
each with size 10 kiB:
# rbd info libvirt/bobtest
rbd image 'bobtest':
        size 20 GiB in 5120 objects
        order 22 (4 MiB objects)
        [...]
# df -h /space
Filesystem      Size  Used Avail Use% Mounted on
/dev/rbd0        20G   20G  181M 100% /space
# ls -lh /space |head
total 19G
-rw-r--r--. 1 root root 10K Feb 2 14:13 xaa
-rw-r--r--. 1 root root 10K Feb 2 14:13 xab
-rw-r--r--. 1 root root 10K Feb 2 14:13 xac
-rw-r--r--. 1 root root 10K Feb 2 14:13 xad
-rw-r--r--. 1 root root 10K Feb 2 14:13 xae
-rw-r--r--. 1 root root 10K Feb 2 14:13 xaf
-rw-r--r--. 1 root root 10K Feb 2 14:13 xag
-rw-r--r--. 1 root root 10K Feb 2 14:13 xah
-rw-r--r--. 1 root root 10K Feb 2 14:13 xai
# ls /space |wc -l
1638400
All files contain pseudorandom (i.e. incompressible) junk.
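For anyone who wants to reproduce this, an equivalent setup can be
created along these lines (sketch only; exact sizes and options are
approximate):

# rbd create libvirt/bobtest --size 20G
# rbd map libvirt/bobtest
# mkfs.xfs /dev/rbd0
# mount /dev/rbd0 /space
# cd /space
# head -c 16000M /dev/urandom | split -b 10K -

split(1) reading from stdin produces the xaa, xab, ... names seen above,
and 16000 MiB of input in 10 kiB pieces comes out to 1.6 M files.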
My assumption was that, since the backend RBD object size is 4 MiB, the
client machine would have to download at least 4 MiB worth of data on
any given request, even if the file in the XFS is only 10 kiB.
I.e. when I cat(1) a small file, the RBD client grabs the relevant 4 MiB
object from Ceph, extracts the small amount of requested data from it,
and presents that to userspace.
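(To make that concrete: a file's byte offset on /dev/rbd0, which
xfs_bmap(8) reports in 512-byte sectors, divided by the 4 MiB object
size gives the index of the backing object. Sketch below; /space/xaa is
just the first file from the listing, and START_SECTOR stands in for
the first sector number xfs_bmap prints.)

# xfs_bmap -v /space/xaa
# echo $(( START_SECTOR * 512 / (4 * 1024 * 1024) ))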
That's not what I see, however. My testing procedure is as follows:
I have a list of all the files on the RBD, in randomized order, stored
in root's home folder. This way I can pick file names at random simply
by going through the list from the top, without causing any network
traffic by listing files directly in the target FS. I then reboot the
node to ensure that all caches are empty, and start iftop(1) to monitor
network usage.
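In shell terms the procedure is roughly this (sketch; the list path,
interface and file count are placeholders):

# find /space -type f | shuf > /root/filelist
(reboot, map the RBD, mount the XFS, start iftop -i eth0 in another terminal)
# head -n 1000 /root/filelist | xargs cat > /dev/null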
Mapping the RBD and mounting the XFS results in 5.29 MB worth of data
read from the network.
Reading one file at random from the XFS results in approx. 200 kB of
network read.
Reading 100 files at random results in approx. 3.83 MB of network read.
Reading 1000 files at random results in approx. 36.2 MB of network read.
The bottom line is that reading any 10 kiB of actual data results in
approximately 37 kiB being transferred over the network. Overhead,
sure, but nowhere near what I expected, which was 4 MiB per object
"hit" in the backend.
Is the RBD client performing partial object reads? Is that even a thing?
Cheers,
Ruben Vestergaard
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx