Re: XFS on top of RBD, overhead

On 02/02/2024 16:41, Ruben Vestergaard wrote:
Hi group,

Today I conducted a small experiment to test an assumption of mine, namely that Ceph incurs a substantial network overhead when reading many small files.

One RBD was created, and on top of that an XFS containing 1.6 M files, each 10 KiB in size:

    # rbd info libvirt/bobtest
    rbd image 'bobtest':
        size 20 GiB in 5120 objects
        order 22 (4 MiB objects)
        [...]

    # df -h /space
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/rbd0        20G   20G  181M 100% /space

    # ls -lh /space |head
    total 19G
    -rw-r--r--. 1 root root 10K Feb  2 14:13 xaa
    -rw-r--r--. 1 root root 10K Feb  2 14:13 xab
    -rw-r--r--. 1 root root 10K Feb  2 14:13 xac
    -rw-r--r--. 1 root root 10K Feb  2 14:13 xad
    -rw-r--r--. 1 root root 10K Feb  2 14:13 xae
    -rw-r--r--. 1 root root 10K Feb  2 14:13 xaf
    -rw-r--r--. 1 root root 10K Feb  2 14:13 xag
    -rw-r--r--. 1 root root 10K Feb  2 14:13 xah
    -rw-r--r--. 1 root root 10K Feb  2 14:13 xai

    # ls /space |wc -l
    1638400

All files contain pseudorandom (i.e. incompressible) junk.
My assumption was that, as the backend RBD block size is 4 MiB, the client machine would have to download at least that 4 MiB worth of data on any given request, even if the file in the XFS is only 10 KiB.

I.e. I cat(1) a small file, the RBD client grabs the relevant 4 MiB block from Ceph, and from this the small amount of requested data is extracted and presented to userspace.

That's not what I see, however. My testing procedure is as follows:

I have a list of all the files on the RBD, in randomized order, stored in root's home folder. This lets me pick file names at random by going through the list from the top, without causing network traffic by listing files directly in the target FS. I then reboot the node to ensure that all caches are empty and start an iftop(1) to monitor network usage.

Mapping the RBD and mounting the XFS results in 5.29 MB worth of data read from the network.

Reading one file at random from the XFS results in approx. 200 kB of network read.

Reading 100 files at random results in approx. 3.83 MB of network read.

Reading 1000 files at random results in approx. 36.2 MB of network read.

Bottom line is that reading any 10 KiB of actual data results in approximately 37 KiB of data being transferred over the network. Overhead, sure, but nowhere near what I expected, which was 4 MiB per block of data "hit" in the backend.
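
The per-file figure can be re-derived from the totals quoted above. The following is only a sketch of that arithmetic: the function names are made up for illustration, and whether iftop's "MB" means 2^20 or 10^6 bytes is an assumption (the binary reading is the default here because it gives roughly 37 KiB per file for the 1000-file run, while the decimal reading gives about 35 KiB).

```python
# Rough arithmetic on the traffic figures quoted in the post.
# Assumption: "MB" in the iftop totals may be binary (2**20) or decimal (10**6).
FILE_SIZE_KIB = 10            # size of each test file
OBJECT_SIZE_KIB = 4 * 1024    # 4 MiB RBD objects (order 22)

# (number of files read, total network read in "MB" as reported)
MEASUREMENTS = [(100, 3.83), (1000, 36.2)]


def per_file_kib(n_files, total_mb, binary_units=True):
    """Average network traffic per file read, in KiB."""
    bytes_per_mb = 1024 ** 2 if binary_units else 1000 ** 2
    return total_mb * bytes_per_mb / n_files / 1024


def amplification(n_files, total_mb, binary_units=True):
    """Network bytes transferred per byte of file data actually requested."""
    return per_file_kib(n_files, total_mb, binary_units) / FILE_SIZE_KIB


if __name__ == "__main__":
    for n_files, total_mb in MEASUREMENTS:
        print(n_files, "files:",
              round(per_file_kib(n_files, total_mb), 1), "KiB per file,",
              round(amplification(n_files, total_mb), 2), "x amplification")
    # What the original assumption (one full object per file) would predict:
    print("full-object reads:", OBJECT_SIZE_KIB / FILE_SIZE_KIB, "x")
```

Either way the measured amplification is a small single-digit factor, against roughly 410x if every file read pulled in a whole 4 MiB object.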

Is the RBD client performing partial object reads? Is that even a thing?

Cheers,
Ruben Vestergaard
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

The OSD/RADOS API allows you to read partial data within an object: you specify the length and the logical offset within the object, so there is no need to read the entire object if you do not need it. This is not specific to RBD. The small network overhead is, I guess, overhead in the network protocol layers, including Ceph messenger overhead.
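
To illustrate what "length and logical offset" means for a block read, here is a minimal sketch of how a byte range on the image could be split into per-object extents. It assumes the default RBD layout (stripe unit equal to the object size, stripe count 1) and the order 22 shown in the `rbd info` output; the function name `extents` is hypothetical, and this is not the actual client code.

```python
# Sketch: map a byte range on an RBD image to (object, offset, length) extents.
# Assumption: default striping, i.e. the image is a plain concatenation of
# fixed-size objects. OBJECT_ORDER comes from "order 22" in the rbd info output.
OBJECT_ORDER = 22
OBJECT_SIZE = 1 << OBJECT_ORDER   # 4 MiB


def extents(image_offset, length):
    """Split an image byte range into (object_no, offset_in_object, length) tuples."""
    result = []
    end = image_offset + length
    while image_offset < end:
        object_no = image_offset >> OBJECT_ORDER
        offset_in_object = image_offset & (OBJECT_SIZE - 1)
        # Do not read past the end of the current object.
        chunk = min(end - image_offset, OBJECT_SIZE - offset_in_object)
        result.append((object_no, offset_in_object, chunk))
        image_offset += chunk
    return result


if __name__ == "__main__":
    # A 10 KiB read somewhere inside the image touches one object, 10 KiB of it.
    print(extents(5 * OBJECT_SIZE + 123 * 4096, 10 * 1024))
    # A read straddling an object boundary becomes two partial reads.
    print(extents(OBJECT_SIZE - 4096, 10 * 1024))
```

Each extent corresponds to one partial object read of `length` bytes at `offset_in_object`, so a 10 KiB file costs on the order of 10 KiB of object data plus filesystem metadata reads and protocol overhead, not 4 MiB.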

/maged
