Hi.

I've got a bunch of "small" files moved onto CephFS as archive/bulk storage, and now I have the backup (to tape) to spool over them. A sample of the single-threaded backup client delivers this very consistent pattern:

$ sudo strace -T -p 7307 2>&1 | grep -A 7 -B 3 open
write(111, "\377\377\377\377", 4) = 4 <0.000011>
openat(AT_FDCWD, "/ceph/cluster/rsyncbackups/fileshare.txt", O_RDONLY) = 38 <0.000030>
write(111, "\0\0\0\021197418 2 67201568", 21) = 21 <0.000036>
read(38, "CLC\0\0\0\0\2\0\0\0\0\0\0\0\33\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536) = 65536 <0.049733>
write(111, "\0\1\0\0CLC\0\0\0\0\2\0\0\0\0\0\0\0\33\0\0\0\0\0\0\0\0\0\0\0\0"..., 65540) = 65540 <0.000037>
read(38, " $$$$$$$$$$ $$$$$$\16\33\16 $$$$$$$$\16\33"..., 65536) = 65536 <0.000199>
write(111, "\0\1\0\0 $$$$$$$$$$ $$$$$$\16\33\16 $$$$$$"..., 65540) = 65540 <0.000026>
read(38, "$ \33 \16\33\25 \33\33\33 \33\33\33 \25\0\26\2\16NVDOLOVB"..., 65536) = 65536 <0.000035>
write(111, "\0\1\0\0$ \33 \16\33\25 \33\33\33 \33\33\33 \25\0\26\2\16NVDO"..., 65540) = 65540 <0.000024>

The first read() after openat() takes ~50 ms; subsequent reads of the same file return in microseconds. The pattern is very consistent, thus it is not one PG or one OSD being contended.
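To confirm that the cost really sits in the first read() after each openat() rather than being spread across the file, the strace output can be bucketed per fd. A rough sketch, assuming strace's default `-T` output format (the function name and regexes are mine, not part of any tool; fd reuse after close() is not handled):

```python
import re
from statistics import mean, median

# Match "openat(...) = <fd> <latency>" and "read(<fd>, ...) = <n> <latency>"
# as printed by `strace -T`.
OPEN_RE = re.compile(r'openat\(.*?\)\s*=\s*(\d+)\s*<([\d.]+)>')
READ_RE = re.compile(r'read\((\d+),.*?\)\s*=\s*-?\d+\s*<([\d.]+)>')

def split_latencies(lines):
    """Split read() latencies into first-read-after-open vs. all later reads."""
    first, rest = [], []
    fresh = set()  # fds opened but not yet read from
    for line in lines:
        m = OPEN_RE.search(line)
        if m:
            fresh.add(int(m.group(1)))
            continue
        m = READ_RE.search(line)
        if m:
            fd, lat = int(m.group(1)), float(m.group(2))
            (first if fd in fresh else rest).append(lat)
            fresh.discard(fd)
    return first, rest

# Demo on two syscalls from the trace above (payloads shortened):
sample = [
    'openat(AT_FDCWD, "/ceph/cluster/rsyncbackups/fileshare.txt", O_RDONLY) = 38 <0.000030>',
    'read(38, "CLC..."..., 65536) = 65536 <0.049733>',
    'read(38, " $$$..."..., 65536) = 65536 <0.000199>',
]
first, rest = split_latencies(sample)
print(f"first: median={median(first)*1000:.1f}ms  "
      f"later: median={median(rest)*1000:.1f}ms")
```

In practice you would feed the whole of `sudo strace -T -p <pid> 2>&1` through this; on the trace above, first-read medians land in the tens of milliseconds while later reads stay in the microsecond range.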
$ sudo strace -T -p 7307 2>&1 | grep -A 3 open | grep read
read(41, "CLC\0\0\0\0\2\0\0\0\0\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536) = 11968 <0.070917>
read(41, "CLC\0\0\0\0\2\0\0\0\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536) = 23232 <0.039789>
read(41, "CLC\0\0\0\0\2\0\0\0\0\0\0\0P\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536) = 65536 <0.053598>
read(41, "CLC\0\0\0\0\2\0\0\0\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536) = 28240 <0.105046>
read(41, "NZCA_FS_CLCGENOMICS, 1, 1\nNZCA_F"..., 65536) = 73 <0.061966>
read(41, "CLC\0\0\0\0\2\0\0\0\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536) = 65536 <0.050943>
read(41, "CLC\0\0\0\0\2\0\0\0\0\0\0\0\30\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536) = 65536 <0.031217>
read(41, "CLC\0\0\0\0\2\0\0\0\0\0\0\0\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536) = 7392 <0.052612>
read(41, "CLC\0\0\0\0\2\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536) = 288 <0.075930>
read(41, "1316919290-DASPHYNBAAAAAAPe2218b"..., 65536) = 940 <0.040609>
read(41, "CLC\0\0\0\0\2\0\0\0\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536) = 22400 <0.038423>
read(41, "CLC\0\0\0\0\2\0\0\0\0\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536) = 11984 <0.039051>
read(41, "CLC\0\0\0\0\2\0\0\0\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536) = 9040 <0.054161>
read(41, "NZCA_FS_CLCGENOMICS, 1, 1\nNZCA_F"..., 65536) = 73 <0.040654>
read(41, "CLC\0\0\0\0\2\0\0\0\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536) = 22352 <0.031236>
read(41, "CLC\0\0\0\0\2\0\0\0\0\0\0\0N\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536) = 65536 <0.123424>
read(41, "CLC\0\0\0\0\2\0\0\0\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536) = 49984 <0.052249>
read(41, "CLC\0\0\0\0\2\0\0\0\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536) = 28176 <0.052742>
read(41, "CLC\0\0\0\0\2\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536) = 288 <0.092039>

Or, to summarize:

$ sudo strace -T -p 23748 2>&1 | grep -A 3 open | grep read | \
  perl -ane'/<(\d+\.\d+)>/; print $1 . "\n";' | head -n 1000 | ministat
    N           Min           Max        Median           Avg        Stddev
x 1000      3.2e-05      2.141551      0.054313   0.065834359   0.091480339

As can be seen, the "initial" read averages 65.8 ms, which - if the file size is, say, 1 MB and the rest of the reads take essentially no time - caps read throughput at roughly 15-20 MB/s. At that pace, the journey through double-digit TB is long, even with 72 OSDs backing the pool.

Spec: Ceph Luminous 12.2.5, BlueStore; 6 OSD nodes, 10 TB HDDs, 4+2 EC pool, 10GbitE.

Locally the drives deliver latencies of approximately 6-8 ms for a random read. Any suggestion on where to find out where the remaining ~50 ms is being spent would be truly helpful. Large files "just work", as read-ahead does a nice job of getting performance up.

-- 
Jesper

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
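P.S. The throughput cap is just the file size divided by the per-file latency. A back-of-the-envelope check (the 1 MB average file size is an assumption, not measured):

```python
# Single-threaded throughput cap when each file pays the full first-read
# latency and the remaining sequential reads are essentially free.
first_read_latency_s = 0.0658   # seconds: the ministat average above
file_size_mb = 1.0              # assumed average file size (hypothetical)

throughput_mb_s = file_size_mb / first_read_latency_s
print(f"cap = {throughput_mb_s:.1f} MB/s")   # ~15 MB/s
```

With 6-8 ms local drive latency, a pure disk seek per file would instead allow well over 100 MB/s single-threaded, which is why the extra ~50 ms per open is the interesting part.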