Re: CephFS - read latency.

Paul Emmerich <paul.emmerich@xxxxxxxx> · Sun, 17 Feb 2019 23:53:22 +0100

Probably not related to CephFS. Try to compare the latency you are
seeing to the op_r_latency reported by the OSDs.

The fast_read option on the pool can also help a lot for this IO pattern.

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Fri, Feb 15, 2019 at 10:05 PM <jesper@xxxxxxxx> wrote:
>
> Hi.
>
> I've got a bunch of "small" files moved onto CephFS as archive/bulk storage
> and now I have the backup (to tape) to spool over them. A sample of the
> single-threaded backup client delivers this very consistent pattern:
>
> $ sudo strace -T -p 7307 2>&1 | grep -A 7 -B 3 open
> write(111, "\377\377\377\377", 4)       = 4 <0.000011>
> openat(AT_FDCWD, "/ceph/cluster/rsyncbackups/fileshare.txt", O_RDONLY) =
> 38 <0.000030>
> write(111, "\0\0\0\021197418 2 67201568", 21) = 21 <0.000036>
> read(38,
> "CLC\0\0\0\0\2\0\0\0\0\0\0\0\33\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> 65536) = 65536 <0.049733>
> write(111,
> "\0\1\0\0CLC\0\0\0\0\2\0\0\0\0\0\0\0\33\0\0\0\0\0\0\0\0\0\0\0\0"...,
> 65540) = 65540 <0.000037>
> read(38, " $$$$$$$$$$ $$$$$$\16\33\16 $$$$$$$$\16\33"..., 65536) = 65536
> <0.000199>
> write(111, "\0\1\0\0 $$$$$$$$$$ $$$$$$\16\33\16 $$$$$$"..., 65540) = 65540
> <0.000026>
> read(38, "$ \33  \16\33\25 \33\33\33   \33\33\33
> \25\0\26\2\16NVDOLOVB"..., 65536) = 65536 <0.000035>
> write(111, "\0\1\0\0$ \33  \16\33\25 \33\33\33   \33\33\33
> \25\0\26\2\16NVDO"..., 65540) = 65540 <0.000024>
>
> The pattern is very consistent, thus it is not one PG or one OSD being
> contented.
> $ sudo strace -T -p 7307 2>&1 | grep -A 3 open  |grep read
> read(41,
> "CLC\0\0\0\0\2\0\0\0\0\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536)
> = 11968 <0.070917>
> read(41,
> "CLC\0\0\0\0\2\0\0\0\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536)
> = 23232 <0.039789>
> read(41,
> "CLC\0\0\0\0\2\0\0\0\0\0\0\0P\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536)
> = 65536 <0.053598>
> read(41,
> "CLC\0\0\0\0\2\0\0\0\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536)
> = 28240 <0.105046>
> read(41, "NZCA_FS_CLCGENOMICS, 1, 1\nNZCA_F"..., 65536) = 73 <0.061966>
> read(41,
> "CLC\0\0\0\0\2\0\0\0\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536)
> = 65536 <0.050943>
> read(41,
> "CLC\0\0\0\0\2\0\0\0\0\0\0\0\30\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> 65536) = 65536 <0.031217>
> read(41,
> "CLC\0\0\0\0\2\0\0\0\0\0\0\0\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536)
> = 7392 <0.052612>
> read(41,
> "CLC\0\0\0\0\2\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536)
> = 288 <0.075930>
> read(41, "1316919290-DASPHYNBAAAAAAPe2218b"..., 65536) = 940 <0.040609>
> read(41,
> "CLC\0\0\0\0\2\0\0\0\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536)
> = 22400 <0.038423>
> read(41,
> "CLC\0\0\0\0\2\0\0\0\0\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536)
> = 11984 <0.039051>
> read(41,
> "CLC\0\0\0\0\2\0\0\0\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536)
> = 9040 <0.054161>
> read(41, "NZCA_FS_CLCGENOMICS, 1, 1\nNZCA_F"..., 65536) = 73 <0.040654>
> read(41,
> "CLC\0\0\0\0\2\0\0\0\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536)
> = 22352 <0.031236>
> read(41,
> "CLC\0\0\0\0\2\0\0\0\0\0\0\0N\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536)
> = 65536 <0.123424>
> read(41,
> "CLC\0\0\0\0\2\0\0\0\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536)
> = 49984 <0.052249>
> read(41,
> "CLC\0\0\0\0\2\0\0\0\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536)
> = 28176 <0.052742>
> read(41,
> "CLC\0\0\0\0\2\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536)
> = 288 <0.092039>
>
> Or to sum:
> sudo strace -T -p 23748 2>&1 | grep -A 3 open  | grep read |  perl
> -ane'/<(\d+\.\d+)>/; print $1 . "\n";' | head -n 1000 | ministat
>
>     N           Min           Max        Median           Avg        Stddev
> x 1000       3.2e-05      2.141551      0.054313   0.065834359   0.091480339
>
>
> As can be seen the "initial" read averages at 65.8ms - which - if the
> filesize is say 1MB and
> the rest of the time is 0 - caps read performance mostly 20MB/s .. at that
> pace, the journey
> through double digit TB is long even with 72 OSD's backing.
>
> Spec: Ceph Luminous 12.2.5 - Bluestore
> 6 OSD nodes, 10TB HDDs, 4+2 EC pool, 10GbitE
>
> Locally the drives deliver latencies of approximately 6-8ms for a random
> read. Any suggestion
> on where to find out where the remaining 50ms is being spend would be
> truely helpful.
>
> Large files "just works" as read-ahead does a nice job in getting
> performance up.
>
> --
> Jesper
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com