On Fri, Aug 24, 2018 at 1:20 AM Stefan Kooman <stefan@xxxxxx> wrote:
Hi Gregory,
Quoting Gregory Farnum (gfarnum@xxxxxxxxxx):
> This is quite strange. Given that you have a log, I think what you want to
> do is find one request in the log, trace it through its lifetime, and see
> where the time is spent. You may find a bifurcation, where some
> categories of requests happen instantly but other categories take a second
> or more; focus on the latter, obviously.
So that is what I did. It turns out it's not the (slow) cache at all, probably
not to your surprise. The reads are actually quite fast: compared to the
kernel client they are ~ 8 ms slower, or ~ 40%. It looks like a couple of
writes / updates, at least to a session file, are slow:
2018-08-23 16:40:25.631 7f79156a8700 10 client.15158830 put_inode on
0x10000693859.head(faked_ino=0 ref=5 ll_ref=1 cap_refs={} open={3=1}
mode=100600 size=0/4194304 nlink=1 btime=2018-08-23 16:40:25.632601
mtime=2018-08-23 16:40:25.632601 ctime=2018-08-23 16:40:25.632601
caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000693859 ts 0/0
objects 0 dirty_or_tx 0]
parents=0x1000068547c.head["sess_ia0agoj01ul4rob7ji55ouca41"]
0x5646ff0e8000)
2018-08-23 16:40:28.547 7f79156a8700 10 client.15158830
update_inode_file_time 0x10000693859.head(faked_ino=0 ref=4 ll_ref=1
cap_refs={} open={3=1} mode=100600 size=0/4194304 nlink=1
btime=2018-08-23 16:40:25.632601 mtime=2018-08-23 16:40:25.632601
ctime=2018-08-23 16:40:25.632601
caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[0x10000693859 ts 0/0
objects 0 dirty_or_tx 0]
parents=0x1000068547c.head["sess_ia0agoj01ul4rob7ji55ouca41"]
0x5646ff0e8000) pAsxLsXsxFsxcrwb ctime 2018-08-23 16:40:25.632601 mtime
2018-08-23 16:40:25.632601
Hmm, these aren't actually the start and end times of the same operation. put_inode() is literally adjusting a refcount, which can happen for reasons ranging from the VFS doing something that drops it, to an internal operation completing, to a response coming back from the MDS. You should be able to find requests coming in from the kernel and the responses going back out (the function names will be prefixed with "ll_", e.g. "ll_lookup").
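To pair those up you can usually just eyeball the log, but a quick throwaway script works too. This is only a sketch: it assumes the debug lines look roughly like the excerpts you pasted (date, time, thread, debug level, client.<id>, function name) and it merely reports large gaps between consecutive ll_* entries, which is a crude proxy for a slow request; adjust the regex to whatever your log actually contains.

#!/usr/bin/env python3
# Rough sketch: scan a ceph-fuse debug log for ll_* entries and report large
# gaps between consecutive ones, as a crude way to spot slow requests.
# Assumes lines start with "<date> <time> <thread> <level> client.<id> <func>",
# like the excerpts above; adjust the regex if your format differs.
import re
import sys
from datetime import datetime

LINE_RE = re.compile(
    r'^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+) '
    r'\S+\s+\d+\s+client\.\d+\s+(?P<func>ll_\w+)'
)

prev_ts = prev_func = None
with open(sys.argv[1], errors='replace') as log:
    for line in log:
        m = LINE_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(m['date'] + ' ' + m['time'],
                               '%Y-%m-%d %H:%M:%S.%f')
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if gap > 0.5:                      # only report gaps > 500 ms
                print(f'{gap:7.3f}s between {prev_func} and {m["func"]}')
        prev_ts, prev_func = ts, m['func']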
So, almost 3 seconds. The page is only served after this, and possibly after
some cache files have been written. Note though that ceph-fuse is running
with debug=20. Apparently the kernel client is _much_ faster at writing than
ceph-fuse: if I write a file with "dd" (from /dev/urandom) it's in the
tens-of-milliseconds range, not seconds, and atime / ctime changes are
handled in < 5 ms.
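Instead of dd, the comparison could also be scripted so both mounts get exactly
the same workload. A rough Python sketch (the mount paths are just placeholders,
and the 64 KB write size is an arbitrary example):

#!/usr/bin/env python3
# Sketch: run the same small synchronous write on each given mount point and
# print how long it took, e.g. one kernel-client mount and one ceph-fuse mount.
# Usage: write_test.py /mnt/cephfs-kernel /mnt/cephfs-fuse   (paths are examples)
import os
import sys
import time

def timed_write(mountpoint, size=64 * 1024):
    path = os.path.join(mountpoint, 'write-test.tmp')
    data = os.urandom(size)
    start = time.monotonic()
    with open(path, 'wb') as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())   # make sure the data actually leaves the client
    elapsed = time.monotonic() - start
    os.unlink(path)
    return elapsed

for mnt in sys.argv[1:]:
    print(f'{mnt}: {timed_write(mnt) * 1000:.1f} ms')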
I wonder if tuning file-striping [1] with stripe units of 4KB would be
beneficial in this case.
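If I go down that route I'd probably first experiment on a test directory via
the CephFS layout xattrs rather than touching the session directory. A rough
sketch; I haven't verified how small a stripe unit the MDS will actually
accept (4 KB may well be rejected), so 64 KB is used purely as an example:

#!/usr/bin/env python3
# Sketch: set a smaller stripe unit on an (ideally empty) test directory via
# the CephFS layout xattrs; files created underneath will inherit the layout.
# I'm not sure how small a stripe unit the MDS accepts (4 KB may be rejected),
# so 64 KB is used here purely as an example value.
import os
import sys

testdir = sys.argv[1]          # e.g. a fresh directory on the ceph-fuse mount
os.makedirs(testdir, exist_ok=True)

# Only affects files created after the xattr is set, not existing ones.
os.setxattr(testdir, 'ceph.dir.layout.stripe_unit', b'65536')
print(os.getxattr(testdir, 'ceph.dir.layout.stripe_unit'))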
Gr. Stefan
[1]: http://docs.ceph.com/docs/master/dev/file-striping/
--
| BIT BV http://www.bit.nl/ Kamer van Koophandel 09090351
| GPG: 0xD14839C6 +31 318 648 688 / info@xxxxxx