Re: cephfs kernel client blocks when removing large files

Gregory Farnum <gfarnum@xxxxxxxxxx> · Mon, 15 Oct 2018 11:40:49 -0700

On Tue, Oct 9, 2018 at 10:57 PM Dylan McCulloch <dmc@xxxxxxxxxxxxxx> wrote:

Hi Greg,

Nowhere in your test procedure do you mention syncing or flushing the files to disk. That is almost certainly the cause of the slowness

We have tested performing sync after file creation and the delay still occurs. (See Test3 results below.)

To clarify, it appears the delay is observed only when ls is performed on the same directory in which the files were removed, provided the files have been recently cached.
e.g. rm -f /mnt/cephfs_mountpoint/file*; ls /mnt/cephfs_mountpoint

the client which wrote the data is required to flush it out before dropping enough file "capabilities" for the other client to do the rm.

Our tests are performed on the same host.

In Test1 the rm and ls are performed by the same client id. And for other tests in which an unmount & remount were performed, I would assume the unmount would cause that particular client id to terminate and drop any caps.

Do you still believe held caps are contributing to slowness in these test scenarios?

Hmm, perhaps not. Or at least not in that way.

These tests are interesting; I'm not quite sure what might be going on here, but I think I'll have to let one of our more dedicated kernel CephFS people look at it, sorry.
-Greg

We’ve added 3 additional test cases below.

Test 3) Sync write (delay observed when writing files and syncing)

Test 4) Bypass cache (no delay observed when files are not written to cache)

Test 5) Read test (delay observed when removing files that have recently been read into cache)

Test3: Sync Write - File creation, with sync after write.

1) unmount & remount:

2) Add 5 x 100GB files to a directory:

for i in {1..5}; do dd if=/dev/zero of=/mnt/cephfs_mountpoint/file$i.txt count=102400 bs=1048576; done

3) sync

4) Delete all files in directory:

for i in {1..5}; do rm -f /mnt/cephfs_mountpoint/file$i.txt; done

5) Immediately perform ls on directory:

time ls /mnt/cephfs_mountpoint
real    0m8.765s
user    0m0.001s
sys     0m0.000s

Test4: Bypass cache - File creation, with nocache options for dd.

1) unmount & remount:

2) Add 5 x 100GB files to a directory:

for i in {1..5}; do dd if=/dev/zero of=/mnt/cephfs_mountpoint/file$i.txt count=102400 bs=1048576 oflag=nocache,sync iflag=nocache; done

3) sync

4) Delete all files in directory:

for i in {1..5}; do rm -f /mnt/cephfs_mountpoint/file$i.txt; done

5) Immediately perform ls on directory:

time ls /mnt/cephfs_mountpoint
real    0m0.003s
user    0m0.000s
sys     0m0.001s

Test5: Read test - Read files into empty page cache, before deletion.

1) unmount & remount

2) Add 5 x 100GB files to a directory:

for i in {1..5}; do dd if=/dev/zero of=/mnt/cephfs_mountpoint/file$i.txt count=102400 bs=1048576; done

3) sync

4) unmount & remount #empty cache

5) read files (to add back to cache)

for i in {1..5}; do cat /mnt/cephfs_mountpoint/file$i.txt > /dev/null; done

6) Delete all files in directory:

for i in {1..5}; do rm -f /mnt/cephfs_mountpoint/file$i.txt; done

7) Immediately perform ls on directory:

time ls /mnt/cephfs_mountpoint
real    0m8.723s
user    0m0.000s
sys     0m0.001s
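
Tests 3 to 5 share the same create, sync, delete, ls skeleton, so the sequence can be scripted for repeated runs. The following is a minimal sketch, not the script used in this thread: the function name run_rm_ls_test and its arguments are hypothetical, it assumes GNU date for nanosecond timestamps, and it leaves out the unmount & remount steps and the Test5 read-back step.

```shell
# Sketch of the create / sync / delete / ls sequence from Test3-Test5.
# Usage: run_rm_ls_test DIR COUNT SIZE_MB [extra dd operands...]
# (function name and arguments are illustrative, not from the thread)
run_rm_ls_test() (
    dir=$1
    count=$2
    size_mb=$3
    shift 3

    # Create COUNT files of SIZE_MB MiB each. Any remaining arguments are
    # passed straight to dd (e.g. oflag=nocache,sync for the Test4 variant).
    i=1
    while [ "$i" -le "$count" ]; do
        dd if=/dev/zero of="$dir/file$i.txt" bs=1048576 count="$size_mb" "$@" 2>/dev/null || exit 1
        i=$((i + 1))
    done

    # Flush dirty data before deleting.
    sync

    # Delete all files that were just created.
    i=1
    while [ "$i" -le "$count" ]; do
        rm -f "$dir/file$i.txt"
        i=$((i + 1))
    done

    # Time the first ls after the deletes, in milliseconds (GNU date assumed).
    start=$(date +%s%N)
    ls "$dir" >/dev/null
    end=$(date +%s%N)
    echo "ls_ms=$(( (end - start) / 1000000 ))"
)
```

For example, `run_rm_ls_test /mnt/cephfs_mountpoint 5 102400` would approximate Test3, and appending `oflag=nocache,sync iflag=nocache` would approximate Test4.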

Regards,

Dylan

From: Gregory Farnum <gfarnum@xxxxxxxxxx>
Sent: Wednesday, October 10, 2018 4:37:49 AM
To: Dylan McCulloch
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: cephfs kernel client blocks when removing large files

Nowhere in your test procedure do you mention syncing or flushing the files to disk. That is almost certainly the cause of the slowness: the client which wrote the data is required to flush it out before dropping enough file "capabilities" for the other client to do the rm.
-Greg

On Sun, Oct 7, 2018 at 11:57 PM Dylan McCulloch <dmc@xxxxxxxxxxxxxx> wrote:

Hi all,

We have identified some unexpected blocking behaviour by the ceph-fs kernel client.

When performing 'rm' on large files (100+GB), there appears to be a significant delay of 10 seconds or more before a 'stat' operation can be performed on the same directory on the filesystem.

Looking at the kernel client's mds inflight-ops, we observe that there are pending UNLINK operations corresponding to the deleted files.

We have noted some correlation between files being in the client page cache and the blocking behaviour. For example, if the cache is dropped or the filesystem remounted, the blocking does not occur.
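
One way to watch the pending UNLINK operations from the client side is to count them in the kernel client's debugfs state. This is a hedged sketch under two assumptions that are not confirmed in this thread: that debugfs is mounted and the client exposes its in-flight MDS requests in /sys/kernel/debug/ceph/*/mdsc, and that unlink requests appear there with the op name "unlink". The function name count_pending_unlinks is hypothetical.

```shell
# Count in-flight unlink requests listed in the kernel client's mdsc files.
# The debugfs path and the "unlink" op name are assumptions (see above).
# Usage: count_pending_unlinks [FILE...]
count_pending_unlinks() {
    # With no arguments, read every kernel CephFS client instance on this host.
    if [ "$#" -eq 0 ]; then
        set -- /sys/kernel/debug/ceph/*/mdsc
    fi
    # Missing or unreadable files contribute no lines, so the count is 0.
    # -w matches "unlink" as a whole word; a file literally named "unlink"
    # in a request path would also be counted.
    cat "$@" 2>/dev/null | grep -cw 'unlink'
    # grep exits non-zero when the count is 0; normalise the return status.
    return 0
}
```

Reading debugfs normally requires root. Running `count_pending_unlinks` once per second while the rm loop and the subsequent ls are in progress would show whether the count drains at about the time ls returns.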

Test scenario below:

/mnt/cephfs_mountpoint type ceph (rw,relatime,name=ceph_filesystem,secret=<hidden>,noshare,acl,wsize=16777216,rasize=268439552,caps_wanted_delay_min=1,caps_wanted_delay_max=1)

Test1:

1) unmount & remount:

2) Add 10 x 100GB files to a directory:

for i in {1..10}; do dd if=/dev/zero of=/mnt/cephfs_mountpoint/file$i.txt count=102400 bs=1048576; done

3) Delete all files in directory:

for i in {1..10}; do rm -f /mnt/cephfs_mountpoint/file$i.txt; done

4) Immediately perform ls on directory:

time ls /mnt/cephfs_mountpoint/test1

Result: delay ~16 seconds
real    0m16.818s
user    0m0.000s
sys     0m0.002s

Test2:

1) unmount & remount

2) Add 10 x 100GB files to a directory:

for i in {1..10}; do dd if=/dev/zero of=/mnt/cephfs_mountpoint/file$i.txt count=102400 bs=1048576; done

3) Either a) unmount & remount; or b) drop caches

echo 3 >/proc/sys/vm/drop_caches

4) Delete files in directory:

for i in {1..10}; do rm -f /mnt/cephfs_mountpoint/file$i.txt; done

5) Immediately perform ls on directory:

time ls /mnt/cephfs_mountpoint/test1

Result: no delay
real    0m0.010s
user    0m0.000s
sys     0m0.001s
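
A single timed ls shows that a delay exists but not how long the directory stays slow. The following is a rough sketch of a probe that samples 'stat' latency repeatedly after the deletes. The function name probe_stat_latency and its arguments are hypothetical, GNU date is assumed, and whether stat on the directory blocks in the same way as ls is not established in this thread, so the stat call may need to be swapped for ls.

```shell
# Sample how long a stat of DIR takes, SAMPLES times, INTERVAL seconds apart.
# Usage: probe_stat_latency DIR SAMPLES INTERVAL
# (function name and arguments are illustrative, not from the thread)
probe_stat_latency() (
    dir=$1
    samples=$2
    interval=$3

    n=0
    while [ "$n" -lt "$samples" ]; do
        # Nanosecond timestamps around the call (GNU date assumed).
        start=$(date +%s%N)
        stat "$dir" >/dev/null 2>&1
        end=$(date +%s%N)
        echo "sample=$n stat_ms=$(( (end - start) / 1000000 ))"
        n=$((n + 1))
        sleep "$interval"
    done
)
```

Started immediately after the rm loop, for example as `probe_stat_latency /mnt/cephfs_mountpoint 20 1`, the first sample would be expected to absorb most of the delay if the behaviour matches Test1, with later samples returning quickly.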

Our understanding of ceph-fs’ file deletion mechanism is that there should be no blocking observed on the client:
http://docs.ceph.com/docs/mimic/dev/delayed-delete/
It appears that if files are cached on the client, either by being created or accessed recently, the kernel client will block for reasons we have not identified.

Is this a known issue, and are there any ways to mitigate this behaviour?
Our production system relies on our client’s processes having concurrent access to the file system, and access contention must be avoided.

An old mailing list post that discusses changes to client’s page cache behaviour may be relevant.
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-October/005692.html

Client System:

OS: RHEL7
Kernel: 4.15.15-1

Cluster:
Ceph: Luminous 12.2.8

Thanks,
Dylan

_______________________________________________

ceph-users mailing list

ceph-users@xxxxxxxxxxxxxx

http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
