Re: PROBLEM: CephFS write performance drops by 90%

On 16/12/2022 19:37, Roose, Marco wrote:
Hi Ilya and Xiubo,
thanks for looking into this. I'll try to answer your questions:

==============================
What is the workload / test case?
I'm using a ~2 GB test file which I rsync from local storage to the
CephFS mount. I'm using rsync for convenience, as the --progress command-line
switch gives a good immediate indication of whether the problem exists.


good kernel (e.g. 5.6.0-rc7)

root@S1020-CephTest:~# uname -a
Linux S1020-CephTest 5.6.0-050600rc7-generic #202003230230 SMP Mon Mar 23
02:33:08 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
root@S1020-CephTest:~# ls tatort.mp4
tatort.mp4
root@S1020-CephTest:~# ls -la tatort.mp4
-rw-rw-r-- 1 nanoadmin nanoadmin 2106772019 Dec  7 11:25 tatort.mp4
root@S1020-CephTest:~# rsync -avh --progress tatort.mp4 /mnt/ceph/tatort.mp4
sending incremental file list
tatort.mp4
           2.11G 100%  138.10MB/s    0:00:14 (xfr#1, to-chk=0/1)

sent 2.11G bytes  received 35 bytes  135.95M bytes/sec
total size is 2.11G  speedup is 1.00

bad kernel (e.g. 5.6.0 final)

root@S1020-CephTest:~# uname -a
Linux S1020-CephTest 5.6.0-050600-generic #202003292333 SMP Sun Mar 29
23:35:58 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
root@S1020-CephTest:~# rsync -avh --progress tatort.mp4 /mnt/ceph/tatort.mp4
sending incremental file list
tatort.mp4
          21.59M   1%    2.49MB/s    0:13:38

(see attached netdata screenshot showing the difference in iowait for
both test cases)


As Xiubo suggested, I tested with the very latest RC kernel, too. Same problem:

Latest 6.1 RC kernel
root@S1020-CephTest:~# uname -a
Linux S1020-CephTest 6.1.0-060100rc5-generic #202211132230 SMP
PREEMPT_DYNAMIC Sun Nov 13 22:36:10 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
root@S1020-CephTest:~# rsync -avh --progress tatort.mp4 /mnt/ceph/tatort.mp4
sending incremental file list
tatort.mp4
          60.13M   2%    3.22MB/s    0:10:20

(attached a netdata screenshot for that, too).

==================================================

Can you describe how you performed bisection?  Can you share the
reproducer you used for bisection?
I took a commit from the 5.4 stable branch which I had confirmed to be bad
in earlier tests. Then I took tag 5.4.25, which I had confirmed to be good,
as the first "good".

# git bisect log
git bisect start
# bad: [61bbc823a17abb3798568cfb11ff38fc22317442] clk: ti: am43xx: Fix clock
parent for RTC clock
git bisect bad 61bbc823a17abb3798568cfb11ff38fc22317442
# good: [18fe53f6dfbc5ad4ff2164bff841b56d61b22720] Linux 5.4.25
git bisect good 18fe53f6dfbc5ad4ff2164bff841b56d61b22720
# good: [59e4624e664c9e83c04abae9b710cd60cb908a82] ALSA: seq: oss: Fix
running status after receiving sysex
git bisect good 59e4624e664c9e83c04abae9b710cd60cb908a82
# good: [8dab286ab527dc3fa68e9705b0805f4d6ce10add] fsl/fman: detect FMan
erratum A050385
git bisect good 8dab286ab527dc3fa68e9705b0805f4d6ce10add
# bad: [160c2ffa701692e60c7034271b4c06b843b7249f] xfrm: add the missing
verify_sec_ctx_len check in xfrm_add_acquire
git bisect bad 160c2ffa701692e60c7034271b4c06b843b7249f
# bad: [174da11b6474200e2e43509ce2d34e62ecea9f4b] ARM: dts: omap5: Add
bus_dma_limit for L3 bus
git bisect bad 174da11b6474200e2e43509ce2d34e62ecea9f4b
# good: [65047f7538ba5c0edcf4b4768d942970bb6d4cbc] iwlwifi: mvm: fix
non-ACPI function
git bisect good 65047f7538ba5c0edcf4b4768d942970bb6d4cbc
# good: [10d5de234df4a4567a8da18de04111f7e931fd70] RDMA/core: Fix missing
error check on dev_set_name()
git bisect good 10d5de234df4a4567a8da18de04111f7e931fd70
# good: [44960e1c39d807cd0023dc7036ee37f105617ebe] RDMA/mad: Do not crash if
the rdma device does not have a umad interface
git bisect good 44960e1c39d807cd0023dc7036ee37f105617ebe
# bad: [7cdaa5cd79abe15935393b4504eaf008361aa517] ceph: fix memory leak in
ceph_cleanup_snapid_map()
git bisect bad 7cdaa5cd79abe15935393b4504eaf008361aa517
# bad: [ed24820d1b0cbe8154c04189a44e363230ed647e] ceph: check
POOL_FLAG_FULL/NEARFULL in addition to OSDMAP_FULL/NEARFULL
git bisect bad ed24820d1b0cbe8154c04189a44e363230ed647e
# first bad commit: [ed24820d1b0cbe8154c04189a44e363230ed647e] ceph: check
POOL_FLAG_FULL/NEARFULL in addition to OSDMAP_FULL/NEARFULL

Since you are here, could you try reverting this commit and testing again?

Let's see whether this commit is causing it. I will take a look later this week.
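In the real tree that would be roughly a `git revert` of the bad commit followed by a rebuild. As a minimal, self-contained illustration of the revert mechanics (throwaway repo, not the kernel tree):

```shell
# Demonstrate 'git revert': undo a suspect commit with a new commit
# on top, leaving history intact (unlike 'git reset').
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email revert@example.com
git config user.name revert
echo baseline > file && git add file && git commit -qm baseline
echo suspect  > file && git commit -qam suspect   # the commit under suspicion
git revert --no-edit HEAD                         # revert it on top
cat file                                          # back to 'baseline'
```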

Thanks

- Xiubo


Can you confirm the result by manually checking out the previous commit
and verifying that it's "good"?
root@S1020-CephTest:~/src/linux# git checkout -b ceph_check_last_good
44960e1c39d807cd0023dc7036ee37f105617ebe
Checking out files: 100% (68968/68968), done.
Switched to a new branch 'ceph_check_last_good'

root@S1020-CephTest:~/src/linux# make clean
...
root@S1020-CephTest:~/src/linux# cp -a /boot/config-5.4.25-050425-generic
.config
root@S1020-CephTest:~/src/linux# make olddefconfig
...
root@S1020-CephTest:~/src/linux# make bindeb-pkg -j"$(nproc)"
...
dpkg-deb: building package 'linux-headers-5.4.28+' in
'../linux-headers-5.4.28+_5.4.28+-10_amd64.deb'.
dpkg-deb: building package 'linux-libc-dev' in
'../linux-libc-dev_5.4.28+-10_amd64.deb'.
dpkg-deb: building package 'linux-image-5.4.28+' in
'../linux-image-5.4.28+_5.4.28+-10_amd64.deb'.
dpkg-deb: building package 'linux-image-5.4.28+-dbg' in
'../linux-image-5.4.28+-dbg_5.4.28+-10_amd64.deb'.
  dpkg-genbuildinfo --build=binary
  dpkg-genchanges --build=binary >../linux-5.4.28+_5.4.28+-10_amd64.changes
dpkg-genchanges: info: binary-only upload (no source code included)
  dpkg-source --after-build linux
dpkg-buildpackage: info: binary-only upload (no source included)
root@S1020-CephTest:~/src/linux# cd ..
root@S1020-CephTest:~/src# ls
linux
  linux-5.4.28+_5.4.28+-10_amd64.changes
linux-image-5.4.28+_5.4.28+-10_amd64.deb
linux-libc-dev_5.4.28+-10_amd64.deb
linux-5.4.28+_5.4.28+-10_amd64.buildinfo
linux-headers-5.4.28+_5.4.28+-10_amd64.deb
linux-image-5.4.28+-dbg_5.4.28+-10_amd64.deb
root@S1020-CephTest:~/src# dpkg -i linux-image-5.4.28+*
...
root@S1020-CephTest:~/src# reboot
....
root@S1020-CephTest:~# uname -a
Linux S1020-CephTest 5.4.28+ #10 SMP Fri Dec 16 09:20:11 CET 2022 x86_64
x86_64 x86_64 GNU/Linux
root@S1020-CephTest:~# rsync -avh --progress tatort.mp4 /mnt/ceph/tatort.mp4
sending incremental file list
tatort.mp4
           2.11G 100%  135.15MB/s    0:00:14 (xfr#1, to-chk=0/1)

sent 2.11G bytes  received 35 bytes  135.95M bytes/sec
total size is 2.11G  speedup is 1.00

As you can see the problem does not exist here.

Thanks again!
Marco
________________________________________
From: Xiubo Li <xiubli@xxxxxxxxxx>
Sent: Friday, December 16, 2022 3:20:46 AM
To: Ilya Dryomov; Roose, Marco
Cc: Ceph Development
Subject: Re: PROBLEM: CephFS write performance drops by 90%
Hi Roose,

I think this may be similar to
https://tracker.ceph.com/issues/57898, but that one starts from 5.15 instead.

A few days ago, just after Ilya rebased to 6.1 without changing anything in
the ceph code, the xfstests runs were much faster.

I just checked the differences between 5.4 and 5.4.45 and couldn't tell
exactly what happened. So please share your test case for this.

- Xiubo

On 15/12/2022 23:32, Ilya Dryomov wrote:
On Thu, Dec 15, 2022 at 3:22 PM Roose, Marco <marco.roose@xxxxxxxxxxxxx>
wrote:
Dear Ilya,
I'm using Ubuntu and a CephFS mount. I saw a more than 90% write
performance decrease after changing the kernel version (<10 MB/s vs.
100-140 MB/s). The problem seems to exist in kernel versions starting
at v5.4. Ubuntu mainline version v5.4.25 is fine; v5.4.45 (which is the
next available) is "bad".
Hi Marco,

What is the workload?

After a git bisect with the "original" 5.4 kernels I get the following
result:
Can you describe how you performed bisection?  Can you share the
reproducer you used for bisection?

ed24820d1b0cbe8154c04189a44e363230ed647e is the first bad commit
commit ed24820d1b0cbe8154c04189a44e363230ed647e
Author: Ilya Dryomov <idryomov@xxxxxxxxx>
Date:   Mon Mar 9 12:03:14 2020 +0100

       ceph: check POOL_FLAG_FULL/NEARFULL in addition to OSDMAP_FULL/NEARFULL

       commit 7614209736fbc4927584d4387faade4f31444fce upstream.

       CEPH_OSDMAP_FULL/NEARFULL aren't set since mimic, so we need to consult
       per-pool flags as well.  Unfortunately the backwards compatibility here
       is lacking:

       - the change that deprecated OSDMAP_FULL/NEARFULL went into mimic, but
         was guarded by require_osd_release >= RELEASE_LUMINOUS
       - it was subsequently backported to luminous in v12.2.2, but that makes
         no difference to clients that only check OSDMAP_FULL/NEARFULL because
         require_osd_release is not client-facing -- it is for OSDs

       Since all kernels are affected, the best we can do here is just start
       checking both map flags and pool flags and send that to stable.

       These checks are best effort, so take osdc->lock and look up pool flags
       just once.  Remove the FIXME, since filesystem quotas are checked above
       and RADOS quotas are reflected in POOL_FLAG_FULL: when the pool reaches
       its quota, both POOL_FLAG_FULL and POOL_FLAG_FULL_QUOTA are set.
The only suspicious thing I see in this commit is the osdc->lock semaphore,
which is taken for read for a short period of time in ceph_write_iter().
It's possible that this started interfering with other code paths that
take that semaphore for write, and the read-write lock fairness algorithm
is biting...

Can you confirm the result by manually checking out the previous commit
and verifying that it's "good"?

       commit 44960e1c39d807cd0023dc7036ee37f105617ebe
       RDMA/mad: Do not crash if the rdma device does not have a umad
interface
           (commit 5bdfa854013ce4193de0d097931fd841382c76a7 upstream)

Thanks,

                   Ilya




