Dear Xiubo,

my colleague Paul (in CC) tried to revert the commit, but it wasn't possible.

Kind regards,
Marco Roose

-----Original Message-----
From: Xiubo Li <xiubli@xxxxxxxxxx>
Sent: 19 December 2022 01:16
To: Roose, Marco <marco.roose@xxxxxxxxxxxxx>; Ilya Dryomov <idryomov@xxxxxxxxx>
Cc: Ceph Development <ceph-devel@xxxxxxxxxxxxxxx>
Subject: Re: PROBLEM: CephFS write performance drops by 90%

On 16/12/2022 19:37, Roose, Marco wrote:
> Hi Ilya and Xiubo,
> thanks for looking into this. I'll try to answer your questions:
>
> ==============================
>> What is the workload / test case?
> I'm using a ~2 GB test file which I rsync from local storage to the
> ceph mount. I'm using rsync for convenience, as the --progress command
> line switch gives a good immediate indication of whether the problem exists.
>
> good kernel (e.g. 5.6.0-rc7):
>
> root@S1020-CephTest:~# uname -a
> Linux S1020-CephTest 5.6.0-050600rc7-generic #202003230230 SMP Mon Mar 23 02:33:08 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
> root@S1020-CephTest:~# ls tatort.mp4
> tatort.mp4
> root@S1020-CephTest:~# ls -la tatort.mp4
> -rw-rw-r-- 1 nanoadmin nanoadmin 2106772019 Dec  7 11:25 tatort.mp4
> root@S1020-CephTest:~# rsync -avh --progress tatort.mp4 /mnt/ceph/tatort.mp4
> sending incremental file list
> tatort.mp4
>           2.11G 100%  138.10MB/s    0:00:14 (xfr#1, to-chk=0/1)
>
> sent 2.11G bytes  received 35 bytes  135.95M bytes/sec
> total size is 2.11G  speedup is 1.00
>
> bad kernel (e.g. 5.6.0 final):
>
> root@S1020-CephTest:~# uname -a
> Linux S1020-CephTest 5.6.0-050600-generic #202003292333 SMP Sun Mar 29 23:35:58 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
> root@S1020-CephTest:~# rsync -avh --progress tatort.mp4 /mnt/ceph/tatort.mp4
> sending incremental file list
> tatort.mp4
>          21.59M   1%    2.49MB/s    0:13:38
>
> (see the attached netdata screenshot for the difference in iowait
> between the two test cases)
>
> As Xiubo suggested, I tested with the very latest RC kernel, too. Same problem:
>
> latest 6.1 RC kernel:
>
> root@S1020-CephTest:~# uname -a
> Linux S1020-CephTest 6.1.0-060100rc5-generic #202211132230 SMP PREEMPT_DYNAMIC Sun Nov 13 22:36:10 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
> root@S1020-CephTest:~# rsync -avh --progress tatort.mp4 /mnt/ceph/tatort.mp4
> sending incremental file list
> tatort.mp4
>          60.13M   2%    3.22MB/s    0:10:20
>
> (attached a netdata screenshot for that, too)
>
> ==================================================
>
>> Can you describe how you performed bisection? Can you share the
>> reproducer you used for bisection?
> I took a commit from 5.4 which I had confirmed to be bad in earlier tests,
> then took tag 5.4.25, which I had confirmed to be good, as the first "good":
>
> # git bisect log
> git bisect start
> # bad: [61bbc823a17abb3798568cfb11ff38fc22317442] clk: ti: am43xx: Fix clock parent for RTC clock
> git bisect bad 61bbc823a17abb3798568cfb11ff38fc22317442
> # good: [18fe53f6dfbc5ad4ff2164bff841b56d61b22720] Linux 5.4.25
> git bisect good 18fe53f6dfbc5ad4ff2164bff841b56d61b22720
> # good: [59e4624e664c9e83c04abae9b710cd60cb908a82] ALSA: seq: oss: Fix running status after receiving sysex
> git bisect good 59e4624e664c9e83c04abae9b710cd60cb908a82
> # good: [8dab286ab527dc3fa68e9705b0805f4d6ce10add] fsl/fman: detect FMan erratum A050385
> git bisect good 8dab286ab527dc3fa68e9705b0805f4d6ce10add
> # bad: [160c2ffa701692e60c7034271b4c06b843b7249f] xfrm: add the missing verify_sec_ctx_len check in xfrm_add_acquire
> git bisect bad 160c2ffa701692e60c7034271b4c06b843b7249f
> # bad: [174da11b6474200e2e43509ce2d34e62ecea9f4b] ARM: dts: omap5: Add bus_dma_limit for L3 bus
> git bisect bad 174da11b6474200e2e43509ce2d34e62ecea9f4b
> # good: [65047f7538ba5c0edcf4b4768d942970bb6d4cbc] iwlwifi: mvm: fix non-ACPI function
> git bisect good 65047f7538ba5c0edcf4b4768d942970bb6d4cbc
> # good: [10d5de234df4a4567a8da18de04111f7e931fd70] RDMA/core: Fix missing error check on dev_set_name()
> git bisect good 10d5de234df4a4567a8da18de04111f7e931fd70
> # good: [44960e1c39d807cd0023dc7036ee37f105617ebe] RDMA/mad: Do not crash if the rdma device does not have a umad interface
> git bisect good 44960e1c39d807cd0023dc7036ee37f105617ebe
> # bad: [7cdaa5cd79abe15935393b4504eaf008361aa517] ceph: fix memory leak in ceph_cleanup_snapid_map()
> git bisect bad 7cdaa5cd79abe15935393b4504eaf008361aa517
> # bad: [ed24820d1b0cbe8154c04189a44e363230ed647e] ceph: check POOL_FLAG_FULL/NEARFULL in addition to OSDMAP_FULL/NEARFULL
> git bisect bad ed24820d1b0cbe8154c04189a44e363230ed647e
> # first bad commit: [ed24820d1b0cbe8154c04189a44e363230ed647e] ceph: check POOL_FLAG_FULL/NEARFULL in addition to OSDMAP_FULL/NEARFULL

Since you are here, could you try to revert this commit and test again?
Let's see whether this commit is causing it.

I will take a look later this week.

Thanks
- Xiubo

>
>> Can you confirm the result by manually checking out the previous
>> commit and verifying that it's "good"?
> root@S1020-CephTest:~/src/linux# git checkout -b ceph_check_last_good 44960e1c39d807cd0023dc7036ee37f105617ebe
> Checking out files: 100% (68968/68968), done.
> Switched to a new branch 'ceph_check_last_good'
>
> root@S1020-CephTest:~/src/linux# make clean
> ...
> root@S1020-CephTest:~/src/linux# cp -a /boot/config-5.4.25-050425-generic .config
> root@S1020-CephTest:~/src/linux# make olddefconfig
> ...
> root@S1020-CephTest:~/src/linux# make bindeb-pkg -j"$(nproc)"
> ...
> dpkg-deb: building package 'linux-headers-5.4.28+' in '../linux-headers-5.4.28+_5.4.28+-10_amd64.deb'.
> dpkg-deb: building package 'linux-libc-dev' in '../linux-libc-dev_5.4.28+-10_amd64.deb'.
> dpkg-deb: building package 'linux-image-5.4.28+' in '../linux-image-5.4.28+_5.4.28+-10_amd64.deb'.
> dpkg-deb: building package 'linux-image-5.4.28+-dbg' in '../linux-image-5.4.28+-dbg_5.4.28+-10_amd64.deb'.
> dpkg-genbuildinfo --build=binary
> dpkg-genchanges --build=binary >../linux-5.4.28+_5.4.28+-10_amd64.changes
> dpkg-genchanges: info: binary-only upload (no source code included)
> dpkg-source --after-build linux
> dpkg-buildpackage: info: binary-only upload (no source included)
> root@S1020-CephTest:~/src/linux# cd ..
> root@S1020-CephTest:~/src# ls
> linux
> linux-5.4.28+_5.4.28+-10_amd64.changes
> linux-image-5.4.28+_5.4.28+-10_amd64.deb
> linux-libc-dev_5.4.28+-10_amd64.deb
> linux-5.4.28+_5.4.28+-10_amd64.buildinfo
> linux-headers-5.4.28+_5.4.28+-10_amd64.deb
> linux-image-5.4.28+-dbg_5.4.28+-10_amd64.deb
> root@S1020-CephTest:~/src# dpkg -i linux-image-5.4.28+*
> ...
> root@S1020-CephTest:~/src# reboot
> ...
> root@S1020-CephTest:~# uname -a
> Linux S1020-CephTest 5.4.28+ #10 SMP Fri Dec 16 09:20:11 CET 2022 x86_64 x86_64 x86_64 GNU/Linux
> root@S1020-CephTest:~# rsync -avh --progress tatort.mp4 /mnt/ceph/tatort.mp4
> sending incremental file list
> tatort.mp4
>           2.11G 100%  135.15MB/s    0:00:14 (xfr#1, to-chk=0/1)
>
> sent 2.11G bytes  received 35 bytes  135.95M bytes/sec
> total size is 2.11G  speedup is 1.00
>
> As you can see, the problem does not exist here.
>
> Thanks again!
> Marco
> ________________________________________
> From: Xiubo Li <xiubli@xxxxxxxxxx>
> Sent: Friday, December 16, 2022 3:20:46 AM
> To: Ilya Dryomov; Roose, Marco
> Cc: Ceph Development
> Subject: Re: PROBLEM: CephFS write performance drops by 90%
>
> Hi Roose,
>
> I think this should be similar to
> https://tracker.ceph.com/issues/57898, but that one is from 5.15 instead.
>
> Days ago, just after Ilya rebased to 6.1 without changing anything in
> the ceph code, the xfstests runs were much faster.
>
> I just checked the difference between 5.4.25 and 5.4.45 and couldn't
> tell what exactly has happened. So please share your test case for this.
>
> - Xiubo
>
> On 15/12/2022 23:32, Ilya Dryomov wrote:
>> On Thu, Dec 15, 2022 at 3:22 PM Roose, Marco <marco.roose@xxxxxxxxxxxxx> wrote:
>>> Dear Ilya,
>>> I'm using Ubuntu and a CephFS mount. I had a more than 90% write
>>> performance decrease after changing the kernel's main version
>>> (<10 MB/s vs. 100-140 MB/s). The problem seems to exist in kernel
>>> major versions starting at v5.4. Ubuntu mainline version v5.4.25 is
>>> fine; v5.4.45 (which is the next available) is "bad".
>> Hi Marco,
>>
>> What is the workload?
>>
>>> After a git bisect with the "original" 5.4 kernels I get the following result:
>> Can you describe how you performed bisection? Can you share the
>> reproducer you used for bisection?
>>
>>> ed24820d1b0cbe8154c04189a44e363230ed647e is the first bad commit
>>> commit ed24820d1b0cbe8154c04189a44e363230ed647e
>>> Author: Ilya Dryomov <idryomov@xxxxxxxxx>
>>> Date:   Mon Mar 9 12:03:14 2020 +0100
>>>
>>>     ceph: check POOL_FLAG_FULL/NEARFULL in addition to OSDMAP_FULL/NEARFULL
>>>
>>>     commit 7614209736fbc4927584d4387faade4f31444fce upstream.
>>>
>>>     CEPH_OSDMAP_FULL/NEARFULL aren't set since mimic, so we need to consult
>>>     per-pool flags as well.  Unfortunately the backwards compatibility here
>>>     is lacking:
>>>
>>>     - the change that deprecated OSDMAP_FULL/NEARFULL went into mimic, but
>>>       was guarded by require_osd_release >= RELEASE_LUMINOUS
>>>     - it was subsequently backported to luminous in v12.2.2, but that makes
>>>       no difference to clients that only check OSDMAP_FULL/NEARFULL because
>>>       require_osd_release is not client-facing -- it is for OSDs
>>>
>>>     Since all kernels are affected, the best we can do here is just start
>>>     checking both map flags and pool flags and send that to stable.
>>>
>>>     These checks are best effort, so take osdc->lock and look up pool flags
>>>     just once.  Remove the FIXME, since filesystem quotas are checked above
>>>     and RADOS quotas are reflected in POOL_FLAG_FULL: when the pool reaches
>>>     its quota, both POOL_FLAG_FULL and POOL_FLAG_FULL_QUOTA are set.
>> The only suspicious thing I see in this commit is the osdc->lock
>> semaphore, which is taken for read for a short period of time in
>> ceph_write_iter().  It's possible that that started interfering with
>> other code paths that take that semaphore for write, and the
>> read-write lock fairness algorithm is biting...
>>
>> Can you confirm the result by manually checking out the previous
>> commit and verifying that it's "good"?
>>
>> commit 44960e1c39d807cd0023dc7036ee37f105617ebe
>> RDMA/mad: Do not crash if the rdma device does not have a umad interface
>> (commit 5bdfa854013ce4193de0d097931fd841382c76a7 upstream)
>>
>> Thanks,
>>
>>                 Ilya
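Marco's note at the top says reverting ed24820d1b0c "wasn't possible", which usually means git revert hit a conflict because a later commit changed the same lines. A self-contained sketch of that failure mode in a throwaway scratch repository (the file name and commit messages are hypothetical, purely for illustration):

```shell
#!/bin/sh
# Demonstrates, in a scratch repo, why "git revert" can refuse: a later
# commit rewrote the same lines, so the reverse patch no longer applies
# cleanly and git stops with a conflict to be resolved by hand.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
printf 'check OSDMAP_FULL only\n' > caps.c
git add caps.c
git commit -qm 'original check'
printf 'check POOL_FLAG_FULL as well\n' > caps.c
git commit -qam 'add pool flag check'            # the commit to revert
target=$(git rev-parse HEAD)
printf 'check pool flags (refactored)\n' > caps.c
git commit -qam 'later change to the same lines'
if git revert --no-edit "$target" >/dev/null 2>&1; then
    echo "revert applied cleanly"
else
    echo "revert conflicts: resolve caps.c, then git revert --continue"
fi
```

In the real tree the same situation would require resolving the conflict by hand, or removing the POOL_FLAG checks manually instead of via git revert.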
Attachment: smime.p7s (S/MIME cryptographic signature)
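As an aside, the good/bad call during a bisection like the one above can be automated from rsync's transfer summary. A minimal sketch (hypothetical helper; the 50 MB/s threshold is an assumption based on the ~135 MB/s good vs. ~2-3 MB/s bad figures in this thread):

```shell
#!/bin/sh
# classify_rate (hypothetical helper): reads an rsync transfer summary on
# stdin and succeeds (exit 0, "good kernel") when the reported rate is at
# least 50M bytes/sec, fails otherwise ("bad kernel"). Could serve as the
# per-kernel check between bisection steps.
classify_rate() {
    awk '
      /bytes\/sec/ {
        # the field before "bytes/sec" is the rate, e.g. "135.95M"
        for (i = 1; i < NF; i++)
          if ($(i + 1) == "bytes/sec") rate = $i
      }
      END {
        sub(/M$/, "", rate)              # assumes an M (MB/s) suffix
        exit ((rate + 0 >= 50) ? 0 : 1)
      }'
}

# Example with the summary line from the good-kernel run above:
printf 'sent 2.11G bytes  received 35 bytes  135.95M bytes/sec\n' \
    | classify_rate && echo good
```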