RE: PROBLEM: CephFS write performance drops by 90%

Dear Xiubo,
my colleague Paul (in CC) tried to revert the commit, but it wasn't
possible.
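
Roughly, the attempt looked like this (a sketch, not a verbatim
transcript; on newer trees the commit id would be the upstream
7614209736fbc4927584d4387faade4f31444fce instead):

    # on a kernel tree that shows the problem
    git revert ed24820d1b0cbe8154c04189a44e363230ed647e
    # the revert did not apply cleanly (presumably conflicts with later
    # changes around the pool-flag checks), hence "not possible"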

Kind regards,
Marco Roose

-----Original Message-----
From: Xiubo Li <xiubli@xxxxxxxxxx> 
Sent: 19 December 2022 01:16
To: Roose, Marco <marco.roose@xxxxxxxxxxxxx>; Ilya Dryomov
<idryomov@xxxxxxxxx>
Cc: Ceph Development <ceph-devel@xxxxxxxxxxxxxxx>
Subject: Re: PROBLEM: CephFS write performance drops by 90%


On 16/12/2022 19:37, Roose, Marco wrote:
> Hi Ilya and Xiubo,
> thanks for looking into this. I'll try to answer your questions:
>
> ==============================
>> What is the workload / test case?
> I'm using a ~2 GB test file which I rsync from local storage to the
> ceph mount. I'm using rsync for convenience, as the --progress
> command-line switch gives a good immediate indicator of whether the
> problem exists.
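>
> For repeatable numbers, each run removes the target file and drops the
> page cache first (a sketch of the per-run steps; the drop_caches line
> is an extra precaution, not part of the original runs):
>
>     rm -f /mnt/ceph/tatort.mp4
>     sync; echo 3 > /proc/sys/vm/drop_caches   # assumption: start cold
>     rsync -avh --progress tatort.mp4 /mnt/ceph/tatort.mp4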
>
>
> good kernel (e.g. 5.6.0-rc7):
>
> root@S1020-CephTest:~# uname -a
> Linux S1020-CephTest 5.6.0-050600rc7-generic #202003230230 SMP Mon Mar 23 02:33:08 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
> root@S1020-CephTest:~# ls tatort.mp4
> tatort.mp4
> root@S1020-CephTest:~# ls -la tatort.mp4
> -rw-rw-r-- 1 nanoadmin nanoadmin 2106772019 Dec  7 11:25 tatort.mp4 
> root@S1020-CephTest:~# rsync -avh --progress tatort.mp4 /mnt/ceph/tatort.mp4
> sending incremental file list
> tatort.mp4
>            2.11G 100%  138.10MB/s    0:00:14 (xfr#1, to-chk=0/1)
>
> sent 2.11G bytes  received 35 bytes  135.95M bytes/sec
> total size is 2.11G  speedup is 1.00
>
> bad kernel (e.g. 5.6.0 final):
>
> root@S1020-CephTest:~# uname -a
> Linux S1020-CephTest 5.6.0-050600-generic #202003292333 SMP Sun Mar 29 23:35:58 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
> root@S1020-CephTest:~# rsync -avh --progress tatort.mp4 /mnt/ceph/tatort.mp4
> sending incremental file list
> tatort.mp4
>           21.59M   1%    2.49MB/s    0:13:38
>
> (see the attached netdata screenshot showing the difference in iowait
> for both test cases)
>
>
> As Xiubo suggested, I tested with the very latest RC kernel, too. Same
> problem:
>
> Latest 6.1 RC kernel:
> root@S1020-CephTest:~# uname -a
> Linux S1020-CephTest 6.1.0-060100rc5-generic #202211132230 SMP PREEMPT_DYNAMIC Sun Nov 13 22:36:10 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
> root@S1020-CephTest:~# rsync -avh --progress tatort.mp4 /mnt/ceph/tatort.mp4
> sending incremental file list
> tatort.mp4
>           60.13M   2%    3.22MB/s    0:10:20
>
> (I attached a netdata screenshot for that, too.)
>
> ==================================================
>
>> Can you describe how you performed bisection?  Can you share the 
>> reproducer you used for bisection?
> I took a commit from 5.4 which I had confirmed to be bad in earlier tests,
> then took tag 5.4.25, which I had confirmed to be good, as the first "good":
>
> # git bisect log
> git bisect start
> # bad: [61bbc823a17abb3798568cfb11ff38fc22317442] clk: ti: am43xx: Fix clock parent for RTC clock
> git bisect bad 61bbc823a17abb3798568cfb11ff38fc22317442
> # good: [18fe53f6dfbc5ad4ff2164bff841b56d61b22720] Linux 5.4.25
> git bisect good 18fe53f6dfbc5ad4ff2164bff841b56d61b22720
> # good: [59e4624e664c9e83c04abae9b710cd60cb908a82] ALSA: seq: oss: Fix running status after receiving sysex
> git bisect good 59e4624e664c9e83c04abae9b710cd60cb908a82
> # good: [8dab286ab527dc3fa68e9705b0805f4d6ce10add] fsl/fman: detect FMan erratum A050385
> git bisect good 8dab286ab527dc3fa68e9705b0805f4d6ce10add
> # bad: [160c2ffa701692e60c7034271b4c06b843b7249f] xfrm: add the missing verify_sec_ctx_len check in xfrm_add_acquire
> git bisect bad 160c2ffa701692e60c7034271b4c06b843b7249f
> # bad: [174da11b6474200e2e43509ce2d34e62ecea9f4b] ARM: dts: omap5: Add bus_dma_limit for L3 bus
> git bisect bad 174da11b6474200e2e43509ce2d34e62ecea9f4b
> # good: [65047f7538ba5c0edcf4b4768d942970bb6d4cbc] iwlwifi: mvm: fix non-ACPI function
> git bisect good 65047f7538ba5c0edcf4b4768d942970bb6d4cbc
> # good: [10d5de234df4a4567a8da18de04111f7e931fd70] RDMA/core: Fix missing error check on dev_set_name()
> git bisect good 10d5de234df4a4567a8da18de04111f7e931fd70
> # good: [44960e1c39d807cd0023dc7036ee37f105617ebe] RDMA/mad: Do not crash if the rdma device does not have a umad interface
> git bisect good 44960e1c39d807cd0023dc7036ee37f105617ebe
> # bad: [7cdaa5cd79abe15935393b4504eaf008361aa517] ceph: fix memory leak in ceph_cleanup_snapid_map()
> git bisect bad 7cdaa5cd79abe15935393b4504eaf008361aa517
> # bad: [ed24820d1b0cbe8154c04189a44e363230ed647e] ceph: check POOL_FLAG_FULL/NEARFULL in addition to OSDMAP_FULL/NEARFULL
> git bisect bad ed24820d1b0cbe8154c04189a44e363230ed647e
> # first bad commit: [ed24820d1b0cbe8154c04189a44e363230ed647e] ceph: check POOL_FLAG_FULL/NEARFULL in addition to OSDMAP_FULL/NEARFULL
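>
> (Each bisect step here means building the candidate kernel, booting it
> and rerunning the rsync test; a sketch of one iteration, using the
> build commands shown further below, with ~135 MB/s vs. ~3 MB/s as the
> good/bad criterion:)
>
>     make olddefconfig && make bindeb-pkg -j"$(nproc)"
>     dpkg -i ../linux-image-*.deb && reboot
>     # after reboot:
>     rm -f /mnt/ceph/tatort.mp4
>     rsync -avh --progress tatort.mp4 /mnt/ceph/tatort.mp4
>     git bisect good   # or "git bisect bad", depending on the rate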

Since you are here, could you try to revert this commit and test again?

Let's see whether this commit is causing it. I will take a look later
this week.

Thanks

- Xiubo

>
>> Can you confirm the result by manually checking out the previous 
>> commit and verifying that it's "good"?
> root@S1020-CephTest:~/src/linux# git checkout -b ceph_check_last_good 44960e1c39d807cd0023dc7036ee37f105617ebe
> Checking out files: 100% (68968/68968), done.
> Switched to a new branch 'ceph_check_last_good'
>
> root@S1020-CephTest:~/src/linux# make clean
> ...
> root@S1020-CephTest:~/src/linux# cp -a /boot/config-5.4.25-050425-generic .config
> root@S1020-CephTest:~/src/linux# make olddefconfig
> ...
> root@S1020-CephTest:~/src/linux# make bindeb-pkg -j"$(nproc)"
> ...
> dpkg-deb: building package 'linux-headers-5.4.28+' in '../linux-headers-5.4.28+_5.4.28+-10_amd64.deb'.
> dpkg-deb: building package 'linux-libc-dev' in '../linux-libc-dev_5.4.28+-10_amd64.deb'.
> dpkg-deb: building package 'linux-image-5.4.28+' in '../linux-image-5.4.28+_5.4.28+-10_amd64.deb'.
> dpkg-deb: building package 'linux-image-5.4.28+-dbg' in '../linux-image-5.4.28+-dbg_5.4.28+-10_amd64.deb'.
>   dpkg-genbuildinfo --build=binary
>   dpkg-genchanges --build=binary >../linux-5.4.28+_5.4.28+-10_amd64.changes
> dpkg-genchanges: info: binary-only upload (no source code included)
>   dpkg-source --after-build linux
> dpkg-buildpackage: info: binary-only upload (no source included)
> root@S1020-CephTest:~/src/linux# cd ..
> root@S1020-CephTest:~/src# ls
> linux
> linux-5.4.28+_5.4.28+-10_amd64.buildinfo
> linux-5.4.28+_5.4.28+-10_amd64.changes
> linux-headers-5.4.28+_5.4.28+-10_amd64.deb
> linux-image-5.4.28+_5.4.28+-10_amd64.deb
> linux-image-5.4.28+-dbg_5.4.28+-10_amd64.deb
> linux-libc-dev_5.4.28+-10_amd64.deb
> root@S1020-CephTest:~/src# dpkg -i linux-image-5.4.28+*
> ...
> root@S1020-CephTest:~/src# reboot
> ....
> root@S1020-CephTest:~# uname -a
> Linux S1020-CephTest 5.4.28+ #10 SMP Fri Dec 16 09:20:11 CET 2022 x86_64 x86_64 x86_64 GNU/Linux
> root@S1020-CephTest:~# rsync -avh --progress tatort.mp4 /mnt/ceph/tatort.mp4
> sending incremental file list
> tatort.mp4
>            2.11G 100%  135.15MB/s    0:00:14 (xfr#1, to-chk=0/1)
>
> sent 2.11G bytes  received 35 bytes  135.95M bytes/sec
> total size is 2.11G  speedup is 1.00
>
> As you can see, the problem does not exist here.
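>
> (For completeness, the mirror check, building the first bad commit
> itself and confirming that it is "bad", would use the same procedure;
> a sketch:)
>
>     git checkout -b ceph_check_first_bad ed24820d1b0cbe8154c04189a44e363230ed647e
>     cp -a /boot/config-5.4.25-050425-generic .config
>     make olddefconfig && make bindeb-pkg -j"$(nproc)"
>     # then install the image package, reboot, rerun the rsync test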
>
> Thanks again!
> Marco
> ________________________________________
> From: Xiubo Li <xiubli@xxxxxxxxxx>
> Sent: Friday, December 16, 2022 3:20:46 AM
> To: Ilya Dryomov; Roose, Marco
> Cc: Ceph Development
> Subject: Re: PROBLEM: CephFS write performance drops by 90%
>   
> Hi Roose,
>
> I think this should be similar to
> https://tracker.ceph.com/issues/57898, but that one is from 5.15 instead.
>
> A few days ago, just after Ilya rebased to 6.1 without changing anything
> in the ceph code, the xfstests ran much faster.
>
> I just checked the differences between 5.4 and 5.4.45 and couldn't tell
> what exactly has happened. So please share your test case for this.
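>
> (A quick way to narrow the candidates; the path list is my guess at the
> relevant directories:)
>
>     git log --oneline v5.4.25..v5.4.45 -- fs/ceph net/ceph include/linux/ceph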
>
> - Xiubo
>
> On 15/12/2022 23:32, Ilya Dryomov wrote:
>> On Thu, Dec 15, 2022 at 3:22 PM Roose, Marco <marco.roose@xxxxxxxxxxxxx> wrote:
>>> Dear Ilya,
>>> I'm using Ubuntu and a CephFS mount. I saw a more than 90% drop in write
>>> performance after changing the kernel version (<10 MB/s vs. 100-140 MB/s).
>>> The problem seems to exist in kernel versions starting at v5.4: Ubuntu
>>> mainline version v5.4.25 is fine, v5.4.45 (which is the next available)
>>> is "bad".
>> Hi Marco,
>>
>> What is the workload?
>>
>>> After a git bisect with the "original" 5.4 kernels I get the following
>>> result:
>> Can you describe how you performed bisection?  Can you share the 
>> reproducer you used for bisection?
>>
>>> ed24820d1b0cbe8154c04189a44e363230ed647e is the first bad commit
>>> commit ed24820d1b0cbe8154c04189a44e363230ed647e
>>> Author: Ilya Dryomov <idryomov@xxxxxxxxx>
>>> Date:   Mon Mar 9 12:03:14 2020 +0100
>>>
>>>     ceph: check POOL_FLAG_FULL/NEARFULL in addition to OSDMAP_FULL/NEARFULL
>>>
>>>     commit 7614209736fbc4927584d4387faade4f31444fce upstream.
>>>
>>>     CEPH_OSDMAP_FULL/NEARFULL aren't set since mimic, so we need to consult
>>>     per-pool flags as well.  Unfortunately the backwards compatibility here
>>>     is lacking:
>>>
>>>     - the change that deprecated OSDMAP_FULL/NEARFULL went into mimic, but
>>>       was guarded by require_osd_release >= RELEASE_LUMINOUS
>>>     - it was subsequently backported to luminous in v12.2.2, but that makes
>>>       no difference to clients that only check OSDMAP_FULL/NEARFULL because
>>>       require_osd_release is not client-facing -- it is for OSDs
>>>
>>>     Since all kernels are affected, the best we can do here is just start
>>>     checking both map flags and pool flags and send that to stable.
>>>
>>>     These checks are best effort, so take osdc->lock and look up pool flags
>>>     just once.  Remove the FIXME, since filesystem quotas are checked above
>>>     and RADOS quotas are reflected in POOL_FLAG_FULL: when the pool reaches
>>>     its quota, both POOL_FLAG_FULL and POOL_FLAG_FULL_QUOTA are set.
>> The only suspicious thing I see in this commit is the osdc->lock
>> semaphore, which is taken for read for a short period of time in
>> ceph_write_iter().  It's possible that that started interfering with
>> other code paths that take that semaphore for write, and the read-write
>> lock fairness algorithm is biting...
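>>
>> (If you have a debug kernel handy, this is checkable: with
>> CONFIG_LOCK_STAT=y, contention on that semaphore shows up in
>> /proc/lock_stat.  A sketch; the grep pattern is a guess at the lock
>> class name:)
>>
>>     echo 1 > /proc/sys/kernel/lock_stat
>>     rsync -avh --progress tatort.mp4 /mnt/ceph/tatort.mp4
>>     grep -A 3 'osdc' /proc/lock_stat
>>     echo 0 > /proc/sys/kernel/lock_stat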
>>
>> Can you confirm the result by manually checking out the previous 
>> commit and verifying that it's "good"?
>>
>>        commit 44960e1c39d807cd0023dc7036ee37f105617ebe
>>        RDMA/mad: Do not crash if the rdma device does not have a umad interface
>>            (commit 5bdfa854013ce4193de0d097931fd841382c76a7 upstream)
>>
>> Thanks,
>>
>>                    Ilya
>>
