Hi Roose,
I think this should be similar to
https://tracker.ceph.com/issues/57898, but it's from 5.15 instead.
A few days ago, just after Ilya rebased to 6.1 without changing anything in
the ceph code, the xfstests runs were much faster.
I just checked the differences between 5.4 and 5.4.45 and couldn't tell
what exactly happened. So please share your test case for this.
- Xiubo
On 15/12/2022 23:32, Ilya Dryomov wrote:
On Thu, Dec 15, 2022 at 3:22 PM Roose, Marco <marco.roose@xxxxxxxxxxxxx> wrote:
Dear Ilya,
I'm using Ubuntu and a CephFS mount. I had a more than 90% write performance decrease after changing the kernel's main version (<10 MB/s vs. 100-140 MB/s). The problem seems to exist in kernel major versions starting at v5.4. Ubuntu mainline version v5.4.25 is fine, v5.4.45 (which is the next available) is "bad".
Hi Marco,
What is the workload?
After a git bisect with the "original" 5.4 kernels I get the following result:
Can you describe how you performed bisection? Can you share the
reproducer you used for bisection?
ed24820d1b0cbe8154c04189a44e363230ed647e is the first bad commit
commit ed24820d1b0cbe8154c04189a44e363230ed647e
Author: Ilya Dryomov <idryomov@xxxxxxxxx>
Date: Mon Mar 9 12:03:14 2020 +0100
ceph: check POOL_FLAG_FULL/NEARFULL in addition to OSDMAP_FULL/NEARFULL
commit 7614209736fbc4927584d4387faade4f31444fce upstream.
CEPH_OSDMAP_FULL/NEARFULL aren't set since mimic, so we need to consult
per-pool flags as well. Unfortunately the backwards compatibility here
is lacking:
- the change that deprecated OSDMAP_FULL/NEARFULL went into mimic, but
was guarded by require_osd_release >= RELEASE_LUMINOUS
- it was subsequently backported to luminous in v12.2.2, but that makes
no difference to clients that only check OSDMAP_FULL/NEARFULL because
require_osd_release is not client-facing -- it is for OSDs
Since all kernels are affected, the best we can do here is just start
checking both map flags and pool flags and send that to stable.
These checks are best effort, so take osdc->lock and look up pool flags
just once. Remove the FIXME, since filesystem quotas are checked above
and RADOS quotas are reflected in POOL_FLAG_FULL: when the pool reaches
its quota, both POOL_FLAG_FULL and POOL_FLAG_FULL_QUOTA are set.
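The combined check that the commit message describes can be sketched roughly as below. This is only a Python illustration of the logic, not the kernel code: the function name full_state() and the bit values are placeholders chosen for the example, and the real client reads both flag words once under osdc->lock.

```python
# Placeholder bit values for illustration only; not copied from kernel headers.
OSDMAP_NEARFULL = 1 << 0
OSDMAP_FULL = 1 << 1
POOL_FLAG_FULL = 1 << 1
POOL_FLAG_NEARFULL = 1 << 11


def full_state(map_flags, pool_flags):
    """Return (full, nearfull), consulting both the map-wide and per-pool flags.

    Hypothetical helper: a cluster that no longer sets the map-wide flags
    is still detected as full or nearfull through the pool flags.
    """
    full = bool(map_flags & OSDMAP_FULL) or bool(pool_flags & POOL_FLAG_FULL)
    nearfull = (bool(map_flags & OSDMAP_NEARFULL)
                or bool(pool_flags & POOL_FLAG_NEARFULL))
    return full, nearfull
```

With map_flags == 0, as on clusters that no longer set the map-wide flags, the result depends only on the pool flags. A client checking only the map flags would miss that case.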
The only suspicious thing I see in this commit is the osdc->lock semaphore,
which is taken for read for a short period of time in ceph_write_iter().
It's possible that this started interfering with other code paths that
take that semaphore for write and the read-write lock fairness algorithm is
biting...
Can you confirm the result by manually checking out the previous commit
and verifying that it's "good"?
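For comparing a "good" and a "bad" kernel, something like the following could serve as a minimal sequential-write check. It is a sketch only, not the workload from the original report: the function name, the sizes, and the example path are assumptions.

```python
import os
import time


def write_throughput_mb_s(path, total_mb=64, block_kb=1024):
    """Write total_mb MiB sequentially to path and return the rate in MiB/s.

    Hypothetical helper: the fsync() is included in the timing so that
    data still sitting in the page cache does not inflate the result.
    """
    block = b"\0" * (block_kb * 1024)
    blocks = total_mb * 1024 // block_kb
    start = time.monotonic()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())
    elapsed = max(time.monotonic() - start, 1e-9)  # guard against a zero interval
    return total_mb / elapsed


# Example call; the mount point below is hypothetical:
#   print(write_throughput_mb_s("/mnt/cephfs/bisect-test.bin"))
```

Running the same call on the same mount under each kernel gives two numbers that can be compared directly.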
commit 44960e1c39d807cd0023dc7036ee37f105617ebe
RDMA/mad: Do not crash if the rdma device does not have a umad interface
(commit 5bdfa854013ce4193de0d097931fd841382c76a7 upstream)
Thanks,
Ilya