On 08/06/2021 21:04, Ilya Dryomov wrote:
On Tue, Jun 8, 2021 at 7:11 PM Wido den Hollander <wido@xxxxxxxx> wrote:
Hi,
I've been doing some tests with v16.2.4, using a 2TB Samsung PM983 SSD
mounted under /mnt/rbd-cache and the following settings:
rbd_persistent_cache_mode = ssd
rbd_persistent_cache_size = 2G
rbd_persistent_cache_path = /mnt/rbd-cache
rbd_plugins = pwl_cache
I tried both XFS and EXT4 as the filesystem.
This, however, causes fio or 'rbd bench' to crash:
root@infra-138-b16-27:~# fio fio/rbd_rw_1.fio
rbd_w_iodepth_1: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=rbd, iodepth=1
fio-3.1
Starting 1 process
Segmentation fault1)][13.3%][r=0KiB/s,w=14.7MiB/s][r=0,w=3768 IOPS][eta 00m:52s]
root@infra-138-b16-27:~#
(The IOps seem great!)
My fio test is fairly simple:
[global]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=fio1
invalidate=0
bs=4k
runtime=60
direct=1
[rbd_w_iodepth_1]
rw=randwrite
iodepth=1
I have tried to trace it with gdb, but I didn't get any further than
this backtrace:
(gdb) bt
#0 ContextWQ::process (ctx=0x7fffb8081480, this=0x7fffb8012470) at ./src/common/WorkQueue.h:556
#1 ThreadPool::PointerWQ<Context>::_void_process (this=0x7fffb8012470, item=0x7fffb8081480, handle=...) at ./src/common/WorkQueue.h:341
#2 0x00007fffec600912 in ThreadPool::worker (this=0x7fffb8012018, wt=<optimized out>) at ./src/common/WorkQueue.cc:117
#3 0x00007fffec601801 in ThreadPool::WorkThread::entry (this=<optimized out>) at ./src/common/WorkQueue.h:395
#4 0x00007ffff5c796db in start_thread (arg=0x7fffb17fa700) at pthread_create.c:463
#5 0x00007ffff579e71f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Has anybody been able to use pwl_cache successfully?
Hi Wido,
Unfortunately, the "rbd_persistent_cache_mode = ssd" cache has shipped
rather broken. This particular crash is most likely already fixed
in master, but there are a few more outstanding. There are a dozen
"[pwl ssd] ..." tickets in the rbd project; the fixes would be
backported to pacific once the ssd mode is stable enough.
Until then, I would stick to "rbd_persistent_cache_mode = rwl"
or avoid the pwl_cache plugin entirely.
I tried rwl and am now using XFS with DAX enabled, and it works.
Performance-wise, I see a 2x improvement in IOPS with qd=1 bs=4k.
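For anyone following along, the rwl setup described here would presumably look something like the sketch below. It reuses the option names from the original ssd test and only switches the mode. The path and size are assumptions carried over from that test (the reply does not restate them), and the cache path is assumed to be the DAX-enabled XFS mount.

```ini
# Sketch only: same options as the ssd test, with the mode switched to rwl.
# Path and size are assumed unchanged from the earlier test; adjust as needed.
rbd_plugins = pwl_cache
rbd_persistent_cache_mode = rwl
# Assumed to point at the XFS filesystem mounted with DAX enabled.
rbd_persistent_cache_path = /mnt/rbd-cache
rbd_persistent_cache_size = 2G
```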
My kernel reports that the PM983 4TB NVMe I am using as a backing
device is at 100% utilization, but that seems off.
Can we expect any fixes for this cache in 16.2.5 or 16.2.6?
Wido
Thanks,
Ilya
_______________________________________________
Dev mailing list -- dev@xxxxxxx