Is this mailing list an appropriate place to report apparent
cephfs bugs? I haven't gotten traction over on ceph-users.
A couple of weeks ago we attempted to switch our compute
cluster's shared file system from Lustre to Cephfs, but had to
roll back because users began reporting problems:
1) Some writes failing silently, resulting in 0-size files.
2) Some writes hanging indefinitely. In my experiments, the
first 4 MB (4194304 bytes) would be written out fine, but then
the process would get stuck.
I've generally been unable to trigger these bugs myself, except
for (2), which seems to affect only some systems but can be
reproduced every time on an affected system, at least for a
while.
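For reference, the kind of test I've been running for (2) boils
down to a plain write loop like the sketch below. The mount
point, file name, chunk size, and total size are placeholders,
not our actual paths. On an affected client the write() loop
stalls after roughly 4 MB without returning an error; on other
clients everything here succeeds.

/* cephfs-write-test.c -- illustrative only; path and sizes are placeholders */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/mnt/cephfs/write-test";   /* placeholder mount/file */
    const size_t chunk = 1 << 20;                  /* 1 MiB per write() */
    char *buf = malloc(chunk);
    if (!buf) { perror("malloc"); return 1; }
    memset(buf, 'x', chunk);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    for (int i = 0; i < 16; i++) {                 /* 16 MiB total */
        ssize_t n = write(fd, buf, chunk);
        if (n < 0) { perror("write"); return 1; }
        fprintf(stderr, "chunk %d: wrote %zd bytes\n", i, n);
        /* on an affected client this loop never gets past ~4 MB */
    }
    if (fsync(fd) < 0) { perror("fsync"); return 1; }
    if (close(fd) < 0) { perror("close"); return 1; }

    struct stat st;
    if (stat(path, &st) != 0) { perror("stat"); return 1; }
    /* if (1) really is silent, nothing above reports an error
     * even when the file ends up empty */
    printf("final size reported by stat(): %lld bytes\n",
           (long long)st.st_size);
    free(buf);
    return 0;
}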
We have close to 200 kernel cephfs clients (trying fuse
mounts resulted in hangs). They mostly run kernels between
3.10.0-957.27.2.el7 and 3.10.0-1160.62.1.el7. A few machines
have 4.18.0-348.20.1.el8_5.
The cluster is running 16.2.7 and consists of 20 OSD servers
with 24-26 disks each. The cephfs metadata pool is stored
across 12 OSDs backed by NVMe flash on 3 servers, and there is
a single MDS daemon.
Do the problems we've experienced sound like any known bugs?
The MDS was complaining about slow IO while users were
experiencing issues; could that explain the empty files?
Vlad