On Tue, May 3, 2022 at 1:34 PM Vladimir Brik
<vladimir.brik@xxxxxxxxxxxxxxxx> wrote:
>
> Is this mailing list an appropriate place to report apparent
> cephfs bugs? I haven't gotten traction over on ceph-users.
>
> A couple of weeks ago we attempted to switch from Lustre to
> Cephfs for our compute cluster shared file system but had to
> roll back because users began reporting problems:
>
> 1) Some writes failing silently, resulting in 0-size files.

The only thing like this I've seen is when the client doesn't actually
have cephx caps allowing it to write to the configured data
pool/namespace, but does have permission on the MDS to work in that
directory. This should generally get detected and warned about on
modern clients, but you're on some pretty old kernel clients...

> 2) Some writes hanging indefinitely. In my experiments the
> first 4MB (4194304B) would be written out fine, but then the
> process would get stuck.

I don't think this has been seen before. That 4 MiB is an object
boundary with the default file layout, so something odd is probably
happening there, but it's not familiar. Xiubo has been working with you
and is the dev you want, so keep trying to debug with him, I guess. :/

>
> I've been generally unable to trigger these bugs, except (2)
> which seems to affect only some systems, but can be
> reproduced every time on an impacted system, at least for a
> while.
>
> We have close to 200 kernel cephfs clients (trying fuse
> mounts resulted in hangs). They mostly run kernels between
> 3.10.0-957.27.2.el7 and 3.10.0-1160.62.1.el7. A few machines
> have 4.18.0-348.20.1.el8_5.
>
> The cluster is running 16.2.7 and consists of 20 OSD servers
> with 24-26 disks. Cephfs metadata pool is stored across 12
> OSDs backed by NVMe flash on 3 servers. Single MDS daemon.
>
> Do the problems we've experienced sound like any known bugs?
>
> The MDS was complaining about slow IO when users were
> experiencing issues. Could this explain empty files?

It shouldn't -- slow IO just means cap changes will continue to be
slow. Unless you've done something like turning on the transparent
reconnect after clients get blocklisted, and things are going so slowly
that that's happening and the clients are throwing away buffered IO
without returning errors to the callers.
-Greg

>
>
> Vlad
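
A few concrete checks on the points above, in case they help. First, on
the cephx-caps theory for the 0-size files: comparing the client key's
caps against the data pool the directory actually uses should rule it
in or out quickly. This is only a sketch -- "client.compute", the
filesystem name "cephfs", and the pool name are placeholders for
whatever your cluster uses, and the exact cap wording varies a bit by
release.

  # Show the caps for the key the compute nodes mount with
  # ("client.compute" is a placeholder).
  ceph auth get client.compute

  # A key generated by "ceph fs authorize" normally carries all three:
  #   caps mds = "allow rw fsname=cephfs"   (older releases: "allow rw path=/...")
  #   caps mon = "allow r fsname=cephfs"
  #   caps osd = "allow rw tag cephfs data=cephfs"
  # If the osd cap is missing, or is scoped to a different pool/namespace
  # than the directory's layout, metadata operations on the MDS can still
  # succeed while the actual object writes are refused -- which would fit
  # the silent 0-size files.

  # For comparison, how a correctly-scoped key is normally created:
  ceph fs authorize cephfs client.compute / rw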
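
On the hang at exactly 4194304 bytes: that is the default object_size,
so the stall starts the moment the kernel has to touch the second RADOS
object. It may be worth confirming which layout and data pool the
affected files actually map to; the mount path below is a placeholder.

  # On an affected client: print the file/directory layout. The defaults
  # are stripe_unit=4194304 stripe_count=1 object_size=4194304, i.e.
  # byte 4194304 is where the second object starts.
  getfattr -n ceph.file.layout /mnt/cephfs/path/to/stuck-file
  getfattr -n ceph.dir.layout /mnt/cephfs/path/to/parent-dir   # errors if no custom layout is set

  # On the cluster: make sure the pool named in that layout exists, is
  # not full, and is healthy (full or degraded pools can stall writes).
  ceph df
  ceph osd pool ls detail
  ceph health detail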
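
And on the blocklist/transparent-reconnect possibility, a couple of
quick checks (the MDS name and mount point are again placeholders):

  # Any entries here mean clients have been evicted and blocklisted:
  ceph osd blocklist ls

  # MDS view of client sessions and their state:
  ceph tell mds.<mds-name> session ls

  # On a client: see whether the cephfs mount uses recover_session=clean,
  # the "reconnect after blocklist" behaviour that can discard buffered
  # writes without returning an error to the caller. It only exists on
  # newer kernels, so the 3.10 el7 clients won't have it, but the el8
  # ones are worth checking:
  grep ceph /proc/mounts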