On Fri, Oct 9, 2015 at 7:26 PM, Dzianis Kahanovich <mahatma@xxxxxxxxxxxxxx> wrote:
> Yan, Zheng wrote:
>
>>>> It seems you have 16 mounts. Are you using the kernel client or the
>>>> fuse client, and which versions are they?
>>>
>>> 1) On each of the 4 ceph nodes:
>>>    4.1.8 kernel mount to /mnt/ceph0 (used only for the samba/ctdb lockfile);
>>>    fuse mount to /mnt/ceph1 (partly used);
>>>    samba cluster (ctdb) with vfs_ceph;
>>> 2) On 2 additional out-of-cluster (service) nodes:
>>>    4.1.8 (now 4.2.3) kernel mount;
>>>    4.1.0 both mounts;
>>> 3) 2 VMs:
>>>    kernel mounts (most active: web & mail);
>>>    4.2.3;
>>>
>>> fuse mounts - same version as ceph;
>>
>> Please run "ceph daemon mds.x session ls" to find which client holds the
>> largest number of caps. mds.x is the ID of the active MDS.
>
> 1) This command is no longer valid ;) - it is "cephfs-table-tool x show
> session" now.

You need to run the command on the machine that runs the active MDS. It
shows the in-memory state of each session; cephfs-table-tool only shows
the on-disk session state.

> 2) I have 3 active MDSes now. I tried it, it works, and I am keeping it.
> Restart is still problematic.

Multiple active MDSes are not ready for production.

> 3) Yes, most caps are on the master VM (4.2.3 kernel mount; it is part of a
> web+mail+heartbeat cluster of 2 VMs), under the apache root. That is where
> the previously described CLONE_FS -> CLONE_VFORK deadlocks occurred (they
> no longer do). But 4.2.3 was installed just before the tests; before that
> it was 4.1.8 with similar effects (the logs are from 4.2.3 on the VM
> clients).

I suspect your problem is that some mount has too many open files. During
MDS failover, the MDS needs to reopen these files, which takes a long time.

Yan, Zheng

> Will wait tonight for the MDS restart.
>
> --
> WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
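
For reference, a minimal sketch of the check suggested above (find the
client holding the largest number of caps). It assumes it is run on the
host with the active MDS, that the admin socket returns JSON, and that
each session entry carries "id" and "num_caps" fields; the MDS id "a" is
a placeholder:

#!/usr/bin/env python
# Sketch: ask the active MDS for its sessions and sort them by cap count.
# Assumptions: run on the MDS host, "mds.a" is a placeholder id, and the
# session entries contain "id" and "num_caps" fields.
import json
import subprocess

MDS_ID = "a"  # placeholder: replace with the active MDS id

out = subprocess.check_output(["ceph", "daemon", "mds." + MDS_ID,
                               "session", "ls"])
sessions = json.loads(out.decode("utf-8"))

# Print sessions with the most caps first.
for s in sorted(sessions, key=lambda s: s.get("num_caps", 0), reverse=True):
    print("client.%s  caps=%s  %s" % (s.get("id"), s.get("num_caps"),
                                      s.get("inst", "")))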
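
And a client-side counterpart: the kernel client exposes per-mount cap
counters under debugfs, so a machine with a kernel mount can be checked
without going through the MDS. This assumes debugfs is mounted at
/sys/kernel/debug and that the kernel in use provides a "caps" file per
client instance (the path layout is an assumption; adjust if yours
differs):

#!/usr/bin/env python
# Sketch: dump the caps counters the CephFS kernel client exposes via
# debugfs. Assumes debugfs is mounted at /sys/kernel/debug and that each
# mount has a /sys/kernel/debug/ceph/<fsid>.client<id>/caps file.
import glob

for path in glob.glob("/sys/kernel/debug/ceph/*/caps"):
    print(path)
    with open(path) as f:
        print(f.read())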