CephFS: client hangs

Dear Community,

 

We are running a Ceph Luminous cluster with CephFS (BlueStore OSDs). During setup, we made the mistake of configuring the OSDs on RAID volumes. Initially our cluster consisted of 3 nodes, each housing 1 OSD, and we are currently in the process of remediating this. After a loss of metadata (http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-March/025612.html) caused by resetting the journal (journal entries were not being flushed fast enough), we managed to bring the cluster back up and started adding 2 additional nodes (http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-June/027563.html).

 

After adding the two new nodes, we increased the number of placement groups, not only to accommodate the new nodes but also to prepare for the reinstallation of the misconfigured ones. Since then, the number of placement groups per OSD has of course been too high. Despite this, cluster health remained fine over the last few months.

 

However, we are currently observing massive problems: whenever we try to access any folder via CephFS, e.g. by listing its contents, there is no response. Clients are getting blacklisted, but no warning is raised: 'ceph -s' reports everything is OK, except for the number of PGs being too high. Grepping for "assert" or "error" in any of the logs turns up nothing. It is also not possible to reduce the number of active MDS daemons to 1: after issuing 'ceph fs set fs_data max_mds 1', nothing happens.
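For completeness, these are roughly the checks I have been running (a sketch only; the fs name "fs_data" is ours, the MDS id will differ per host, and the script is guarded so it is a no-op on machines without a Ceph cluster):

```shell
# Hedged sketch of the diagnostic commands used; guarded so the
# script does nothing on hosts where the ceph CLI is unavailable.
if command -v ceph >/dev/null 2>&1; then
  ceph osd blacklist ls                # which clients were blacklisted, and until when
  ceph fs status fs_data               # per-rank state of the active MDS daemons
  ceph fs get fs_data | grep max_mds   # confirm the max_mds setting actually changed
  # Note: on Luminous, lowering max_mds does not stop rank 1 by itself;
  # the extra rank has to be deactivated explicitly, e.g.:
  #   ceph mds deactivate 1
  ceph daemon mds."$(hostname -s)" dump_blocked_ops   # ops stuck on the local MDS
fi
echo checks-done   # marker so the sketch runs cleanly anywhere
```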

 

Cluster details are available here: https://gitlab.uni-trier.de/snippets/77

 

The MDS log (https://gitlab.uni-trier.de/snippets/79?expanded=true&viewer=simple) contains none of the usual "nicely exporting to" messages, but instead entries like this:

2019-02-15 08:44:52.464926 7fdb13474700  7 mds.0.server try_open_auth_dirfrag: not auth for [dir 0x100011ce7c6 /home/r-admin/ [2,head] rep@1.1 dir_auth=1 state=0 f(v4 m2019-02-14 13:19:41.300993 80=48+32) n(v11339 rc2019-02-14 13:19:41.300993 b10116465260 10869=10202+667) hs=7+0,ss=0+0 | dnwaiter=0 child=1 frozen=0 subtree=1 replicated=0 dirty=0 waiter=0 authpin=0 tempexporting=0 0x564343eed100], fw to mds.1
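In case it helps anyone reproduce the check: counting how many requests the MDS forwards instead of serving itself is a one-liner. This is a minimal, self-contained sketch; the sample line is truncated from the excerpt above, and on a real node you would grep the actual MDS log (e.g. under /var/log/ceph/) instead:

```shell
# Write a truncated sample of the log line above, then count how many
# requests were forwarded to another MDS rank ("fw to mds").
cat > mds-sample.log <<'EOF'
2019-02-15 08:44:52.464926 7fdb13474700  7 mds.0.server try_open_auth_dirfrag: not auth for [dir 0x100011ce7c6 /home/r-admin/ ...], fw to mds.1
EOF
grep -c 'fw to mds' mds-sample.log   # prints 1
```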

 

The update from 12.2.8 to 12.2.11 that I ran last week didn't help.

 

Anybody got an idea or a hint on where to look next? Any help would be greatly appreciated!

 

Kind regards

Christian Hennen

 

Project Manager Infrastructural Services
ZIMK University of Trier

Germany


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
