Re: How's cephfs going?

All three mons have the value "simple".

On 21 July 2017, at 15:47, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:

On Thu, Jul 20, 2017 at 6:35 PM, Дмитрий Глушенок <glush@xxxxxxxxxx> wrote:
Hi Ilya,

While trying to reproduce the issue I've found that:
- it is relatively easy to reproduce 5-6 minute hangs just by killing
the active mds process (triggering a failover) while writing a lot of data.
That is an unacceptable timeout, but it is not the case described in
http://tracker.ceph.com/issues/15255
- it is hard to reproduce the endless hang (I spent an hour trying without
success)

One thing I noticed while analysing the logs is that the "endless hang" was
always accompanied by the following messages:

Jul 20 15:31:57 mn-ceph-nfs-gw-01 kernel: libceph: mon0 10.50.67.25:6789 session lost, hunting for new mon
Jul 20 15:31:57 mn-ceph-nfs-gw-01 kernel: libceph: mon1 10.50.67.26:6789 session established
Jul 20 15:32:27 mn-ceph-nfs-gw-01 kernel: libceph: mon1 10.50.67.26:6789 session lost, hunting for new mon
Jul 20 15:32:27 mn-ceph-nfs-gw-01 kernel: libceph: mon2 10.50.67.27:6789 session established
Jul 20 15:32:57 mn-ceph-nfs-gw-01 kernel: libceph: mon2 10.50.67.27:6789 session lost, hunting for new mon
Jul 20 15:32:57 mn-ceph-nfs-gw-01 kernel: libceph: mon0 10.50.67.25:6789 session established
Jul 20 15:33:28 mn-ceph-nfs-gw-01 kernel: libceph: mon0 10.50.67.25:6789 session lost, hunting for new mon
Jul 20 15:33:28 mn-ceph-nfs-gw-01 kernel: libceph: mon2 10.50.67.27:6789 session established
Jul 20 15:33:58 mn-ceph-nfs-gw-01 kernel: libceph: mon2 10.50.67.27:6789 session lost, hunting for new mon
Jul 20 15:34:29 mn-ceph-nfs-gw-01 kernel: libceph: mon2 10.50.67.27:6789 session established
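
For anyone who wants to check their own logs for the same pattern, here is a minimal Python sketch that pulls the libceph session events out of a syslog excerpt and reports the gaps between consecutive "session lost" messages. It assumes each log entry sits on a single line (the mail client wrapped them above) and that the syslog timestamps carry no year, so the `year` argument is a guess supplied by the caller. The function names `parse_events` and `loss_intervals` are made up for illustration and are not part of any Ceph tooling.

```python
import re
from datetime import datetime

# Matches kernel client lines of the form quoted above, e.g.
# "Jul 20 15:31:57 host kernel: libceph: mon0 10.50.67.25:6789 session lost, hunting for new mon"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+libceph:\s+"
    r"(?P<mon>mon\d+)\s+(?P<addr>\S+)\s+session\s+(?P<event>lost|established)"
)


def parse_events(lines, year=2017):
    """Return (timestamp, mon, event) tuples for libceph session lines."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a mon session message
        # Syslog omits the year, so it is supplied by the caller (assumption).
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        events.append((ts, m["mon"], m["event"]))
    return events


def loss_intervals(events):
    """Seconds between consecutive 'session lost' events."""
    lost = [ts for ts, _, event in events if event == "lost"]
    return [(b - a).total_seconds() for a, b in zip(lost, lost[1:])]
```

Feeding it the lines above gives gaps of roughly 30 seconds between session losses, which is the regular flapping visible in the excerpt. The sketch only measures the pattern; it says nothing about the cause.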


Bug http://tracker.ceph.com/issues/17664 describes this behaviour, and it was
fixed in releases starting with v11.1.0 (I'm using 10.2.7). So the lost
session somehow triggers client disconnection and fencing (as described at
http://docs.ceph.com/docs/master/cephfs/troubleshooting/#disconnected-remounted-fs).

Do you still think it should be posted to
http://tracker.ceph.com/issues/15255 ?

Are you using the async messenger?  You can check with

$ ceph daemon mon.X config get ms_type

for all MONs.
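
If there are many monitors, the check above could be scripted. The following is only a sketch: it assumes `config get` prints a small JSON object keyed by the option name (for example `{"ms_type": "simple"}`), and that it runs on the host holding each monitor's admin socket. The helper names and the monitor IDs passed to them are hypothetical.

```python
import json
import subprocess


def parse_ms_type(output):
    """Extract ms_type from the (assumed) JSON printed by `config get ms_type`."""
    return json.loads(output)["ms_type"]


def query_ms_type(mon_id):
    """Run the admin-socket query for one monitor and return its ms_type.

    Must be run on the host where mon.<mon_id> is running.
    """
    out = subprocess.run(
        ["ceph", "daemon", f"mon.{mon_id}", "config", "get", "ms_type"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_ms_type(out)


def mons_using_async(ms_types):
    """Given {mon_id: ms_type}, list the monitors whose type starts with 'async'."""
    return sorted(m for m, t in ms_types.items() if t.startswith("async"))
```

For example, `mons_using_async({"a": "simple", "b": "async"})` returns `["b"]`; an empty list means every monitor reported a non-async messenger.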

Thanks,

               Ilya

--
Дмитрий Глушенок
Инфосистемы Джет
+7-910-453-2568

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
