ceph-mgr hangs on larger clusters in Luminous


After upgrading from Jewel (10.2.10) to Luminous (12.2.5), we started seeing the new ceph-mgr sometimes hang indefinitely on commands like 'ceph pg dump' on our largest cluster (~1,300 OSDs).  Our other clusters (10+) aren't showing the issue, but each of them has fewer than 600 OSDs.  Restarting ceph-mgr fixes it for 12 hours or so, but the hang usually comes back overnight.  At first I suspected a hardware issue, but moving the active ceph-mgr to another node didn't fix the problem.
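
For anyone who wants to catch the moment the hang starts rather than finding it the next morning, here is a minimal sketch of a probe that runs a command under a timeout. The function name `probe` and the 60-second limit in the usage comment are my own choices for illustration, not anything ceph provides, and it assumes the hang shows up as the CLI never returning.

```python
import subprocess
import time

def probe(cmd, timeout_s):
    """Run cmd and return (finished, elapsed_seconds).

    finished is False if the command was still running after timeout_s
    seconds (the child is killed in that case). A True value only means
    the command returned in time, not that it exited with status 0.
    """
    start = time.monotonic()
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
            check=False,
        )
        finished = True
    except subprocess.TimeoutExpired:
        finished = False
    return finished, time.monotonic() - start

# Example, to be run on a node with the ceph CLI and an admin keyring:
# finished, elapsed = probe(["ceph", "pg", "dump", "--format", "json"], timeout_s=60)
```

Run from cron every few minutes and logged with a timestamp, this would at least narrow down when the mgr stops answering.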

 

I've increased the logging to 20/20 for debug_mgr. With that in place, a working dump looks like this in the log:

 

2018-10-18 09:26:16.256911 7f9dbf5e7700  4 mgr.server handle_command decoded 3
2018-10-18 09:26:16.256917 7f9dbf5e7700  4 mgr.server handle_command prefix=pg dump
2018-10-18 09:26:16.256937 7f9dbf5e7700 10 mgr.server _allowed_command  client.admin capable
2018-10-18 09:26:16.256951 7f9dbf5e7700  0 log_channel(audit) log [DBG] : from='client.1414554763 10.2.4.2:0/2175076978' entity='client.admin' cmd=[{"prefix": "pg dump", "target": ["mgr", ""], "format": "json-pretty"}]: dispatch
2018-10-18 09:26:22.567583 7f9dbf5e7700  1 mgr.server reply handle_command (0) Success dumped all

 

A hung dump call, on the other hand, doesn't show up at all.  Even the "mgr.server handle_command prefix=pg dump" entry never makes it into the log.
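
To check a whole log for this, a rough sketch like the following pairs each "handle_command prefix=" entry with the next "reply handle_command" on the same thread and reports how long the command took. The regex is an assumption based only on the five lines quoted above (timestamp, thread id, level, message), and the helper names are made up for illustration. If a hung call really never logs its prefix line, it won't appear in the output at all, which would match what I'm seeing.

```python
import re
from datetime import datetime

# Layout assumed from the excerpt: timestamp, thread id, debug level, message.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6}) (\S+)\s+(\d+) (.*)$"
)

def parse_line(line):
    """Return (timestamp, thread, level, message) or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
    return ts, m.group(2), int(m.group(3)), m.group(4)

def command_durations(lines):
    """Return [(prefix, seconds)] for every command that has both a start and a reply."""
    pending = {}   # thread id -> (start time, command prefix)
    results = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, thread, _level, msg = parsed
        if msg.startswith("mgr.server handle_command prefix="):
            pending[thread] = (ts, msg.split("prefix=", 1)[1])
        elif msg.startswith("mgr.server reply handle_command") and thread in pending:
            start, prefix = pending.pop(thread)
            results.append((prefix, (ts - start).total_seconds()))
    return results

# Three of the lines from the working dump above.
SAMPLE = [
    "2018-10-18 09:26:16.256911 7f9dbf5e7700  4 mgr.server handle_command decoded 3",
    "2018-10-18 09:26:16.256917 7f9dbf5e7700  4 mgr.server handle_command prefix=pg dump",
    "2018-10-18 09:26:22.567583 7f9dbf5e7700  1 mgr.server reply handle_command (0) Success dumped all",
]

for prefix, seconds in command_durations(SAMPLE):
    print(f"{prefix}: {seconds:.3f}s")
```

On the working dump above this gives about 6.3 seconds between the prefix line and the reply.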

 

The problem persisted after we upgraded to 12.2.8.

 

Has anyone else seen this?

 

Thanks,

Bryan

