On 12/10/2014 17:48, Gregory Farnum wrote:
> On Sun, Oct 12, 2014 at 7:46 AM, Loic Dachary <loic@xxxxxxxxxxx> wrote:
>> Hi,
>>
>> On a 0.80.6 cluster the command
>>
>> ceph tell osd.6 version
>>
>> hangs forever. I checked that it establishes a TCP connection to the OSD, raised the OSD debug level to 20 and I do not see
>>
>> https://github.com/ceph/ceph/blob/firefly/src/osd/OSD.cc#L4991
>>
>> in the logs. All other OSDs answer to the same "version" command as they should. And ceph daemon osd.6 version on the machine running OSD 6 responds as it should. There also are an ever growing number of slow requests on this OSD. But not error in the logs. In other words, except for taking forever to answer any kind of request the OSD looks fine.
>>
>> Another OSD running on the same machine is behaving well.
>>
>> Any idea what that behaviour relates to ?
>
> What commands have you run? The admin socket commands don't require
> nearly as many locks, nor do they go through the same event loops that
> messages do. You might have found a deadlock or something. (In which
> case just restarting the OSD would probably fix it, but you should
> grab a core dump first.)

# /etc/init.d/ceph stop osd.6
=== osd.6 ===
Stopping Ceph osd.6 on g3...kill 23690...kill 23690...done
root@g3:/var/lib/ceph/osd/ceph-6/current# /etc/init.d/ceph start osd.6
=== osd.6 ===
Starting Ceph osd.6 on g3...
starting osd.6 at :/0 osd_data /var/lib/ceph/osd/ceph-6 /var/lib/ceph/osd/ceph-6/journal
root@g3:/var/lib/ceph/osd/ceph-6/current# ceph tell osd.6 version
{ "version": "ceph version 0.80.6 (f93610a4421cb670b08e974c6550ee715ac528ae)"}
root@g3:/var/lib/ceph/osd/ceph-6/current# ceph tell osd.6 version

and now it blocks. It looks like a deadlock happens shortly after it boots.

--
Loïc Dachary, Artisan Logiciel Libre
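
Since "ceph tell" blocks indefinitely while "ceph daemon" answers, it may help to run both with a timeout so the check itself cannot hang. Below is a minimal sketch, not a tested tool: the helper names probe and compare_paths and the 10 second default are assumptions for illustration, and only the two commands already shown in this thread are invoked.

```python
import subprocess


def probe(cmd, timeout_s=10.0):
    """Run cmd and classify the result as 'ok', 'timeout' or 'error'.

    Returns a (status, detail) tuple. On timeout, subprocess.run kills
    the child process before raising TimeoutExpired.
    """
    try:
        done = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return ("timeout", "")
    except FileNotFoundError as exc:
        # The binary (e.g. ceph) is not installed on this host.
        return ("error", str(exc))
    if done.returncode != 0:
        return ("error", done.stderr.strip())
    return ("ok", done.stdout.strip())


def compare_paths(osd_id, timeout_s=10.0, run=probe):
    """Ask one OSD for its version over both paths discussed above.

    'tell' goes through the OSD's messenger; 'daemon' uses the local
    admin socket, so it must be run on the host where the OSD lives.
    The run parameter is a hypothetical hook for substituting a stub.
    """
    target = f"osd.{osd_id}"
    return {
        "tell": run(["ceph", "tell", target, "version"], timeout_s),
        "daemon": run(["ceph", "daemon", target, "version"], timeout_s),
    }
```

Calling compare_paths(6) on g3 would be expected to report a timeout for "tell" and a normal answer for "daemon" if the OSD is in the state described above. That only confirms the symptom; it does not replace grabbing a core dump before the restart.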
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com