Re: No monitor sockets after upgrading to Emperor

I just restarted an OSD node and none of the admin sockets showed up after the reboot (though the node rejoined the cluster fine and all OSDs are happy). The node is an Ubuntu 12.04.3 system originally deployed via ceph-deploy on Dumpling.

The only things that stand out to me are the lock_fsid failure and the "error converting store" message.
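For context on the lock_fsid line: errno 11 (EAGAIN, "Resource temporarily unavailable") is one of the two errors POSIX allows a non-blocking record lock to return when another process already holds the lock, and it is the one Linux normally reports. The stand-alone Python sketch below assumes FileStore takes a lock of that kind on the fsid file. The log wording hints at this, but the sketch is not Ceph's code. The helper names are hypothetical, and a forked child plays the part of a second ceph-osd pointed at the same data directory.

```python
import errno
import fcntl
import os
import tempfile


def try_lock(path):
    """Try a non-blocking exclusive lock on path; return 0 or the errno."""
    fd = os.open(path, os.O_RDWR)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        os.close(fd)  # closing the fd also drops any lock this process took


def lock_from_second_process(path):
    """Fork a child (the 'second daemon') and report what its lock attempt returned."""
    pid = os.fork()
    if pid == 0:
        # Record locks are not inherited across fork(), so the child competes
        # with the parent the way an unrelated process would.
        code = 255  # fallback exit status if something unexpected happens
        try:
            code = try_lock(path)
        finally:
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


# Hypothetical stand-in for /var/lib/ceph/osd/ceph-19/fsid.
with tempfile.NamedTemporaryFile() as fsid:
    # The "first daemon" (this process) takes the lock and keeps it.
    holder = os.open(fsid.name, os.O_RDWR)
    fcntl.lockf(holder, fcntl.LOCK_EX | fcntl.LOCK_NB)
    second = lock_from_second_process(fsid.name)
    os.close(holder)

print(second, errno.errorcode.get(second))
```

If the second process gets EAGAIN (or EACCES on some systems) while the first still holds the lock, that matches the "is another ceph-osd still running?" wording in the log.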

Here is a snippet from OSD 19's log covering a full reboot, starting with the "shutdown complete" entry and running through the reconnect messages:

2013-11-12 09:44:00.757576 7fb8a8e24780  1 -- 192.168.200.54:6819/23261 shutdown complete.
2013-11-12 09:47:05.843425 7f7918e9d780  0 ceph version 0.72 (5832e2603c7db5d40b433d0953408993a9b7c217), process ceph-osd, pid 1734
2013-11-12 09:47:05.892704 7f7918e9d780  1 filestore(/var/lib/ceph/osd/ceph-19) mount detected xfs
2013-11-12 09:47:05.892718 7f7918e9d780  1 filestore(/var/lib/ceph/osd/ceph-19)  disabling 'filestore replica fadvise' due to known issues with fadvise(DONTNEED) on xfs
2013-11-12 09:47:05.944312 7f7918e9d780  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: FIEMAP ioctl is supported and appears to work
2013-11-12 09:47:05.944327 7f7918e9d780  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-11-12 09:47:05.944743 7f7918e9d780  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2013-11-12 09:47:06.258005 7f7918e9d780  0 filestore(/var/lib/ceph/osd/ceph-19) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2013-11-12 09:47:07.567405 7f7918e9d780  1 journal _open /var/lib/ceph/osd/ceph-19/journal fd 19: 10239344640 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-11-12 09:47:07.570098 7f7918e9d780  1 journal _open /var/lib/ceph/osd/ceph-19/journal fd 19: 10239344640 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-11-12 09:47:07.570352 7f7918e9d780  1 journal close /var/lib/ceph/osd/ceph-19/journal
2013-11-12 09:47:07.571215 7f7918e9d780  1 filestore(/var/lib/ceph/osd/ceph-19) mount detected xfs
2013-11-12 09:47:07.572742 7f7918e9d780  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: FIEMAP ioctl is supported and appears to work
2013-11-12 09:47:07.572750 7f7918e9d780  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-11-12 09:47:07.573234 7f7918e9d780  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2013-11-12 09:47:07.574879 7f7918e9d780  0 filestore(/var/lib/ceph/osd/ceph-19) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2013-11-12 09:47:07.577043 7f7918e9d780  1 journal _open /var/lib/ceph/osd/ceph-19/journal fd 23: 10239344640 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-11-12 09:47:07.578649 7f7918e9d780  1 journal _open /var/lib/ceph/osd/ceph-19/journal fd 23: 10239344640 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-11-12 09:47:07.680531 7f7918e9d780  0 <cls> cls/hello/cls_hello.cc:271: loading cls_hello
2013-11-12 09:47:09.670813 7f8151b5f780  0 ceph version 0.72 (5832e2603c7db5d40b433d0953408993a9b7c217), process ceph-osd, pid 2769
2013-11-12 09:47:09.673789 7f8151b5f780  0 filestore(/var/lib/ceph/osd/ceph-19) lock_fsid failed to lock /var/lib/ceph/osd/ceph-19/fsid, is another ceph-osd still running? (11) Resource temporarily unavailable
2013-11-12 09:47:09.673804 7f8151b5f780 -1 filestore(/var/lib/ceph/osd/ceph-19) FileStore::mount: lock_fsid failed
2013-11-12 09:47:09.673919 7f8151b5f780 -1  ** ERROR: error converting store /var/lib/ceph/osd/ceph-19: (16) Device or resource busy
2013-11-12 09:47:14.169305 7f78fd548700  0 -- 10.200.1.54:6802/1734 >> 10.200.1.51:6800/13263 pipe(0x1e48c80 sd=42 :55275 s=2 pgs=5530 cs=1 l=0 c=0x1eae2c0).fault, initiating reconnect
2013-11-12 09:47:14.169444 7f78fd346700  0 -- 10.200.1.54:6802/1734 >> 10.200.1.57:6804/8226 pipe(0xc1ed500 sd=43 :47978 s=2 pgs=16845 cs=1 l=0 c=0x1eae840).fault, initiating reconnect
2013-11-12 09:47:14.169988 7f78fd144700  0 -- 10.200.1.54:6802/1734 >> 10.200.1.59:6810/4862 pipe(0xc1ed280 sd=46 :37094 s=2 pgs=42297 cs=1 l=0 c=0x1eae6e0).fault, initiating reconnect
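Read side by side, the two "ceph version ... pid N" lines in this excerpt show two ceph-osd processes starting against ceph-19 about four seconds apart: pid 1734 at 09:47:05 and pid 2769 at 09:47:09. Only the second one hits lock_fsid. The messenger lines that follow (".../1734") show that the first is the daemon that kept running. The helper below is a hypothetical Python sketch, not part of Ceph, that extracts this from an OSD log by mapping each start line's thread id to its pid. It assumes the log line layout seen here: date, time, thread id, level, message.

```python
import re

# Matches the banner each ceph-osd writes when it starts, e.g.
# "ceph version 0.72 (...), process ceph-osd, pid 1734"
START_RE = re.compile(r"ceph version \S+ .*process ([\w-]+), pid (\d+)")


def summarize_starts(log_text):
    """Return ([(timestamp, pid), ...], {pids whose lock_fsid failed})."""
    thread_to_pid = {}
    starts = []
    failed = set()
    for line in log_text.splitlines():
        parts = line.split(None, 4)
        if len(parts) < 5:
            continue
        date, time, thread, _level, msg = parts
        match = START_RE.search(msg)
        if match:
            pid = int(match.group(2))
            # The banner comes from the daemon's main thread; that process's
            # later startup messages carry the same thread id.
            thread_to_pid[thread] = pid
            starts.append((f"{date} {time}", pid))
        elif "lock_fsid failed" in msg and thread in thread_to_pid:
            failed.add(thread_to_pid[thread])
    return starts, failed


sample = """\
2013-11-12 09:47:05.843425 7f7918e9d780  0 ceph version 0.72 (5832e2603c7db5d40b433d0953408993a9b7c217), process ceph-osd, pid 1734
2013-11-12 09:47:07.680531 7f7918e9d780  0 <cls> cls/hello/cls_hello.cc:271: loading cls_hello
2013-11-12 09:47:09.670813 7f8151b5f780  0 ceph version 0.72 (5832e2603c7db5d40b433d0953408993a9b7c217), process ceph-osd, pid 2769
2013-11-12 09:47:09.673804 7f8151b5f780 -1 filestore(/var/lib/ceph/osd/ceph-19) FileStore::mount: lock_fsid failed
"""

starts, failed = summarize_starts(sample)
print(starts)
print(failed)
```

Run against the 'sudo restart ceph-osd-all' excerpt, the same helper would report a single start (pid 13723) and no lock_fsid failures.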


And here is roughly the same snippet after just running 'sudo restart ceph-osd-all':

2013-11-12 09:56:36.658014 7f7918e9d780  1 -- 192.168.200.54:6811/1734 shutdown complete.
2013-11-12 09:56:37.556988 7f3793c21780  0 ceph version 0.72 (5832e2603c7db5d40b433d0953408993a9b7c217), process ceph-osd, pid 13723
2013-11-12 09:56:37.559314 7f3793c21780  1 filestore(/var/lib/ceph/osd/ceph-19) mount detected xfs
2013-11-12 09:56:37.559319 7f3793c21780  1 filestore(/var/lib/ceph/osd/ceph-19)  disabling 'filestore replica fadvise' due to known issues with fadvise(DONTNEED) on xfs
2013-11-12 09:56:37.561350 7f3793c21780  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: FIEMAP ioctl is supported and appears to work
2013-11-12 09:56:37.561360 7f3793c21780  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-11-12 09:56:37.562357 7f3793c21780  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2013-11-12 09:56:37.571030 7f3793c21780  0 filestore(/var/lib/ceph/osd/ceph-19) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2013-11-12 09:56:37.574273 7f3793c21780  1 journal _open /var/lib/ceph/osd/ceph-19/journal fd 23: 10239344640 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-11-12 09:56:37.578189 7f3793c21780  1 journal _open /var/lib/ceph/osd/ceph-19/journal fd 23: 10239344640 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-11-12 09:56:37.578854 7f3793c21780  1 journal close /var/lib/ceph/osd/ceph-19/journal
2013-11-12 09:56:37.579638 7f3793c21780  1 filestore(/var/lib/ceph/osd/ceph-19) mount detected xfs
2013-11-12 09:56:37.581110 7f3793c21780  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: FIEMAP ioctl is supported and appears to work
2013-11-12 09:56:37.581118 7f3793c21780  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-11-12 09:56:37.582014 7f3793c21780  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2013-11-12 09:56:37.583365 7f3793c21780  0 filestore(/var/lib/ceph/osd/ceph-19) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2013-11-12 09:56:37.585765 7f3793c21780  1 journal _open /var/lib/ceph/osd/ceph-19/journal fd 24: 10239344640 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-11-12 09:56:37.588281 7f3793c21780  1 journal _open /var/lib/ceph/osd/ceph-19/journal fd 24: 10239344640 bytes, block size 4096 bytes, directio = 1, aio = 1
2013-11-12 09:56:37.589782 7f3793c21780  0 <cls> cls/hello/cls_hello.cc:271: loading cls_hello
2013-11-12 09:56:39.723134 7f377488b700  0 -- 10.200.1.54:6807/13723 >> 10.200.1.56:6806/563 pipe(0xc87ca00 sd=155 :38290 s=1 pgs=17864 cs=2 l=0 c=0xc893160).fault
2013-11-12 09:56:39.728798 7f3775194700  0 -- 10.200.1.54:6807/13723 >> 10.200.1.52:6808/14464 pipe(0xc811000 sd=52 :51030 s=1 pgs=7473 cs=6 l=0 c=0xc7fbb00).fault
2013-11-12 09:56:39.807114 7f37787ca700  0 -- 10.200.1.54:6807/13723 >> 10.200.1.52:6805/14449 pipe(0xc756280 sd=72 :46552 s=1 pgs=10912 cs=96 l=0 c=0xc740420).fault
2013-11-12 09:56:39.852465 7f3778ccf700  0 -- 10.200.1.54:6807/13723 >> 10.200.1.57:6804/8226 pipe(0x2427780 sd=83 :48234 s=1 pgs=17251 cs=128 l=0 c=0x2406dc0).fault
2013-11-12 09:56:39.898327 7f377488b700  0 -- 10.200.1.54:6807/13723 >> 10.200.1.56:6806/563 pipe(0xc87ca00 sd=42 :40942 s=1 pgs=17945 cs=164 l=0 c=0xc893160).fault
2013-11-12 09:56:40.738437 7f3775ea1700  0 -- 10.200.1.54:6807/13723 >> 10.200.1.60:6810/32089 pipe(0xc7c2500 sd=72 :40289 s=2 pgs=33225 cs=109 l=0 c=0xc7fb840).fault with nothing to send, going to standby
2013-11-12 09:56:40.740185 7f376b2fd700  0 -- 10.200.1.54:6807/13723 >> 10.200.1.60:6810/32089 pipe(0xcd66a00 sd=279 :6807 s=0 pgs=0 cs=0 l=0 c=0xc79d000).accept connect_seq 0 vs existing 109 state standby
2013-11-12 09:56:40.740201 7f376b2fd700  0 -- 10.200.1.54:6807/13723 >> 10.200.1.60:6810/32089 pipe(0xcd66a00 sd=279 :6807 s=0 pgs=0 cs=0 l=0 c=0xc79d000).accept peer reset, then tried to connect to us, replacing
2013-11-12 09:56:41.639911 7f376fd47700  0 -- 192.168.200.54:6806/13723 >> 192.168.48.127:0/234188561 pipe(0xcf87a00 sd=127 :6806 s=0 pgs=0 cs=0 l=0 c=0xcb80580).accept peer addr is really 192.168.48.127:0/234188561 (socket is 192.168.48.127:60893/0)
2013-11-12 09:56:44.394952 7f37657a3700  0 -- 10.200.1.54:6807/13723 >> 10.200.1.54:6810/13792 pipe(0xcee7c80 sd=160 :6807 s=0 pgs=0 cs=0 l=0 c=0xd0d7160).accept connect_seq 0 vs existing 0 state connecting
2013-11-12 09:56:59.334100 7f3764396700  0 -- 192.168.200.54:6806/13723 >> 192.168.48.102:0/663636012 pipe(0xdbb9280 sd=197 :6806 s=0 pgs=0 cs=0 l=0 c=0xdbbc000).accept peer addr is really 192.168.48.102:0/663636012 (socket is 192.168.48.102:35496/0)
2013-11-12 09:57:45.805456 7f3764194700  0 -- 192.168.200.54:6806/13723 >> 192.168.48.103:0/1090276439 pipe(0xdbb9000 sd=180 :6806 s=0 pgs=0 cs=0 l=0 c=0xce83dc0).accept peer addr is really 192.168.48.103:0/1090276439 (socket is 192.168.48.103:41220/0)

After the 'restart ceph-osd-all', the admin sockets for all 4 OSDs on this host are present.
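Instead of eyeballing /var/run/ceph, you can list which OSDs on a host are missing their socket with the sketch below. It assumes the default layout: data directories named /var/lib/ceph/osd/<cluster>-<id>, and admin sockets at /var/run/ceph/<cluster>-osd.<id>.asok, which corresponds to the default 'admin socket = /var/run/ceph/$cluster-$name.asok'. Adjust the paths if your ceph.conf overrides them. missing_admin_sockets is a hypothetical helper, not a Ceph tool.

```python
import os
import re
import socket
import stat
import tempfile


def missing_admin_sockets(osd_root="/var/lib/ceph/osd",
                          run_dir="/var/run/ceph",
                          cluster="ceph"):
    """Return sorted OSD ids that have a data directory but no admin socket.

    Assumes the default naming: <osd_root>/<cluster>-<id> for data and
    <run_dir>/<cluster>-osd.<id>.asok for the admin socket.
    """
    pattern = re.compile(rf"{re.escape(cluster)}-(\d+)")
    osd_ids = sorted(int(m.group(1))
                     for m in map(pattern.fullmatch, os.listdir(osd_root))
                     if m)
    missing = []
    for osd_id in osd_ids:
        path = os.path.join(run_dir, f"{cluster}-osd.{osd_id}.asok")
        try:
            # Count only a real UNIX socket, not a leftover regular file.
            if stat.S_ISSOCK(os.stat(path).st_mode):
                continue
        except FileNotFoundError:
            pass
        missing.append(osd_id)
    return missing


# Demo on a throwaway tree standing in for one host with osd.19 and osd.20.
with tempfile.TemporaryDirectory() as root:
    osd_root = os.path.join(root, "osd")
    run_dir = os.path.join(root, "run")
    for osd_id in (19, 20):
        os.makedirs(os.path.join(osd_root, f"ceph-{osd_id}"))
    os.makedirs(run_dir)
    # Only osd.20 has a live socket, as if osd.19's had gone missing.
    live = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    live.bind(os.path.join(run_dir, "ceph-osd.20.asok"))
    result = missing_admin_sockets(osd_root, run_dir)
    live.close()

print(result)  # [19]
```

On a real node, calling missing_admin_sockets() with its defaults right after a reboot should list the affected OSD ids. After 'restart ceph-osd-all' it would be expected to return an empty list.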

Let me know if there is additional logging or assistance I can provide to narrow it down.

Thanks,
Berant



On Tue, Nov 12, 2013 at 4:03 AM, Joao Luis <joao.luis@xxxxxxxxxxx> wrote:


On Nov 12, 2013 2:38 AM, "Berant Lemmenes" <berant@xxxxxxxxxxxx> wrote:
>
> I noticed the same behavior on my Dumpling cluster: the sockets wouldn't show up after boot, but after a service restart they were there.
>
> I haven't tested a node reboot since I upgraded to Emperor today. I'll give it a shot tomorrow.
>
> Thanks,
> Berant
>
> On Nov 11, 2013 9:29 PM, "Peter Matulis" <peter.matulis@xxxxxxxxxxxxx> wrote:
>>
>> After upgrading from Dumpling to Emperor on Ubuntu 12.04 I noticed the
>> admin sockets for each of my monitors were missing although the cluster
>> seemed to continue running fine.  There wasn't anything under
>> /var/run/ceph.  After restarting the service on each monitor node they
>> reappeared.  Anyone?
>>
>> ~pmatulis
>>

Odd behavior. The monitors do remove the admin socket on shutdown and create it again when they start, but as long as they are running it should exist. Have you checked the logs for any error message that could provide more insight into the cause?

  -Joao
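Joao describes the socket lifecycle as created when the daemon starts and removed when it shuts down. That suggests one possible explanation for the OSD case in Berant's log, though nothing in this thread confirms it. Suppose a second daemon with the same name starts, takes over the socket path, and then fails, as pid 2769 did on lock_fsid. If it removes "its" socket on the way out, the first daemon keeps running with no socket file left in /var/run/ceph. The Python sketch below reproduces only that filesystem effect. The unlink-before-bind step and the function names are assumptions for illustration, not Ceph's actual code.

```python
import os
import socket
import tempfile


def start_daemon(sock_path):
    """Hypothetical startup: clear any existing path, then bind a fresh socket."""
    try:
        os.unlink(sock_path)  # assumption: an existing socket file is removed first
    except FileNotFoundError:
        pass
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(sock_path)
    sock.listen(1)
    return sock


def stop_daemon(sock, sock_path):
    """Hypothetical shutdown: close the socket and remove its path."""
    sock.close()
    try:
        os.unlink(sock_path)
    except FileNotFoundError:
        pass


with tempfile.TemporaryDirectory() as run_dir:
    path = os.path.join(run_dir, "ceph-osd.19.asok")
    first = start_daemon(path)    # like pid 1734: starts and keeps running
    second = start_daemon(path)   # like pid 2769: takes over the path...
    stop_daemon(second, path)     # ...then fails and cleans up on exit
    socket_file_present = os.path.exists(path)
    first_still_open = first.fileno() != -1
    first.close()

print(socket_file_present, first_still_open)  # False True
```

The first socket object is still open and listening, but nothing on disk points at it any more. A clean restart of only the surviving daemon would recreate the file, consistent with what everyone in the thread saw.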


_______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

