On Thu, 1 Mar 2018, Haomai Wang wrote:
> On Thu, Mar 1, 2018 at 11:52 AM, Haomai Wang <haomai@xxxxxxxx> wrote:
> > On Thu, Mar 1, 2018 at 10:59 AM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> >>
> >> Hi Haomai,
> >>
> >> I'm having trouble fixing an issue with AsyncMessenger vs fork with the
> >> new wip-config branch.  I'm wondering if you can help?
> >>
> >> The thing that has changed is that global_init now sets up a messenger
> >> instance to talk to the monitor, fetch the config and monmap, and then
> >> tears it down again.  A side-effect of this is that the singletons for the
> >> EventCenter and Stack in AsyncMessenger are present at the time of the
> >> fork().  And it's not quite behaving as it should.  The symptom I'm seeing
> >> is that rbd-nbd is hanging strangely (and somewhat non-deterministically)
> >> when trying to map an image.
>
> I guess what you mean is
> this (https://github.com/ceph/ceph-ci/blob/67d46048eb5a6e8fd208ee4e21e208087b3a3969/src/mon/MonClient.cc#L114)?

Right.  The key difference between this and the ceph_osd.cc case is that
this messenger is torn down (aside from the singletons) before the fork,
whereas in the OSD the messenger sticks around.

I think the postfork_start call is in the right place in the updated
rbd-nbd.cc code:

  if (global_init_prefork(g_ceph_context) >= 0) {
    std::string err;
    r = forker.prefork(err);
    if (r < 0) {
      cerr << err << std::endl;
      return r;
    }
    global_init_postfork_start(g_ceph_context);
    if (forker.is_parent()) {
      if (forker.parent_wait(err) != 0) {
        return -ENXIO;
      }
      return 0;
    }
  }
  common_init_finish(g_ceph_context);
  global_init_chdir(g_ceph_context);

We stop the threads in global_init_prefork(), the actual fork happens in
prefork(), and we start them up again in global_init_postfork_start().
(global_init_postfork_finish() later on closes stdout and stderr right
when we do the preforker.daemonize() that detaches from the parent.)
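To make the intended ordering concrete, here is a minimal, self-contained
sketch of the same stop-threads / fork / restart-threads pattern.  This is
generic std::thread/fork() code, not the actual Ceph helpers; the Worker
type and its start()/quiesce() methods are made up for illustration.  The
point is that only the thread that calls fork() survives in the child, so
any worker threads have to be quiesced before the fork and respawned on
both sides afterwards:

  #include <atomic>
  #include <chrono>
  #include <iostream>
  #include <thread>
  #include <sys/types.h>
  #include <sys/wait.h>
  #include <unistd.h>

  // Hypothetical stand-in for a messenger worker/event thread.
  struct Worker {
    std::thread thr;
    std::atomic<bool> stop{false};

    void start() {                 // like global_init_postfork_start(): respawn
      stop = false;
      thr = std::thread([this] {
        while (!stop) {
          // ... poll events, process messages, etc. ...
          std::this_thread::sleep_for(std::chrono::milliseconds(10));
        }
      });
    }
    void quiesce() {               // like global_init_prefork(): stop threads
      stop = true;
      if (thr.joinable())
        thr.join();
    }
  };

  int main() {
    Worker w;
    w.start();

    w.quiesce();                   // 1. stop worker threads before forking
    pid_t pid = fork();            // 2. fork (like forker.prefork())
    if (pid < 0) {
      std::cerr << "fork failed" << std::endl;
      return 1;
    }
    w.start();                     // 3. restart threads in parent *and* child
    if (pid > 0) {
      int status;
      waitpid(pid, &status, 0);    // parent waits, like forker.parent_wait()
      w.quiesce();
      return 0;
    }
    // child continues on as the daemon here ...
    w.quiesce();
    return 0;
  }

Nothing is left running across the fork() itself; anything that was running
has to be recreated afterwards, which is what the AsyncMessenger fork hooks
are supposed to do for the singleton Stack/EventCenter threads.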
sage

> >
> >> wip-sage3-testing-2018-02-28-1744 is the branch I'm working against
> >> currently.  It includes wip-config, a fix to rbd-nbd to adjust the timing
> >> of the prefork calls, and a patch on top that allows the cct->_log::stop()
> >> to be called if it hasn't been started.
> >>
> >> The way things work in this branch, global_init does the temp connection
> >> and tears everything down, except that the singleton Stack and EventCenter
> >> are still there.  At fork time, the fork hooks get called for
> >> AsyncMessenger so it does the stop() before and start() after to recreate
> >> the threads.  That seems to work.. except I think something is not
> >> quite right because it doesn't reliably talk to the cluster afterwards.  I
> >> see exchanges like
> >
> > the adjust with global_init_postfork_start isn't right; the daemonize
> > will terminate all existing threads, we must respawn all threads after
> > fork, so the later thing happens as expected
> >
> >>
> >> gnit:build (wip-nbd-hell) 05:03 PM $ sudo CEPH_ARGS="--debug-ms 1 --debug-rbd 10 --log-to-stderr" bin/rbd-nbd map barr
> >> [sudo] password for sage:
> >> 2018-02-28 17:03:59.293 7f06a6b7c880 -1 WARNING: all dangerous and experimental features are enabled.
> >> 2018-02-28 17:03:59.296 7f06a6b7c880 1 Processor -- start
> >> 2018-02-28 17:03:59.296 7f06a6b7c880 1 -- - start start
> >> 2018-02-28 17:03:59.297 7f06a6b7c880 1 -- - --> 127.0.0.1:40023/0 -- auth(proto 0 34 bytes epoch 0) v1 -- 0x56089092b550 con 0
> >> 2018-02-28 17:03:59.297 7f0693fff700 1 -- 127.0.0.1:0/1758722310 learned_addr learned my addr 127.0.0.1:0/1758722310
> >> 2018-02-28 17:03:59.298 7f0692ffd700 1 -- 127.0.0.1:0/1758722310 <== mon.0 127.0.0.1:40023/0 1 ==== mon_map magic: 0 v1 ==== 175+0+0 (698871189 0 0) 0x7f0684002630 con 0x56089093e730
> >> 2018-02-28 17:03:59.298 7f0692ffd700 1 -- 127.0.0.1:0/1758722310 <== mon.0 127.0.0.1:40023/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 33+0+0 (3411293466 0 0) 0x7f06840029b0 con 0x56089093e730
> >> 2018-02-28 17:03:59.298 7f0692ffd700 1 -- 127.0.0.1:0/1758722310 --> 127.0.0.1:40023/0 -- auth(proto 2 32 bytes epoch 0) v1 -- 0x7f06800017c0 con 0
> >> 2018-02-28 17:03:59.299 7f0692ffd700 1 -- 127.0.0.1:0/1758722310 <== mon.0 127.0.0.1:40023/0 3 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 206+0+0 (711149518 0 0) 0x7f0684002fb0 con 0x56089093e730
> >> 2018-02-28 17:03:59.299 7f0692ffd700 1 -- 127.0.0.1:0/1758722310 --> 127.0.0.1:40023/0 -- mon_subscribe({config=0+,monmap=0+}) v3 -- 0x56089079d2f0 con 0
> >> 2018-02-28 17:03:59.300 7f0692ffd700 1 -- 127.0.0.1:0/1758722310 <== mon.0 127.0.0.1:40023/0 4 ==== config(3 keys) v1 ==== 100+0+0 (3814196429 0 0) 0x5608908cd1e0 con 0x56089093e730
> >> 2018-02-28 17:03:59.300 7f0692ffd700 1 -- 127.0.0.1:0/1758722310 <== mon.0 127.0.0.1:40023/0 5 ==== mon_map magic: 0 v1 ==== 175+0+0 (698871189 0 0) 0x7f0684001390 con 0x56089093e730
> >> 2018-02-28 17:03:59.300 7f06a6b7c880 1 -- 127.0.0.1:0/1758722310 >> 127.0.0.1:40023/0 conn(0x56089093e730 :-1 s=STATE_OPEN pgs=62 cs=1 l=1).mark_down
> >> 2018-02-28 17:03:59.300 7f06a6b7c880 1 -- 127.0.0.1:0/1758722310 shutdown_connections
> >> 2018-02-28 17:03:59.301 7f06a6b7c880 1 -- 127.0.0.1:0/1758722310 shutdown_connections
> >> 2018-02-28 17:03:59.301 7f06a6b7c880 1 -- 127.0.0.1:0/1758722310 wait complete.
> >> 2018-02-28 17:03:59.301 7f06a6b7c880 1 -- 127.0.0.1:0/1758722310 >> 127.0.0.1:0/1758722310 conn(0x560890927300 :-1 s=STATE_NONE pgs=0 cs=0 l=0).mark_down
> >>
> >> (^ the initial pre-fork exchange to get the config)
> >>
> >> 2018-02-28 17:03:59.302 7f06a6b7c880 -1 WARNING: all dangerous and experimental features are enabled.
> >> 2018-02-28 17:03:59.302 7f06a6b7c880 0 ceph version 13.0.1-2356-g3f9027014b (3f9027014ba1fa805ab596155d6f4b599ab00806) mimic (dev), process (unknown), pid 15464
> >> 2018-02-28 17:03:59.303 7f06a6b7c880 0 pidfile_write: ignore empty --pid-file
> >> 2018-02-28 17:03:59.303 7f06a6b7c880 0 pidfile_write: ignore empty --pid-file
> >> 2018-02-28 17:03:59.307 7f06a6b7c880 -1 WARNING: all dangerous and experimental features are enabled.
> >> 2018-02-28 17:03:59.309 7f06a6b7c880 1 Processor -- start
> >> 2018-02-28 17:03:59.309 7f06a6b7c880 1 -- - start start
> >> 2018-02-28 17:03:59.310 7f06a6b7c880 1 -- - --> 127.0.0.1:40023/0 -- auth(proto 0 34 bytes epoch 0) v1 -- 0x560890868100 con 0
> >> 2018-02-28 17:03:59.310 7f0693fff700 1 -- 127.0.0.1:0/4125566264 learned_addr learned my addr 127.0.0.1:0/4125566264
> >> 2018-02-28 17:03:59.311 7f0691ffb700 1 -- 127.0.0.1:0/4125566264 <== mon.0 127.0.0.1:40023/0 1 ==== mon_map magic: 0 v1 ==== 175+0+0 (698871189 0 0) 0x7f0678002490 con 0x560890868670
> >> 2018-02-28 17:03:59.311 7f0691ffb700 1 -- 127.0.0.1:0/4125566264 <== mon.0 127.0.0.1:40023/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 33+0+0 (2983701339 0 0) 0x7f0678002810 con 0x560890868670
> >> 2018-02-28 17:03:59.311 7f0691ffb700 1 -- 127.0.0.1:0/4125566264 --> 127.0.0.1:40023/0 -- auth(proto 2 32 bytes epoch 0) v1 -- 0x7f068c001980 con 0
> >> 2018-02-28 17:03:59.312 7f0691ffb700 1 -- 127.0.0.1:0/4125566264 <== mon.0 127.0.0.1:40023/0 3 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 206+0+0 (3905746370 0 0) 0x7f0678002d40 con 0x560890868670
> >> 2018-02-28 17:03:59.312 7f0691ffb700 1 -- 127.0.0.1:0/4125566264 --> 127.0.0.1:40023/0 -- mon_subscribe({config=0+,monmap=0+}) v3 -- 0x5608907f6750 con 0
> >> 2018-02-28 17:04:02.313 7f06a6b7c880 -1 monclient: get_monmap_and_config failed to get config
> >> 2018-02-28 17:04:02.313 7f06a6b7c880 1 -- 127.0.0.1:0/4125566264 >> 127.0.0.1:40023/0 conn(0x560890868670 :-1 s=STATE_OPEN pgs=63 cs=1 l=1).mark_down
> >> 2018-02-28 17:04:02.314 7f06a6b7c880 1 -- 127.0.0.1:0/4125566264 --> 127.0.0.1:40023/0 -- auth(proto 0 34 bytes epoch 1) v1 -- 0x56089086c1a0 con 0
> >> 2018-02-28 17:09:02.314 7f06a6b7c880 0 monclient(hunting): authenticate timed out after 300
> >> 2018-02-28 17:09:02.314 7f06a6b7c880 1 -- 127.0.0.1:0/4125566264 >> 127.0.0.1:40023/0 conn(0x56089086c400 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=0).mark_down
> >> 2018-02-28 17:09:02.315 7f06a6b7c880 1 -- 127.0.0.1:0/4125566264 --> 127.0.0.1:40023/0 -- auth(proto 0 34 bytes epoch 1) v1 -- 0x56089086c1a0 con 0
> >> 2018-02-28 17:09:02.315 7f06a6b7c880 1 -- 127.0.0.1:0/4125566264 >> 127.0.0.1:40023/0 conn(0x56089086c400 :-1 s=STATE_CONNECTING pgs=0 cs=0 l=0).mark_down
> >> 2018-02-28 17:09:02.316 7f06a6b7c880 1 -- 127.0.0.1:0/4125566264 --> 127.0.0.1:40023/0 -- auth(proto 0 34 bytes epoch 1) v1 -- 0x56089086c1a0 con 0
> >> 2018-02-28 17:09:02.316 7f06a6b7c880 1 -- 127.0.0.1:0/4125566264 >> 127.0.0.1:40023/0 conn(0x56089086c400 :-1 s=STATE_CONNECTING pgs=0 cs=0 l=0).mark_down
> >> 2018-02-28 17:09:02.317 7f06a6b7c880 1 -- 127.0.0.1:0/4125566264 --> 127.0.0.1:40023/0 -- auth(proto 0 34 bytes epoch 1) v1 -- 0x56089086c1a0 con 0
> >> 2018-02-28 17:09:02.317 7f06a6b7c880 1 -- 127.0.0.1:0/4125566264 >> 127.0.0.1:40023/0 conn(0x56089086c400 :-1 s=STATE_CONNECTING pgs=0 cs=0 l=0).mark_down
> >> 2018-02-28 17:09:02.318 7f06a6b7c880 1 -- 127.0.0.1:0/4125566264 --> 127.0.0.1:40023/0 -- auth(proto 0 34 bytes epoch 1) v1 -- 0x56089086c1a0 con 0
> >> 2018-02-28 17:09:02.318 7f06a6b7c880 1 -- 127.0.0.1:0/4125566264 >> 127.0.0.1:40023/0 conn(0x56089086c400 :-1 s=STATE_CONNECTING pgs=0 cs=0 l=0).mark_down
> >> 2018-02-28 17:09:02.319 7f06a6b7c880 1 -- 127.0.0.1:0/4125566264 --> 127.0.0.1:40023/0 -- auth(proto 0 34 bytes epoch 1) v1 -- 0x56089086c1a0 con 0
> >> 2018-02-28 17:09:02.319 7f06a6b7c880 1 -- 127.0.0.1:0/4125566264 >> 127.0.0.1:40023/0 conn(0x56089086c400 :-1 s=STATE_CONNECTING pgs=0 cs=0 l=0).mark_down
> >> 2018-02-28 17:09:02.320 7f06a6b7c880 1 -- 127.0.0.1:0/4125566264 --> 127.0.0.1:40023/0 -- auth(proto 0 34 bytes epoch 1) v1 -- 0x56089086c1a0 con 0
> >> 2018-02-28 17:09:02.320 7f06a6b7c880 1 -- 127.0.0.1:0/4125566264 >> 127.0.0.1:40023/0 conn(0x56089086c400 :-1 s=STATE_CONNECTING pgs=0 cs=0 l=0).mark_down
> >> 2018-02-28 17:09:02.321 7f06a6b7c880 1 -- 127.0.0.1:0/4125566264 --> 127.0.0.1:40023/0 -- auth(proto 0 34 bytes epoch 1) v1 -- 0x56089086c1a0 con 0
> >> 2018-02-28 17:09:02.321 7f06a6b7c880 1 -- 127.0.0.1:0/4125566264 >> 127.0.0.1:40023/0 conn(0x56089086c400 :-1 s=STATE_CONNECTING pgs=0 cs=0 l=0).mark_down
> >> 2018-02-28 17:09:02.322 7f06a6b7c880 1 -- 127.0.0.1:0/4125566264 --> 127.0.0.1:40023/0 -- auth(proto 0 34 bytes epoch 1) v1 -- 0x56089086c1a0 con 0
> >> 2018-02-28 17:09:02.322 7f06a6b7c880 1 -- 127.0.0.1:0/4125566264 >> 127.0.0.1:40023/0 conn(0x56089086c400 :-1 s=STATE_CONNECTING pgs=0 cs=0 l=0).mark_down
> >> 2018-02-28 17:09:02.322 7f06a6b7c880 1 -- 127.0.0.1:0/4125566264 shutdown_connections
> >> 2018-02-28 17:09:29.499 7f06a6b7c880 1 -- 127.0.0.1:0/4125566264 shutdown_connections
> >> 2018-02-28 17:09:29.499 7f06a6b7c880 1 -- 127.0.0.1:0/4125566264 wait complete.
> >> 2018-02-28 17:09:29.499 7f06a6b7c880 1 -- 127.0.0.1:0/4125566264 >> 127.0.0.1:0/4125566264 conn(0x560890863d80 :-1 s=STATE_NONE pgs=0 cs=0 l=0).mark_down
> >> rbd-nbd: failed to map, status: (110) Connection timed out
> >>
> >> Notice we get a few messages back, but not all of them; the mon
> >> log indicates it never got the subscribe message.  Other times I run it I
> >> get *no* successful interaction with the mon.
> >>
> >> My guess would be some stop/start issue with the singletons.  What
> >> confuses me is that this behavior is similar to what the OSD is doing
> >> (creating a messenger before the fork, forking, and then using it) but it
> >> doesn't seem to have any problems.
> >>
> >> Any ideas?  It's easy to reproduce with that branch and
> >>
> >> bin/init-ceph stop ; MON=1 OSD=1 MDS=0 ../src/vstart.sh -d -n -x -l
> >> bin/ceph osd pool create rbd 1
> >> sudo CEPH_ARGS="--debug-ms 1 --log-to-stderr" bin/rbd-nbd map doesnotexist
> >>
> >> sage
> >>
> >
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html