Re: asyncmessenger vs fork

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 1 Mar 2018, Haomai Wang wrote:
> On Thu, Mar 1, 2018 at 11:52 AM, Haomai Wang <haomai@xxxxxxxx> wrote:
> > On Thu, Mar 1, 2018 at 10:59 AM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> >>
> >> Hi Haomai,
> >>
> >> I'm having trouble fixing an issue with AsyncMessenger vs fork with the
> >> new wip-config branch.  I'm wondering if you can help?
> >>
> >> The thing that has changed is that global_init now sets up a messenger
> >> instance to talk to the monitor, fetch the config and monmap, and then
> >> tears it down again.  A side-effect of this is that the singletons for the
> >> EventCenter and Stack in AsyncMessenger are present at the time of the
> >> fork().  And it's not quite behaving as it should.  The symptom I'm seeing
> >> is that rbd-nbd is hanging strangely (and somewhat non-deterministically)
> >> when trying to map an image.
> 
> I guess what you mean is
> this(https://github.com/ceph/ceph-ci/blob/67d46048eb5a6e8fd208ee4e21e208087b3a3969/src/mon/MonClient.cc#L114)?

Right. The key difference between this and the ceph_osd.cc case is that 
this messenger is torn down (aside from the singletons) before the fork, 
whereas in OSD the messenger sticks around.

I think the postfork_start call is in the right place in the updated 
rbd-nbd.cc code:

  if (global_init_prefork(g_ceph_context) >= 0) {
    std::string err;
    r = forker.prefork(err);
    if (r < 0) {
      cerr << err << std::endl;
      return r;
    }
    global_init_postfork_start(g_ceph_context);
    if (forker.is_parent()) {
      if (forker.parent_wait(err) != 0) {
        return -ENXIO;
      }
      return 0;
    }
  }

  common_init_finish(g_ceph_context);
  global_init_chdir(g_ceph_context);

We stop the threads in global_init_prefork(), the actual fork happens in 
prefork(), and we start them up again in global_init_postfork_start().  
(global_init_postfork_finish() later on closes stdout and stderr right 
when we do the preforker.daemonize() that detaches from the parent.

sage


> 
> >>
> >> wip-sage3-testing-2018-02-28-1744 is the branch I'm working against
> >> currently.  It includes wip-config, and fix rbd-nbd to adjust the timing
> >> of the prefork calls, and a patch on top that allows the cct->_log::stop()
> >> to be called if it hasn't been started.
> >>
> >> The way things work in this branch, global_init does the temp connection
> >> and tears everything down, except that the singleton Stack and EventCenter
> >> are still there.  At fork time, the fork hooks get called for
> >> AsyncMessenger so it does the stop() before and start() after to recreate
> >> the threads.  That seems to work.. except I think something is not
> >> quite right because it doesn't reliably talk to the cluster afterwards.  I
> >> see exchanges like
> >
> > the adjust with global_init_postfork_start isn't right, the daemonize
> > will terminate all existing threads, we must respawn all threads after
> > fork. so the later thing happen as expected
> >
> >>
> >> gnit:build (wip-nbd-hell) 05:03 PM $ sudo CEPH_ARGS="--debug-ms 1 --debug-rbd 10 --log-to-stderr" bin/rbd-nbd map barr
> >> [sudo] password for sage:
> >> 2018-02-28 17:03:59.293 7f06a6b7c880 -1 WARNING: all dangerous and experimental features are enabled.
> >> 2018-02-28 17:03:59.296 7f06a6b7c880  1  Processor -- start
> >> 2018-02-28 17:03:59.296 7f06a6b7c880  1 -- - start start
> >> 2018-02-28 17:03:59.297 7f06a6b7c880  1 -- - --> 127.0.0.1:40023/0 -- auth(proto 0 34 bytes epoch 0) v1 -- 0x56089092b550 con 0
> >> 2018-02-28 17:03:59.297 7f0693fff700  1 -- 127.0.0.1:0/1758722310 learned_addr learned my addr 127.0.0.1:0/1758722310
> >> 2018-02-28 17:03:59.298 7f0692ffd700  1 -- 127.0.0.1:0/1758722310 <== mon.0 127.0.0.1:40023/0 1 ==== mon_map magic: 0 v1 ==== 175+0+0 (698871189 0 0) 0x7f0684002630 con 0x56089093e730
> >> 2018-02-28 17:03:59.298 7f0692ffd700  1 -- 127.0.0.1:0/1758722310 <== mon.0 127.0.0.1:40023/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 33+0+0 (3411293466 0 0) 0x7f06840029b0 con 0x56089093e730
> >> 2018-02-28 17:03:59.298 7f0692ffd700  1 -- 127.0.0.1:0/1758722310 --> 127.0.0.1:40023/0 -- auth(proto 2 32 bytes epoch 0) v1 -- 0x7f06800017c0 con 0
> >> 2018-02-28 17:03:59.299 7f0692ffd700  1 -- 127.0.0.1:0/1758722310 <== mon.0 127.0.0.1:40023/0 3 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 206+0+0 (711149518 0 0) 0x7f0684002fb0 con 0x56089093e730
> >> 2018-02-28 17:03:59.299 7f0692ffd700  1 -- 127.0.0.1:0/1758722310 --> 127.0.0.1:40023/0 -- mon_subscribe({config=0+,monmap=0+}) v3 -- 0x56089079d2f0 con 0
> >> 2018-02-28 17:03:59.300 7f0692ffd700  1 -- 127.0.0.1:0/1758722310 <== mon.0 127.0.0.1:40023/0 4 ==== config(3 keys) v1 ==== 100+0+0 (3814196429 0 0) 0x5608908cd1e0 con 0x56089093e730
> >> 2018-02-28 17:03:59.300 7f0692ffd700  1 -- 127.0.0.1:0/1758722310 <== mon.0 127.0.0.1:40023/0 5 ==== mon_map magic: 0 v1 ==== 175+0+0 (698871189 0 0) 0x7f0684001390 con 0x56089093e730
> >> 2018-02-28 17:03:59.300 7f06a6b7c880  1 -- 127.0.0.1:0/1758722310 >> 127.0.0.1:40023/0 conn(0x56089093e730 :-1 s=STATE_OPEN pgs=62 cs=1 l=1).mark_down
> >> 2018-02-28 17:03:59.300 7f06a6b7c880  1 -- 127.0.0.1:0/1758722310 shutdown_connections
> >> 2018-02-28 17:03:59.301 7f06a6b7c880  1 -- 127.0.0.1:0/1758722310 shutdown_connections
> >> 2018-02-28 17:03:59.301 7f06a6b7c880  1 -- 127.0.0.1:0/1758722310 wait complete.
> >> 2018-02-28 17:03:59.301 7f06a6b7c880  1 -- 127.0.0.1:0/1758722310 >> 127.0.0.1:0/1758722310 conn(0x560890927300 :-1 s=STATE_NONE pgs=0 cs=0 l=0).mark_down
> >>
> >> (^ the initial pre-fork exchange to get the config)
> >>
> >> 2018-02-28 17:03:59.302 7f06a6b7c880 -1 WARNING: all dangerous and experimental features are enabled.
> >> 2018-02-28 17:03:59.302 7f06a6b7c880  0 ceph version 13.0.1-2356-g3f9027014b (3f9027014ba1fa805ab596155d6f4b599ab00806) mimic (dev), process (unknown), pid 15464
> >> 2018-02-28 17:03:59.303 7f06a6b7c880  0 pidfile_write: ignore empty --pid-file
> >> 2018-02-28 17:03:59.303 7f06a6b7c880  0 pidfile_write: ignore empty --pid-file
> >> 2018-02-28 17:03:59.307 7f06a6b7c880 -1 WARNING: all dangerous and experimental features are enabled.
> >> 2018-02-28 17:03:59.309 7f06a6b7c880  1  Processor -- start
> >> 2018-02-28 17:03:59.309 7f06a6b7c880  1 -- - start start
> >> 2018-02-28 17:03:59.310 7f06a6b7c880  1 -- - --> 127.0.0.1:40023/0 -- auth(proto 0 34 bytes epoch 0) v1 -- 0x560890868100 con 0
> >> 2018-02-28 17:03:59.310 7f0693fff700  1 -- 127.0.0.1:0/4125566264 learned_addr learned my addr 127.0.0.1:0/4125566264
> >> 2018-02-28 17:03:59.311 7f0691ffb700  1 -- 127.0.0.1:0/4125566264 <== mon.0 127.0.0.1:40023/0 1 ==== mon_map magic: 0 v1 ==== 175+0+0 (698871189 0 0) 0x7f0678002490 con 0x560890868670
> >> 2018-02-28 17:03:59.311 7f0691ffb700  1 -- 127.0.0.1:0/4125566264 <== mon.0 127.0.0.1:40023/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 33+0+0 (2983701339 0 0) 0x7f0678002810 con 0x560890868670
> >> 2018-02-28 17:03:59.311 7f0691ffb700  1 -- 127.0.0.1:0/4125566264 --> 127.0.0.1:40023/0 -- auth(proto 2 32 bytes epoch 0) v1 -- 0x7f068c001980 con 0
> >> 2018-02-28 17:03:59.312 7f0691ffb700  1 -- 127.0.0.1:0/4125566264 <== mon.0 127.0.0.1:40023/0 3 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 206+0+0 (3905746370 0 0) 0x7f0678002d40 con 0x560890868670
> >> 2018-02-28 17:03:59.312 7f0691ffb700  1 -- 127.0.0.1:0/4125566264 --> 127.0.0.1:40023/0 -- mon_subscribe({config=0+,monmap=0+}) v3 -- 0x5608907f6750 con 0
> >> 2018-02-28 17:04:02.313 7f06a6b7c880 -1 monclient: get_monmap_and_config failed to get config
> >> 2018-02-28 17:04:02.313 7f06a6b7c880  1 -- 127.0.0.1:0/4125566264 >> 127.0.0.1:40023/0 conn(0x560890868670 :-1 s=STATE_OPEN pgs=63 cs=1 l=1).mark_down
> >> 2018-02-28 17:04:02.314 7f06a6b7c880  1 -- 127.0.0.1:0/4125566264 --> 127.0.0.1:40023/0 -- auth(proto 0 34 bytes epoch 1) v1 -- 0x56089086c1a0 con 0
> >> 2018-02-28 17:09:02.314 7f06a6b7c880  0 monclient(hunting): authenticate timed out after 300
> >> 2018-02-28 17:09:02.314 7f06a6b7c880  1 -- 127.0.0.1:0/4125566264 >> 127.0.0.1:40023/0 conn(0x56089086c400 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=0).mark_down
> >> 2018-02-28 17:09:02.315 7f06a6b7c880  1 -- 127.0.0.1:0/4125566264 --> 127.0.0.1:40023/0 -- auth(proto 0 34 bytes epoch 1) v1 -- 0x56089086c1a0 con 0
> >> 2018-02-28 17:09:02.315 7f06a6b7c880  1 -- 127.0.0.1:0/4125566264 >> 127.0.0.1:40023/0 conn(0x56089086c400 :-1 s=STATE_CONNECTING pgs=0 cs=0 l=0).mark_down
> >> 2018-02-28 17:09:02.316 7f06a6b7c880  1 -- 127.0.0.1:0/4125566264 --> 127.0.0.1:40023/0 -- auth(proto 0 34 bytes epoch 1) v1 -- 0x56089086c1a0 con 0
> >> 2018-02-28 17:09:02.316 7f06a6b7c880  1 -- 127.0.0.1:0/4125566264 >> 127.0.0.1:40023/0 conn(0x56089086c400 :-1 s=STATE_CONNECTING pgs=0 cs=0 l=0).mark_down
> >> 2018-02-28 17:09:02.317 7f06a6b7c880  1 -- 127.0.0.1:0/4125566264 --> 127.0.0.1:40023/0 -- auth(proto 0 34 bytes epoch 1) v1 -- 0x56089086c1a0 con 0
> >> 2018-02-28 17:09:02.317 7f06a6b7c880  1 -- 127.0.0.1:0/4125566264 >> 127.0.0.1:40023/0 conn(0x56089086c400 :-1 s=STATE_CONNECTING pgs=0 cs=0 l=0).mark_down
> >> 2018-02-28 17:09:02.318 7f06a6b7c880  1 -- 127.0.0.1:0/4125566264 --> 127.0.0.1:40023/0 -- auth(proto 0 34 bytes epoch 1) v1 -- 0x56089086c1a0 con 0
> >> 2018-02-28 17:09:02.318 7f06a6b7c880  1 -- 127.0.0.1:0/4125566264 >> 127.0.0.1:40023/0 conn(0x56089086c400 :-1 s=STATE_CONNECTING pgs=0 cs=0 l=0).mark_down
> >> 2018-02-28 17:09:02.319 7f06a6b7c880  1 -- 127.0.0.1:0/4125566264 --> 127.0.0.1:40023/0 -- auth(proto 0 34 bytes epoch 1) v1 -- 0x56089086c1a0 con 0
> >> 2018-02-28 17:09:02.319 7f06a6b7c880  1 -- 127.0.0.1:0/4125566264 >> 127.0.0.1:40023/0 conn(0x56089086c400 :-1 s=STATE_CONNECTING pgs=0 cs=0 l=0).mark_down
> >> 2018-02-28 17:09:02.320 7f06a6b7c880  1 -- 127.0.0.1:0/4125566264 --> 127.0.0.1:40023/0 -- auth(proto 0 34 bytes epoch 1) v1 -- 0x56089086c1a0 con 0
> >> 2018-02-28 17:09:02.320 7f06a6b7c880  1 -- 127.0.0.1:0/4125566264 >> 127.0.0.1:40023/0 conn(0x56089086c400 :-1 s=STATE_CONNECTING pgs=0 cs=0 l=0).mark_down
> >> 2018-02-28 17:09:02.321 7f06a6b7c880  1 -- 127.0.0.1:0/4125566264 --> 127.0.0.1:40023/0 -- auth(proto 0 34 bytes epoch 1) v1 -- 0x56089086c1a0 con 0
> >> 2018-02-28 17:09:02.321 7f06a6b7c880  1 -- 127.0.0.1:0/4125566264 >> 127.0.0.1:40023/0 conn(0x56089086c400 :-1 s=STATE_CONNECTING pgs=0 cs=0 l=0).mark_down
> >> 2018-02-28 17:09:02.322 7f06a6b7c880  1 -- 127.0.0.1:0/4125566264 --> 127.0.0.1:40023/0 -- auth(proto 0 34 bytes epoch 1) v1 -- 0x56089086c1a0 con 0
> >> 2018-02-28 17:09:02.322 7f06a6b7c880  1 -- 127.0.0.1:0/4125566264 >> 127.0.0.1:40023/0 conn(0x56089086c400 :-1 s=STATE_CONNECTING pgs=0 cs=0 l=0).mark_down
> >> 2018-02-28 17:09:02.322 7f06a6b7c880  1 -- 127.0.0.1:0/4125566264 shutdown_connections
> >> 2018-02-28 17:09:29.499 7f06a6b7c880  1 -- 127.0.0.1:0/4125566264 shutdown_connections
> >> 2018-02-28 17:09:29.499 7f06a6b7c880  1 -- 127.0.0.1:0/4125566264 wait complete.
> >> 2018-02-28 17:09:29.499 7f06a6b7c880  1 -- 127.0.0.1:0/4125566264 >> 127.0.0.1:0/4125566264 conn(0x560890863d80 :-1 s=STATE_NONE pgs=0 cs=0 l=0).mark_down
> >> rbd-nbd: failed to map, status: (110) Connection timed out
> >>
> >> Notice we get a few messages back but the all of them; the mon
> >> log indicates it never got the subscribe message.  Other times I run it I
> >> get *no* successful interaction with the mon.
> >>
> >> My guess would be some stop/start issue with the singletons.  What
> >> confuses me is that this behavior similarish to what the OSD is doing
> >> (creating a messenger before the fork, forking, and then using it) but it
> >> doesn't seem to have any problems.
> >>
> >> Any ideas?  It's easy to reproduce with that branch and
> >>
> >> bin/init-ceph stop ; MON=1 OSD=1 MDS=0 ../src/vstart.sh  -d -n -x -l
> >> bin/ceph osd pool create rbd 1
> >> sudo CEPH_ARGS="--debug-ms 1 --log-to-stderr" bin/rbd-nbd map doesnotexist
> >>
> >> sage
> >>
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux