Re: Async Messenger RDMA IB ib_uverbs_write return EACCES

Hi Penyaev,
   The code below shows where fork() is called (ceph commit head: 878e488be3).
   File: src/ceph_osd.cc
        1 +--105 lines: -*- mode:C++; tab-width:8; c-basic-offset:2; indent-tabs-mode:t -*- ---
      106 int main(int argc, const char **argv)
      107 { 
      108 +-- 16 lines: vector<const char*> args;----------------------------------------------
      124   auto cct = global_init(      // calls global_pre_init(), then mc_bootstrap.get_monmap_and_config(), which creates the public messenger object
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      125     &defaults,
      126     args, CEPH_ENTITY_TYPE_OSD,
      127     CODE_ENVIRONMENT_DAEMON,
      128     0, "osd_data");
      129 +-- 65 lines: ceph_heap_profiler_init();---------------------------------------------
      194     int r = forker.prefork(err);
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ // fork child process
      195     if (r < 0) {
      196       cerr << err << std::endl;
      197       return r;
      198     }     
      199     if (forker.is_parent()) { //parent wait for child process to exit
      200 +--  9 lines: g_ceph_context->_log->start();-----------------------------------------
      209 +--326 lines: common_init_finish(g_ceph_context);------------------------------------
      535   Messenger *ms_public = Messenger::create(g_ceph_context, public_msg_type,
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^//in child process, it'll create the messenger object and query rdma device's attribute again.
      536                                            entity_name_t::OSD(whoami), "client",
      537                                            getpid(),
      538                                            Messenger::HAS_HEAVY_TRAFFIC |


   > That would be a perfect way, but I could not find an easy way to
   > destroy the Infiniband singleton object (I did some experiments and
   > it turned out to be not so easy, e.g. you can't deregister memory
   > regions in a child process or after setuid()).
   [Changcheng]: In the destructor DeviceList::~DeviceList(), we need a function that is the counterpart of "rdma_get_devices". Otherwise, the child process will keep using the same fd opened by the parent process and will operate on the device (fd) with the child process's credentials. The ib_uverbs.ko driver doesn't allow this kind of operation.
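
   As a rough illustration of the fd sharing described above, here is a minimal sketch in plain POSIX C++. It is not Ceph or RDMA code and it does not reproduce the EACCES itself, since that needs the ib_uverbs driver. The temporary file is only a stand-in for the uverbs device fd, and the function name offset_seen_by_parent_after_child_write() is hypothetical. It shows that a descriptor opened before fork() is the same open file in the child: what the child writes through it moves the offset the parent sees.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper, for illustration only.
// Opens a temporary file in the parent, forks, lets the child write
// `nbytes` bytes through the inherited descriptor, and returns the file
// offset the parent observes afterwards (or -1 on error).
long offset_seen_by_parent_after_child_write(std::size_t nbytes)
{
  assert(nbytes <= 4096);  // keep the demo small

  char path[] = "/tmp/fork_fd_demo_XXXXXX";
  int fd = mkstemp(path);  // stand-in for the device fd opened before fork()
  if (fd < 0)
    return -1;
  unlink(path);            // the file lives on while the fd stays open

  pid_t pid = fork();
  if (pid < 0) {
    close(fd);
    return -1;
  }

  if (pid == 0) {
    // Child: the fd was not reopened here, it is the one the parent opened.
    for (std::size_t i = 0; i < nbytes; ++i) {
      if (write(fd, "x", 1) != 1)
        _exit(1);
    }
    _exit(0);
  }

  // Parent: wait for the child, then read back the shared file offset.
  int status = 0;
  if (waitpid(pid, &status, 0) < 0 ||
      !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
    close(fd);
    return -1;
  }
  off_t off = lseek(fd, 0, SEEK_CUR);
  close(fd);
  return static_cast<long>(off);
}
```

   Because the child never opens the file itself, a driver that records the opener when the fd is opened still sees the parent as the opener on every later write from the child. That is why the device would have to be closed and re-opened after the fork.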

B.R.
Changcheng

On 10:36 Thu 02 May, Roman Penyaev wrote:
> On 2019-05-02 03:45, Liu, Changcheng wrote:
> > Hi Penyaev,
> >     Could you give more info about the point below? I don't understand it
> > quite well.
> >      > My question is why you daemonize your ceph services and do not
> >      > rely on systemd, which does fork() on its own and runs each service
> >      > with '-f' flag, which means do not daemonize?  So I would not
> >      > daemonize services and this can be a simple solution.
> 
> You provided the test, which reproduces the problem with -EACCES,
> where you explicitly call fork().  As far as I understand the code,
> ceph services do fork() on early start only in the daemon() glibc call
> (which internally does fork()).  If you use systemd, daemonization
> is not used, so fork() is not called, so I am a bit confused: which
> exact places in the code do you know where fork() is called?
> 
> >     Thanks for your patch. I'll verify it when I'm back in the office.
> >     Is it possible for the rdma_cm library to supply an API, e.g.
> > rdma_put_devices(), to close the devices in a proper state?
> >     Then the child process could re-open the device with
> > rdma_get_devices and successfully query the device's attributes.
> 
> That would be a perfect way, but I could not find an easy way to
> destroy the Infiniband singleton object (I did some experiments and
> it turned out to be not so easy, e.g. you can't deregister memory
> regions in a child process or after setuid()).
> 
> --
> Roman
> 


