Re: ceph-mgr: requests to restful api get blocked sometimes

On Wed, 31 Oct 2018 at 19:52, Boris Ranto <branto@xxxxxxxxxx> wrote:
>
> On Wed, Oct 31, 2018 at 9:11 AM Jerry Lee <leisurelysw24@xxxxxxxxx> wrote:
> >
> > Hi,
> >
>
> Hi,
>
> > We set up a Ceph cluster (v12.2.2) with the restful API plugin
> > running, but sometimes requests block forever without a response.
> > While it was stuck in that state, we checked the netstat output, which
> > showed a non-zero Recv-Q for the listening socket:
> >
> > [~] netstat -tupln
> > Active Internet connections (only servers)
> > Proto Recv-Q Send-Q Local Address     Foreign Address  State   PID/Program name
> > tcp      129      0 192.168.2.1:8003  0.0.0.0:*        LISTEN  1885/ceph-mgr
> >
> > We also captured a log line that may be related to the issue:
> > 2018-10-29 13:43:00.058319 7fcd1891b700  1 mgr[restful] Unknown request '140518797573648:0'
> >
> > After digging into the code, we wonder whether the requests list
> > should be protected by requests_lock, as in the following patch.  The
> > condition we suspect is that a request finishes and the restful plugin
> > is notified before the request has been appended to the requests
> > list.  That produces the "Unknown request" log, and submit_request()
> > then waits forever without accepting new requests.
> >
> > diff --git a/src/pybind/mgr/restful/module.py b/src/pybind/mgr/restful/module.py
> > index 6ce610b..bbe88ab 100644
> > --- a/src/pybind/mgr/restful/module.py
> > +++ b/src/pybind/mgr/restful/module.py
> > @@ -363,9 +363,10 @@ class Module(MgrModule):
> >              if tag == 'seq':
> >                  return
> >
> > -            request = filter(
> > -                lambda x: x.is_running(tag),
> > -                self.requests)
> > +            with self.requests_lock:
> > +                request = filter(
> > +                        lambda x: x.is_running(tag),
> > +                        self.requests)
> >
> >              if len(request) != 1:
> >                  self.log.warn("Unknown request '%s'" % str(tag))
> > @@ -596,8 +597,8 @@ class Module(MgrModule):
> >
> >
> >      def submit_request(self, _request, **kwargs):
> > -        request = CommandsRequest(_request)
> >          with self.requests_lock:
> > +            request = CommandsRequest(_request)
> >              self.requests.append(request)
> >          if kwargs.get('wait', 0):
> >              while not request.is_finished():
> >
> >
> > Any ideas and feedback are appreciated, thanks.
> >
>
> Do you pass ?wait=1 in the URL so that submit_request actually waits
> for the request to finish? Otherwise, the call should be non-blocking.
>

Yes.

> In any case, I suspect you might be right. The request
> (CommandsRequest) creation can fire up the notify event early and this
> can cause a race condition where the actual request was not yet added
> to the self.requests list so it won't be recognized in the notify
> function. The patch looks good to me. Just note that the notify
> function was modified slightly in current master so that it does not
> print the 'Unknown request' warnings.
>
> Have you been able to successfully test your patch? Can you create a PR?

We have done basic testing of the patch and are monitoring the cluster
for several days to check whether the same issue recurs.  We can create
a PR for this issue.  Thanks for the feedback :)
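
For reference, here is a minimal standalone sketch of the locking order
the patch relies on.  It is not the restful module's code: Request is a
hypothetical stand-in for CommandsRequest, the completion callback is
simulated with a plain thread, and the unknown list exists only so the
sketch can record what would have been an "Unknown request" warning.
The point is that creating the request and appending it happen under
the same lock that notify() takes for its lookup, so notify() cannot
look before the append.

```python
import threading


class Request:
    """Hypothetical stand-in for CommandsRequest: work starts on creation."""

    def __init__(self, tag, on_done):
        self.tag = tag
        self.finished = threading.Event()
        # Assumption for this sketch: the command may complete at once, so
        # the completion callback can run on another thread before the
        # caller of the constructor has done anything with the new object.
        self.worker = threading.Thread(target=on_done, args=(tag,))
        self.worker.start()


class Module:
    def __init__(self):
        self.requests = []
        self.requests_lock = threading.Lock()
        self.unknown = []  # tags notify() could not match (sketch only)

    def notify(self, tag):
        # Look up the request under the lock, as in the first patch hunk.
        with self.requests_lock:
            matches = [r for r in self.requests if r.tag == tag]
        if len(matches) != 1:
            self.unknown.append(tag)
            return
        matches[0].finished.set()

    def submit_request(self, tag, timeout=1.0):
        # Create and append under the lock, as in the second patch hunk.
        # notify() blocks on the lock until the append has happened.
        with self.requests_lock:
            request = Request(tag, self.notify)
            self.requests.append(request)
        request.worker.join()
        # True if the request was marked finished within the timeout.
        return request.finished.wait(timeout)
```

If the Request were created before taking the lock, as in the code
before the patch, the callback thread could run notify() first, find
no match, and the waiter would never see the request finish.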

- Jerry

>
> -boris
>
> > - Jerry


