On Wed, 31 Oct 2018 at 19:52, Boris Ranto <branto@xxxxxxxxxx> wrote: > > On Wed, Oct 31, 2018 at 9:11 AM Jerry Lee <leisurelysw24@xxxxxxxxx> wrote: > > > > Hi, > > > > Hi, > > > We setup a ceph cluster (v12.2.2) with restful api plugin running, but > > sometimes requests got blocked forever without responding. While > > stucking in such condition, we checked the netstat output and it shown > > some packets were queued in the Recv-Q: > > > > [~] netstat -tupln > > Active Internet connections (only servers) > > Proto Recv-Q Send-Q Local Address Foreign Address > > State PID/Program name > > tcp 129 0 192.168.2.1:8003 0.0.0.0:* > > LISTEN 1885/ceph-mgr > > > > And a log which may be related to the issue is captured: > > 2018-10-29 13:43:00.058319 7fcd1891b700 1 mgr[restful] Unknown > > request '140518797573648:0' > > > > After digging into the codes, should the requests list be protected by > > the requests_lock as the following patch? A possible condition we > > suspect is that a request is done and the restful plugin is notified. > > But unfortunately, the request is not appended to the requests list > > yet which makes a "Unknown request" log is generated and the > > submit_reqeust() function waits forever without acceping new request. > > > > diff --git a/src/pybind/mgr/restful/module.py b/src/pybind/mgr/restful/module.py > > index 6ce610b..bbe88ab 100644 > > --- a/src/pybind/mgr/restful/module.py > > +++ b/src/pybind/mgr/restful/module.py > > @@ -363,9 +363,10 @@ class Module(MgrModule): > > if tag == 'seq': > > return > > > > - request = filter( > > - lambda x: x.is_running(tag), > > - self.requests) > > + with self.requests_lock: > > + request = filter( > > + lambda x: x.is_running(tag), > > + self.requests) > > > > if len(request) != 1: > > self.log.warn("Unknown request '%s'" % str(tag)) > > @@ -596,8 +597,8 @@ class Module(MgrModule): > > > > > > def submit_request(self, _request, **kwargs): > > - request = CommandsRequest(_request) > > with self.requests_lock: > > + request = CommandsRequest(_request) > > self.requests.append(request) > > if kwargs.get('wait', 0): > > while not request.is_finished(): > > > > > > Any idea and feedback are appreciated, thanks. > > > > Do you pass ?wait=1 to the URL so that submit_request actually waits > for the request to finish? Otherwise, the call should be non-blocking. > Yes. > In any case, I suspect you might be right. The request > (CommandsRequest) creation can fire up the notify event early and this > can cause a race condition where the actual request was not yet added > to the self.requests list so it won't be recognized in the notify > function. The patch looks good to me. Just note that the notify > function was modified slightly in current master so that it does not > print the 'Unknown request' warnings. > > Have you been able to successfully test your patch? Can you create a PR? We basically tested the patch already and keep monitoring the cluster status for several days to make sure whether the same issue occurs or not. We can create a PR for this issue. Thanks for the feedback :) - Jerry > > -boris > > > - Jerry