On Wed, Oct 31, 2018 at 9:11 AM Jerry Lee <leisurelysw24@xxxxxxxxx> wrote: > > Hi, > Hi, > We setup a ceph cluster (v12.2.2) with restful api plugin running, but > sometimes requests got blocked forever without responding. While > stucking in such condition, we checked the netstat output and it shown > some packets were queued in the Recv-Q: > > [~] netstat -tupln > Active Internet connections (only servers) > Proto Recv-Q Send-Q Local Address Foreign Address > State PID/Program name > tcp 129 0 192.168.2.1:8003 0.0.0.0:* > LISTEN 1885/ceph-mgr > > And a log which may be related to the issue is captured: > 2018-10-29 13:43:00.058319 7fcd1891b700 1 mgr[restful] Unknown > request '140518797573648:0' > > After digging into the codes, should the requests list be protected by > the requests_lock as the following patch? A possible condition we > suspect is that a request is done and the restful plugin is notified. > But unfortunately, the request is not appended to the requests list > yet which makes a "Unknown request" log is generated and the > submit_reqeust() function waits forever without acceping new request. > > diff --git a/src/pybind/mgr/restful/module.py b/src/pybind/mgr/restful/module.py > index 6ce610b..bbe88ab 100644 > --- a/src/pybind/mgr/restful/module.py > +++ b/src/pybind/mgr/restful/module.py > @@ -363,9 +363,10 @@ class Module(MgrModule): > if tag == 'seq': > return > > - request = filter( > - lambda x: x.is_running(tag), > - self.requests) > + with self.requests_lock: > + request = filter( > + lambda x: x.is_running(tag), > + self.requests) > > if len(request) != 1: > self.log.warn("Unknown request '%s'" % str(tag)) > @@ -596,8 +597,8 @@ class Module(MgrModule): > > > def submit_request(self, _request, **kwargs): > - request = CommandsRequest(_request) > with self.requests_lock: > + request = CommandsRequest(_request) > self.requests.append(request) > if kwargs.get('wait', 0): > while not request.is_finished(): > > > Any idea and feedback are appreciated, thanks. > Do you pass ?wait=1 to the URL so that submit_request actually waits for the request to finish? Otherwise, the call should be non-blocking. In any case, I suspect you might be right. The request (CommandsRequest) creation can fire up the notify event early and this can cause a race condition where the actual request was not yet added to the self.requests list so it won't be recognized in the notify function. The patch looks good to me. Just note that the notify function was modified slightly in current master so that it does not print the 'Unknown request' warnings. Have you been able to successfully test your patch? Can you create a PR? -boris > - Jerry