Hi,

We set up a Ceph cluster (v12.2.2) with the restful API plugin running, but sometimes requests get blocked forever without a response. While it was stuck in this state, we checked the netstat output, and it showed a non-empty Recv-Q on the listening socket:

[~] netstat -tupln
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp      129      0 192.168.2.1:8003        0.0.0.0:*               LISTEN      1885/ceph-mgr

We also captured a log line that may be related to the issue:

2018-10-29 13:43:00.058319 7fcd1891b700  1 mgr[restful] Unknown request '140518797573648:0'

After digging into the code, we wonder whether the requests list should be protected by requests_lock, as in the patch below.

The condition we suspect is that a request completes and the restful plugin is notified before the request has been appended to the requests list. That produces the "Unknown request" log message, and submit_request() then waits forever without accepting new requests.

diff --git a/src/pybind/mgr/restful/module.py b/src/pybind/mgr/restful/module.py
index 6ce610b..bbe88ab 100644
--- a/src/pybind/mgr/restful/module.py
+++ b/src/pybind/mgr/restful/module.py
@@ -363,9 +363,10 @@ class Module(MgrModule):
         if tag == 'seq':
             return
 
-        request = filter(
-            lambda x: x.is_running(tag),
-            self.requests)
+        with self.requests_lock:
+            request = filter(
+                lambda x: x.is_running(tag),
+                self.requests)
 
         if len(request) != 1:
             self.log.warn("Unknown request '%s'" % str(tag))
@@ -596,8 +597,8 @@ class Module(MgrModule):
 
 
     def submit_request(self, _request, **kwargs):
-        request = CommandsRequest(_request)
         with self.requests_lock:
+            request = CommandsRequest(_request)
             self.requests.append(request)
         if kwargs.get('wait', 0):
             while not request.is_finished():

Any ideas and feedback are appreciated, thanks.

- Jerry
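P.S. To make the suspected ordering easier to discuss, here is a minimal standalone sketch of it. It is a toy model and not the actual restful module code: ToyRequest, ToyModule, the patched flag and the finite timeout are all made up for illustration. It assumes that constructing a request can already trigger the completion notification on another thread, which is what the patch above implies.

```python
import threading


class ToyRequest:
    """Hypothetical stand-in for CommandsRequest (not the real class)."""

    def __init__(self, tag, on_done):
        self.tag = tag
        self.finished = threading.Event()
        # Assumption: the work starts during construction and its
        # completion is reported from a different thread.
        self.worker = threading.Thread(target=on_done, args=(tag,))
        self.worker.start()
        # Give the notification a chance to run before the caller
        # appends the request to the list.
        self.worker.join(timeout=0.2)


class ToyModule:
    """Hypothetical stand-in for the restful Module (not the real class)."""

    def __init__(self, patched):
        self.patched = patched
        self.requests = []
        self.requests_lock = threading.Lock()
        self.unknown = []  # tags that would be logged as "Unknown request"

    def _lookup(self, tag):
        return [r for r in self.requests if r.tag == tag]

    def notify(self, tag):
        if self.patched:
            # Patched ordering: the lookup waits for a pending append.
            with self.requests_lock:
                matches = self._lookup(tag)
        else:
            matches = self._lookup(tag)
        if len(matches) != 1:
            self.unknown.append(tag)
            return
        matches[0].finished.set()

    def submit_request(self, tag, timeout=0.3):
        if self.patched:
            # Patched ordering: create and append under the same lock.
            with self.requests_lock:
                request = ToyRequest(tag, self.notify)
                self.requests.append(request)
        else:
            # Original ordering: create first, then lock and append.
            request = ToyRequest(tag, self.notify)
            with self.requests_lock:
                self.requests.append(request)
        request.worker.join()
        # The finite timeout stands in for the endless wait in the report.
        return request.finished.wait(timeout)


if __name__ == "__main__":
    old = ToyModule(patched=False)
    print("original ordering finished:", old.submit_request("t1"), old.unknown)
    new = ToyModule(patched=True)
    print("patched ordering finished:", new.submit_request("t2"), new.unknown)
```

With the original ordering, the notification in this toy arrives while the list is still empty, so the tag is recorded as unknown and the wait only ends because of the timeout. With the patched ordering, the notification blocks on requests_lock until the append is done and then finds the request. Whether the real module behaves the same way depends on when CommandsRequest actually starts its commands, which this sketch only assumes.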