Re: ceph-mgr: requests to restful api get blocked sometimes

On Wed, Oct 31, 2018 at 9:11 AM Jerry Lee <leisurelysw24@xxxxxxxxx> wrote:
>
> Hi,
>

Hi,

> We set up a Ceph cluster (v12.2.2) with the restful API plugin
> running, but sometimes requests are blocked forever without a
> response.  While it was stuck in this state, we checked the netstat
> output, which showed a non-empty Recv-Q on the listening socket:
>
> [~] netstat -tupln
> Active Internet connections (only servers)
> Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
> tcp      129      0 192.168.2.1:8003        0.0.0.0:*               LISTEN      1885/ceph-mgr
>
> And a log which may be related to the issue is captured:
> 2018-10-29 13:43:00.058319 7fcd1891b700  1 mgr[restful] Unknown
> request '140518797573648:0'
>
> After digging into the code, we wonder whether the requests list
> should be protected by requests_lock, as in the following patch.  The
> scenario we suspect is that a request finishes and the restful plugin
> is notified before the request has been appended to the requests
> list.  This produces the "Unknown request" log message, and
> submit_request() then waits forever without accepting new requests.
>
> diff --git a/src/pybind/mgr/restful/module.py b/src/pybind/mgr/restful/module.py
> index 6ce610b..bbe88ab 100644
> --- a/src/pybind/mgr/restful/module.py
> +++ b/src/pybind/mgr/restful/module.py
> @@ -363,9 +363,10 @@ class Module(MgrModule):
>              if tag == 'seq':
>                  return
>
> -            request = filter(
> -                lambda x: x.is_running(tag),
> -                self.requests)
> +            with self.requests_lock:
> +                request = filter(
> +                        lambda x: x.is_running(tag),
> +                        self.requests)
>
>              if len(request) != 1:
>                  self.log.warn("Unknown request '%s'" % str(tag))
> @@ -596,8 +597,8 @@ class Module(MgrModule):
>
>
>      def submit_request(self, _request, **kwargs):
> -        request = CommandsRequest(_request)
>          with self.requests_lock:
> +            request = CommandsRequest(_request)
>              self.requests.append(request)
>          if kwargs.get('wait', 0):
>              while not request.is_finished():
>
>
> Any ideas and feedback are appreciated, thanks.
>

Do you pass ?wait=1 in the URL so that submit_request actually waits
for the request to finish? Otherwise, the call should be non-blocking.

In any case, I suspect you might be right. Creating the request
(CommandsRequest) can trigger the notify event early, which causes a
race condition: the request has not yet been added to the
self.requests list, so the notify function does not recognize it.
The patch looks good to me. Just note that the notify function was
modified slightly in current master so that it no longer prints the
'Unknown request' warnings.
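
To illustrate the race outside of ceph-mgr, here is a minimal,
self-contained sketch. It is not the restful module's code: Registry
and ToyRequest are hypothetical stand-ins for Module and
CommandsRequest, and it assumes the completion notification arrives on
a different thread than the one submitting the request. Only the
locking pattern mirrors the patch: the request is created and appended
under requests_lock, and notify takes the same lock before searching
the list.

```python
import threading


class ToyRequest(object):
    """Hypothetical stand-in for CommandsRequest."""

    def __init__(self, tag, on_done):
        self.tag = tag
        self.finished = threading.Event()
        # Assumption: completion may be reported right away, from
        # another thread, before the constructor's caller continues.
        self.worker = threading.Thread(target=on_done, args=(tag,))
        self.worker.start()


class Registry(object):
    """Hypothetical stand-in for the parts of Module discussed here."""

    def __init__(self):
        self.requests = []
        self.requests_lock = threading.Lock()
        self.unknown = []  # tags that notify() could not match

    def notify(self, tag):
        # Take the lock before searching, so a request that is still
        # being created/appended cannot be missed.
        with self.requests_lock:
            matches = [r for r in self.requests if r.tag == tag]
        if len(matches) != 1:
            self.unknown.append(tag)
            return
        matches[0].finished.set()

    def submit_request(self, tag):
        # Create and append under the same lock, as in the patch.
        with self.requests_lock:
            request = ToyRequest(tag, self.notify)
            self.requests.append(request)
        return request


registry = Registry()
pending = [registry.submit_request(i) for i in range(200)]
all_done = all(r.finished.wait(5) for r in pending)
print("all finished: %s, unknown tags: %s" % (all_done, registry.unknown))
```

If the request were created before taking the lock, or notify searched
the list without it, the worker thread could run notify before the
append and the tag would end up in unknown while the submitter waits.
One caveat of this sketch: if the notification were delivered
synchronously on the submitting thread, a non-reentrant lock held
during creation would deadlock.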

Have you been able to successfully test your patch? Can you create a PR?

-boris

> - Jerry


