An alternative way to address the current problem at the rsockets level
would be a rewrite of rpoll(): instead of the busy loop at the beginning
followed by the real poll() call with the full user-specified timeout
value ("ms tcp read timeout" in our case), I would embed the real poll()
into a loop, splitting the user-specified timeout into smaller portions
and doing the rsockets-specific rs_poll_check() on every timeout of the
real poll().
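
Roughly along these lines (only a sketch to illustrate the idea, not a
tested patch against librdmacm; the slice length, the helper name
rpoll_sliced and the exact signature of rs_poll_check() are assumptions
on my part, and the rest of rpoll()'s bookkeeping is omitted):

#include <poll.h>

/*
 * Sketch: split the caller's timeout into smaller slices and re-run the
 * rsockets-specific check after every slice that expires without a real
 * poll() event.  rs_poll_check() stands for the existing rsockets-internal
 * helper; its signature here is assumed.
 */
static int rpoll_sliced(struct pollfd *fds, nfds_t nfds, int timeout)
{
	const int slice = 100;                  /* ms per inner poll() */
	int elapsed = 0;

	do {
		int t = slice;
		if (timeout >= 0 && timeout - elapsed < slice)
			t = timeout - elapsed;

		int ret = poll(fds, nfds, t);   /* the real poll() */
		if (ret != 0)
			return ret;             /* events ready, or error */

		/* give rsockets a chance to notice state changes,
		 * e.g. a remote shutdown, before waiting again */
		ret = rs_poll_check(fds, nfds);
		if (ret != 0)
			return ret;

		elapsed += t;
	} while (timeout < 0 || elapsed < timeout);

	return 0;                               /* timed out */
}

With a slice of, say, 100 ms the full "ms tcp read timeout" would still be
honoured, but a shutdown of the remote side would be noticed long before
the 900 second timeout expires.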

Best Regards

Andreas Bluemle


On Tue, 13 Aug 2013 07:53:12 +0200
Andreas Bluemle <andreas.bluemle@xxxxxxxxxxx> wrote:

> Hi Matthew,
>
> I can confirm the behaviour which you describe.
> I too believe that the problem is on the client side (ceph command).
> My log files show the very same symptom, i.e. the client side
> not being able to shut down the pipes properly.
>
> (Q: I had problems yesterday sending a mail to the ceph-users list
> with the log files attached to it because the size of
> the attachments exceeded some limit; I hadn't been subscribed
> to the list at that point. Is the use of pastebin.com the better
> way to provide such lengthy information in general?)
>
>
> Best Regards
>
> Andreas Bluemle
>
> On Tue, 13 Aug 2013 11:59:36 +0800
> Matthew Anderson <manderson8787@xxxxxxxxx> wrote:
>
> > Moving this conversation to ceph-devel where the devs might be able
> > to shed some light on this.
> >
> > I've added some additional debug to my code to narrow the issue down
> > a bit, and the reader thread appears to be getting locked by
> > tcp_read_wait() because rpoll never returns an event when the socket
> > is shut down. A hack way of proving this was to lower the timeout in
> > rpoll to 5 seconds. When a command like 'ceph osd tree' completes you
> > can see it block for 5 seconds until rpoll times out and returns 0.
> > The reader thread is then able to join and the pipe can be reaped.
> >
> > Ceph log is here - http://pastebin.com/rHK4vYLZ
> > Mon log is here - http://pastebin.com/WyAJEw0m
> >
> > What's particularly weird is that the monitor receives a POLLHUP
> > event when the ceph command shuts down its socket, but the ceph
> > command never does. When using regular sockets both sides of the
> > connection receive a POLLIN | POLLHUP | POLLRDHUP event when the
> > sockets are shut down. It would seem like there is a bug in rsockets
> > that causes the side that calls shutdown first not to receive the
> > correct rpoll events.
> >
> > Can anyone comment on whether the above seems right?
> >
> > Thanks all
> > -Matt
> >
> >
> > On Tue, Aug 13, 2013 at 12:06 AM, Andreas Bluemle <
> > andreas.bluemle@xxxxxxxxxxx> wrote:
> >
> > > Hi Matthew,
> > >
> > > I am not quite sure about the POLLRDHUP.
> > > On the server side (ceph-mon), tcp_read_wait does see the
> > > POLLHUP - which should be the indicator that
> > > the other side is shutting down.
> > >
> > > I have also taken a brief look at the client side (ceph mon stat).
> > > It initiates a shutdown - but never finishes. See attached log
> > > file from "ceph --log-file ceph-mon-stat.rsockets --debug-ms 30
> > > mon stat". I have also attached the corresponding log file for
> > > regular TCP/IP sockets.
> > >
> > > It looks to me that in the rsockets case, the reaper is able to
> > > clean up even though there is still something left to do - and
> > > hence the shutdown never completes.
> > >
> > >
> > > Best Regards
> > >
> > > Andreas Bluemle
> > >
> > >
> > > On Mon, 12 Aug 2013 15:11:47 +0800
> > > Matthew Anderson <manderson8787@xxxxxxxxx> wrote:
> > >
> > > > Hi Andreas,
> > > >
> > > > I think we're both working on the same thing; I've just changed
> > > > the function calls over to rsockets in the source instead of
> > > > using the pre-load library. It explains why we're having the
> > > > exact same problem!
> > > >
> > > > From what I've been able to tell, the entire problem revolves
> > > > around rsockets not supporting POLLRDHUP. As far as I can tell
> > > > the pipe will only be removed when tcp_read_wait returns -1.
> > > > With rsockets it never receives the POLLRDHUP event after
> > > > shutdown_socket() is called, so the rpoll call blocks until
> > > > timeout (900 seconds) and the pipe stays active.
> > > >
> > > > The question then would be how we can destroy a pipe without
> > > > relying on POLLRDHUP. shutdown_socket() always gets called when
> > > > the socket should be closed, so there might be a way to trick
> > > > tcp_read_wait() into returning -1 by doing something in
> > > > shutdown_socket(), but I'm not sure how to go about it.
> > > >
> > > > Any ideas?
> > > >
> > >

--
Andreas Bluemle                       mailto:Andreas.Bluemle@xxxxxxxxxxx
Heinrich Boell Strasse 88             Phone: (+49) 89 4317582
D-81829 Muenchen (Germany)            Mobil: (+49) 177 522 0151
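
P.S.: regarding Matthew's question (quoted above) about how to get
tcp_read_wait() to return -1 without relying on POLLRDHUP: a local "wake"
descriptor, written to from shutdown_socket(), might do it. The sketch
below is only meant to illustrate the idea - the struct and function
names are made up and this is not the actual Pipe code; it also assumes
that rpoll() accepts a regular pipe fd next to an rsocket, which I have
not verified.

#include <poll.h>
#include <unistd.h>

/*
 * Idea sketch: give each pipe a local wake pipe whose read end is polled
 * together with the socket.  shutdown_socket() writes one byte to the
 * write end, so the blocked tcp_read_wait() wakes up immediately and can
 * return -1 without ever seeing POLLRDHUP.
 */
struct pipe_state {
	int sd;         /* the (r)socket */
	int wake[2];    /* [0] polled by the reader, [1] written on shutdown */
};

/* called once when the pipe is set up */
static int pipe_state_init(struct pipe_state *p, int sd)
{
	p->sd = sd;
	return pipe(p->wake);
}

/* the part of shutdown_socket() that matters here */
static void shutdown_socket_wake(struct pipe_state *p)
{
	char c = 0;
	(void)write(p->wake[1], &c, 1);   /* wake the reader thread */
	/* ... followed by the usual shutdown of p->sd ... */
}

/* simplified stand-in for the wait loop in tcp_read_wait() */
static int tcp_read_wait_sketch(struct pipe_state *p, int timeout_ms)
{
	struct pollfd pfd[2];

	pfd[0].fd = p->sd;      pfd[0].events = POLLIN;
	pfd[1].fd = p->wake[0]; pfd[1].events = POLLIN;

	/* with rsockets this would be rpoll(); it has to accept the
	 * regular pipe fd next to the rsocket for this to work */
	if (poll(pfd, 2, timeout_ms) <= 0)
		return -1;                /* timeout or error */

	if (pfd[1].revents & POLLIN)
		return -1;                /* local shutdown requested */

	return 0;                         /* data readable on the socket */
}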