Re: [ceph-users] Help needed porting Ceph to RSockets


Moving this conversation to ceph-devel where the devs might be able
to shed some light on this.

I've added some additional debugging to my code to narrow the issue
down a bit, and the reader thread appears to be getting stuck in
tcp_read_wait() because rpoll never returns an event when the socket
is shut down. A hacky way of proving this was to lower the timeout
passed to rpoll to 5 seconds: when a command like 'ceph osd tree'
completes, you can see it block for 5 seconds until rpoll times out
and returns 0. The reader thread is then able to join and the pipe
can be reaped.
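
For context, this is roughly the shape of the wait once poll() is
swapped for rpoll() in tcp_read_wait() - a simplified sketch from
memory, not the actual Ceph code, so treat the structure and the
ms_tcp_read_timeout default as approximate:

  #include <rdma/rsocket.h>
  #include <poll.h>

  // sd is the rsocket fd; timeout_ms comes from ms_tcp_read_timeout
  // (900s by default), which is why a stuck rpoll holds the pipe for
  // 15 minutes. Lowering it to 5000 here is the hack described above.
  int tcp_read_wait(int sd, int timeout_ms)
  {
      struct pollfd pfd;
      pfd.fd = sd;
      pfd.events = POLLIN | POLLRDHUP;

      int r = rpoll(&pfd, 1, timeout_ms);
      if (r <= 0)
          return -1;   // timed out (or error): reader exits, pipe gets reaped
      if (pfd.revents & (POLLERR | POLLHUP | POLLNVAL | POLLRDHUP))
          return -1;   // peer hung up - the side that shut down first
                       // never sees this with rsockets
      if (!(pfd.revents & POLLIN))
          return -1;
      return 0;        // data is ready to read
  }

With regular sockets the shutdown path hits the second return -1; with
rsockets the only way out for the shutting-down side is the timeout.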

Ceph log is here - http://pastebin.com/rHK4vYLZ
Mon log is here - http://pastebin.com/WyAJEw0m

What's particularly weird is that the monitor receives a POLLHUP event
when the ceph command shuts down its socket, but the ceph command
never does. When using regular sockets, both sides of the connection
receive POLLIN | POLLHUP | POLLRDHUP events when the sockets are shut
down. It would seem there is a bug in rsockets that causes the side
that calls shutdown first not to receive the correct rpoll events.
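
For anyone who wants to reproduce this outside of Ceph, something
along these lines should show the asymmetry. This is only a sketch
(untested, error handling omitted, and the 127.0.0.1/7777 address is a
placeholder - point it at an interface backed by your RDMA device or
rxe): after the client calls rshutdown(), the server's rpoll() reports
a hangup but the client's own rpoll() just sits there until timeout.

  #include <rdma/rsocket.h>
  #include <poll.h>
  #include <netinet/in.h>
  #include <arpa/inet.h>
  #include <sys/socket.h>
  #include <unistd.h>
  #include <cstdio>

  static void wait_and_report(const char *who, int fd)
  {
      struct pollfd pfd = { fd, POLLIN | POLLRDHUP, 0 };
      int r = rpoll(&pfd, 1, 5000);            // 5 second timeout
      printf("%s: rpoll=%d revents=0x%x\n", who, r, r > 0 ? pfd.revents : 0);
  }

  int main()
  {
      struct sockaddr_in addr = {};
      addr.sin_family = AF_INET;
      addr.sin_port = htons(7777);
      addr.sin_addr.s_addr = inet_addr("127.0.0.1");  // placeholder address

      int lfd = rsocket(AF_INET, SOCK_STREAM, 0);
      rbind(lfd, (struct sockaddr *)&addr, sizeof(addr));
      rlisten(lfd, 1);

      if (fork() == 0) {                        // child plays the ceph command
          int cfd = rsocket(AF_INET, SOCK_STREAM, 0);
          rconnect(cfd, (struct sockaddr *)&addr, sizeof(addr));
          rshutdown(cfd, SHUT_RDWR);            // shut down first, like the client
          wait_and_report("client", cfd);       // expect POLLHUP, observe timeout
          rclose(cfd);
          return 0;
      }

      int sfd = raccept(lfd, NULL, NULL);       // parent plays ceph-mon
      wait_and_report("server", sfd);           // does see POLLHUP from the peer
      rclose(sfd); rclose(lfd);
      return 0;
  }

Build with something like g++ repro.cc -lrdmacm; if the client side
times out while the server side reports a hangup, that matches what
we're seeing in Ceph.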

Can anyone comment on whether the above seems right?

Thanks all
-Matt


> On Tue, Aug 13, 2013 at 12:06 AM, Andreas Bluemle
> <andreas.bluemle@xxxxxxxxxxx> wrote:
>>
>> Hi Matthew,
>>
>> I am not quite sure about the POLLRDHUP.
>> On the server side (ceph-mon), tcp_read_wait does see the
>> POLLHUP - which should be the indicator that the
>> other side is shutting down.
>>
>> I have also taken a brief look at the client side (ceph mon stat).
>> It initiates a shutdown - but never finishes. See attached log file
>> from "ceph --log-file ceph-mon-stat.rsockets --debug-ms 30 mon stat".
>> I have also attached the corresponding log file for regular TCP/IP
>> sockets.
>>
>> It looks to me that in the rsockets case, the reaper is able to clean up
>> even though there is still something left to do - and hence the shutdown
>> never completes.
>>
>>
>> Best Regards
>>
>> Andreas Bluemle
>>
>>
>> On Mon, 12 Aug 2013 15:11:47 +0800
>> Matthew Anderson <manderson8787@xxxxxxxxx> wrote:
>>
>> > Hi Andreas,
>> >
>> > I think we're both working on the same thing; I've just changed the
>> > function calls over to rsockets in the source instead of using the
>> > preload library. That explains why we're having the exact same problem!
>> >
>> > From what I've been able to tell, the entire problem revolves around
>> > rsockets not supporting POLLRDHUP. As far as I can tell, the pipe will
>> > only be removed when tcp_read_wait returns -1. With rsockets it never
>> > receives the POLLRDHUP event after shutdown_socket() is called, so the
>> > rpoll call blocks until the timeout (900 seconds) and the pipe stays
>> > active.
>> >
>> > The question then would be: how can we destroy a pipe without relying
>> > on POLLRDHUP? shutdown_socket() always gets called when the socket
>> > should be closed, so there might be a way to trick
>> > tcp_read_wait() into returning -1 by doing something in
>> > shutdown_socket(), but I'm not sure how to go about it.
>> >
>> > Any ideas?
>> >
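
On the question in the quoted message about forcing tcp_read_wait()
to return -1 without POLLRDHUP: one possible direction is a self-pipe
that shutdown_socket() writes to, so the reader wakes up even if the
rsocket itself never reports the hangup. This is only a sketch - the
shutdown_pipe member is hypothetical and not in the Ceph source, and
it assumes rpoll() will also poll a regular pipe fd alongside an
rsocket, the way its man page describes:

  #include <rdma/rsocket.h>
  #include <poll.h>
  #include <unistd.h>

  int shutdown_pipe[2];   // hypothetical: created with ::pipe() at Pipe setup

  void shutdown_socket(int sd)
  {
      rshutdown(sd, SHUT_RDWR);
      char c = 0;
      (void)write(shutdown_pipe[1], &c, 1);   // wake the reader thread
  }

  int tcp_read_wait(int sd, int timeout_ms)
  {
      struct pollfd pfd[2];
      pfd[0].fd = sd;               pfd[0].events = POLLIN | POLLRDHUP;
      pfd[1].fd = shutdown_pipe[0]; pfd[1].events = POLLIN;

      // Assumes rpoll() handles the normal pipe fd alongside the rsocket.
      int r = rpoll(pfd, 2, timeout_ms);
      if (r <= 0)
          return -1;
      if (pfd[1].revents & POLLIN)
          return -1;   // shutdown_socket() was called locally: tear down the pipe
      if (pfd[0].revents & (POLLERR | POLLHUP | POLLNVAL | POLLRDHUP))
          return -1;
      if (!(pfd[0].revents & POLLIN))
          return -1;
      return 0;
  }

The appeal of something like this is that it sidesteps the missing
POLLRDHUP entirely rather than depending on rsockets being fixed, at
the cost of an extra fd per pipe.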


