On Tue, Feb 16, 2016 at 11:57 AM, Varada Kari <Varada.Kari@xxxxxxxxxxx> wrote:
> Hi all,
>
> Apologies for the long mail.
>
> We are testing the latest jewel branch with kernel RBD in our test
> clusters. When running some benchmark tests with various tools like
> vdbench and medusa, we are facing io's getting timed out and some
> messenger issues when osd's in a particular node are not reachable(crashed).
>
> we simulated the same problem by killing the osds in a specific node.
> Here is some information about the use case what we are following.
> please note, steps listed are used for reproducing the problem.
>
> [...]
>
> Even retires seemed to be replied back to the client, but client is
> handling the request or not received by the libceph?
> Enabled logs on libceph end find any handle_reply messages, but they are
> present in the kernel logs for this tid, except some keep alives been
> sent to the osd. Have tried with latest stable kernel(4.4.1) to check if
> any recent fixes resolves the issues, but problem still persists.

I assume you mean aren't present? If you have a matching set of osd and
kernel client logs, please compress and upload them somewhere so I can
take a look.

Could you try 4.5-rc4? There is a related fix in 4.5-rc which isn't in
4.4.1 (although it's been scheduled for 4.4-stable). 4.5-rc4 would also
be easier to debug.

Thanks,

Ilya