Under some conditions it is possible for a daemon to accept new messages
faster than it can process them.  Since there is no limit on the number of
outstanding messages, if this condition persists the daemon will consume
all available memory/swap on the server.

Fix this by causing the message reader to monitor the bufferlist memory
use, and sleep for a bit if some threshold is exceeded.  Some more
sophisticated technique that prevents more demanding clients from starving
less demanding clients may be needed.

Without this patch, for a filesystem with a single osd, I was able to
reliably cause the osd to be killed by the kernel oom-killer with a large
streaming write from a single client.  In my case the client machine was
several years newer than the server machine, and the network was 10 Gb/s
Ethernet with 9000B MTU, which configuration probably contributes to the
problem.

With this patch, the same test ran reliably, with the osd RSS staying
below a few hundred MiB.

Signed-off-by: Jim Schutt <jaschut@xxxxxxxxxx>
---
 src/msg/SimpleMessenger.cc |   14 ++++++++++++++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/src/msg/SimpleMessenger.cc b/src/msg/SimpleMessenger.cc
index a7fb18d..11163e3 100644
--- a/src/msg/SimpleMessenger.cc
+++ b/src/msg/SimpleMessenger.cc
@@ -1750,6 +1750,20 @@ Message *SimpleMessenger::Pipe::read_message()
     unsigned data_off = le32_to_cpu(header.data_off);
     if (data_len) {
       int left = data_len;
+      int throttled = 0;
+
+      while (buffer_total_alloc.read() + data_len > (64*1024*1024)) {
+        struct timespec sleepy_time = {0, (1000 * 1000)}; // 1 msec
+        if (!throttled)
+          dout(4) << " pipe reader paused" << "; buffer_total_alloc "
+                  << buffer_total_alloc.read() << dendl;
+        ::nanosleep(&sleepy_time, 0);
+        throttled++;
+      }
+      if (throttled)
+        dout(4) << " pipe reader unpaused; buffer_total_alloc "
+                << buffer_total_alloc.read() << dendl;
+
       if (data_off & ~PAGE_MASK) {
         // head
         int head = MIN(PAGE_SIZE - (data_off & ~PAGE_MASK),
-- 
1.6.6.1
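
For anyone who wants to try the throttling idea outside the messenger, here
is a minimal standalone sketch of the same polling approach: the reader
checks a global byte counter and sleeps 1 msec at a time while admitting the
next message would push the counter past 64 MiB.  The names buffered_bytes,
kThrottleLimit and wait_for_buffer_space are hypothetical and stand in for
buffer_total_alloc and the inline loop in the patch; std::atomic and
std::this_thread::sleep_for replace the Ceph atomic type and ::nanosleep.
The escape for an empty counter is an addition of this sketch, not part of
the patch.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical stand-in for the messenger's global buffer accounting
// (the patch reads a counter named buffer_total_alloc).
static std::atomic<std::int64_t> buffered_bytes{0};

// Threshold used by the patch: 64 MiB of outstanding buffer memory.
constexpr std::int64_t kThrottleLimit = 64LL * 1024 * 1024;

// Pause the calling reader until data_len more bytes fit under limit.
// Returns the number of 1 msec sleeps taken; 0 means no throttling.
//
// Assumption of this sketch (not in the patch): once the counter has
// drained to zero the message is admitted regardless of size, so a single
// message larger than limit cannot stall the reader indefinitely.
int wait_for_buffer_space(std::int64_t data_len,
                          std::int64_t limit = kThrottleLimit) {
    assert(data_len >= 0);
    int throttled = 0;
    for (;;) {
        const std::int64_t in_use = buffered_bytes.load();
        if (in_use + data_len <= limit || in_use == 0)
            break;
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
        ++throttled;
    }
    return throttled;
}
```

A reader would call wait_for_buffer_space(data_len) before allocating the
payload buffers, and whatever frees those buffers would decrement
buffered_bytes.  Polling keeps the change small, at the cost of up to 1 msec
of added latency per check; it also does nothing to keep one busy client
from starving the others, which is the limitation the commit message
already notes.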