[RFC PATCH] msgr: Prevent unbounded use of bufferlist memory by throttling incoming messages.

Under some conditions it is possible for a daemon to accept new messages
faster than it can process them.  Since there is no limit on the number
of outstanding messages, if this condition persists the daemon will
consume all available memory/swap on the server.

Fix this by having the message reader monitor bufferlist memory use
(buffer_total_alloc) and sleep in 1 ms intervals while accepting the
incoming message data would push it past a fixed 64 MiB threshold.
A more sophisticated technique that prevents more demanding clients
from starving less demanding clients may be needed.

Without this patch, for a filesystem with a single osd, I was able to
reliably cause the osd to be killed by the kernel oom-killer with a
large streaming write from a single client.  In my case the client
machine was several years newer than the server machine, and the
network was 10 Gb/s Ethernet with a 9000-byte MTU; that configuration
probably contributes to the problem.

With this patch, the same test ran reliably, with the osd RSS staying
below a few hundred MiB.

Signed-off-by: Jim Schutt <jaschut@xxxxxxxxxx>
---
 src/msg/SimpleMessenger.cc |   14 ++++++++++++++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/src/msg/SimpleMessenger.cc b/src/msg/SimpleMessenger.cc
index a7fb18d..11163e3 100644
--- a/src/msg/SimpleMessenger.cc
+++ b/src/msg/SimpleMessenger.cc
@@ -1750,6 +1750,20 @@ Message *SimpleMessenger::Pipe::read_message()
   unsigned data_off = le32_to_cpu(header.data_off);
   if (data_len) {
     int left = data_len;
+    int throttled = 0;
+
+    while (buffer_total_alloc.read() + data_len > (64*1024*1024)) {
+      struct timespec sleepy_time = {0, (1000 * 1000)}; // 1 msec
+      if (!throttled)
+	dout(4) << " pipe reader paused" << "; buffer_total_alloc "
+		<< buffer_total_alloc.read() << dendl;
+      ::nanosleep(&sleepy_time, 0);
+      throttled++;
+    }
+    if (throttled)
+      dout(4) << " pipe reader unpaused; buffer_total_alloc "
+	       << buffer_total_alloc.read() << dendl;
+      
     if (data_off & ~PAGE_MASK) {
       // head
       int head = MIN(PAGE_SIZE - (data_off & ~PAGE_MASK),
-- 
1.6.6.1
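
As one possible direction for the "more sophisticated technique"
mentioned in the description, the sleep-and-poll loop could be replaced
by a blocking byte budget: a reader reserves data_len before reading
the payload and is woken as soon as earlier messages release their
share, rather than waking every millisecond.  The following is only a
minimal sketch under stated assumptions: ByteThrottle, get(), put() and
in_use() are hypothetical names, not existing Ceph interfaces, and the
sketch does not by itself address fairness between clients.  It also
admits a request larger than the whole budget once nothing else is
outstanding; with the loop in the patch, a data_len above 64 MiB would
appear never to satisfy the exit condition.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical helper (not a Ceph API): a byte budget shared by all
// pipe readers in a daemon.
class ByteThrottle {
public:
  explicit ByteThrottle(uint64_t max_bytes) : max_(max_bytes) {}

  // Block until `bytes` fits under the limit, then reserve it.
  // A request larger than the whole budget is admitted once nothing
  // else is outstanding, so one oversized message cannot stall forever.
  void get(uint64_t bytes) {
    std::unique_lock<std::mutex> l(lock_);
    cond_.wait(l, [&] { return used_ == 0 || used_ + bytes <= max_; });
    used_ += bytes;
  }

  // Return `bytes` to the budget and wake any waiting readers.
  void put(uint64_t bytes) {
    {
      std::lock_guard<std::mutex> l(lock_);
      assert(bytes <= used_);
      used_ -= bytes;
    }
    cond_.notify_all();
  }

  // Bytes currently reserved; intended for logging and tests.
  uint64_t in_use() const {
    std::lock_guard<std::mutex> l(lock_);
    return used_;
  }

private:
  mutable std::mutex lock_;
  std::condition_variable cond_;
  const uint64_t max_;
  uint64_t used_ = 0;
};
```

In read_message() this would amount to calling get(data_len) before the
payload is read and put(data_len) once the message has been processed.
Where exactly the release belongs is an assumption that would need
checking against the message lifecycle.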

