Re: blkio cgroup [solved]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi List

Vivek and I talked yesterday via IRC and here's what we found.

First of all, I need to define the "freeze"- or "hang"-situation in
order to avoid misunderstandings. When the "hang" state occures, this
means it is impossible to do any io on the system. vmstat shows 250-300
blocked threads. Therefore, it is not possible to open new ssh
connections or log into the servers console. Established ssh connections
however, keep working. It is possible to run commands. Just don't touch
any files. That immediately hangs the connections.

Okay, now that we got that cleared up ...

During the "hang" situation, it is possible to change from cfq to
deadline scheduler for /dev/sdb (the pv for the lvs). This makes the io
happen and the system is responsive as usual.

After applying the patch correctly (my mistake), we could see these
debug messages from debugfs:

qemu-kvm-3955  [004]   823.360594:   8,16   Q  WS 3408164104 + 480
[qemu-kvm]
qemu-kvm-3955  [004]   823.360594:   8,16   S  WS 3408164104 + 480
[qemu-kvm]
<...>-3252  [016]   823.361146:   8,16   m   N cfq idle timer fired
<...>-3252  [016]   823.361151:   8,16   m   N cfq3683 slice expired t=0
<...>-3252  [016]   823.361154:   8,16   m   N cfq3683 sl_used=2 disp=25
charge=2 iops=0 sect=9944
<...>-3252  [016]   823.361154:   8,16   m   N cfq3683 del_from_rr
<...>-3252  [016]   823.361161:   8,16   m   N cfq schedule dispatch:
busy_queues=38 rq_queued=132 rq_in_driver=0

quote Vivek Goyal
cfq has 38 busy queues and 132 requests queued and it tries to schedule
a dispatch
and that dispatch never happens for some reason and cfq is hung
/quote

So the next idea was to try with 2.6.38-rc6, just in case there is a bug
in workqueue logic which got fixed?

And it turns out: With 2.6.38-rc6, the problem does not happen.

I will see whether I can bisect the kernel patches and see which one was
the good one. I have to figure out a way to do that, but if I do it and
find out, I will keep you posted.

Vivek then asked me to use another 2.6.37 debug patch and re-run the
test. Attached are the logs from that run.

Regards
Dominik

On 02/23/2011 02:37 PM, Dominik Klein wrote:
> Hi
> 
> so I ran these tests again. No patch applied yet. And - at least once -
> it worked. I did everything exactly the same way as before. Since the
> logs are 8 MB, even when best-bzip2'd, and I don't want everybody to
> have to download these, I uploaded them to an external hoster:
> http://www.megaupload.com/?d=SWKTC0V4
> 
> Traces were created with
> blktrace -n 64 -b 16384 -d /dev/sdb -o - | blkparse -i -
> blktrace -n 64 -b 16384 -d /dev/vdisks/kernel3 -o - | blkparse -i -
> 
>> Can you please apply attached patch. 
> 
> Unfortunately not. Cannot be applied to 2.6.37. I guess your source is
> newer and I fail to find the places in the file to patch manually.
> 
>> This just makes CFQ output little
>> more verbose and run the test again and capture the trace.
>>
>> - Start the trace on /dev/sdb
>> - Start the dd jobs in virt machines
>> - Wait for system to hang
>> - Press CTRL-C
>> - Make sure there were no lost events otherwise increase the size and
>>   number of buffers.
> 
> Tried that. Unfortunately, even with max buffer size of 16 M [1], this
> leaves some Skips. I also tried to increase the number of buffers over
> 64, but that produced Oops'es.
> 
> However, I attached kernel3's blktrace of a case where the error
> occured. Maybe you can read something from that.
> 
>> Can you also open tracing in another window and also trace one of the
>> throttled dm deivces, say /dev/disks/kernel3. Following the same procedure
>> as above. So let the two traces run in parallel.
> 
> So what next?
> 
> Regards
> Dominik
> 
> [1]
> http://git.kernel.org/?p=linux/kernel/git/axboe/blktrace.git;a=blob_plain;f=blktrace.c;hb=HEAD
> and look for "Invalid buffer"
vmstat 2 
 0  0      0 130316736  16364  74840    0    0     0     0 16891 33021  0  1 99  0
 8  0      0 129400944  16364  74840    0    0     0    14 17975 33233  4  3 93  0
 0 306      0 127762016  16364  74840    0    0     0 144642 20478 35476 11  6 28 56
 1 306      0 127761176  16364  74840    0    0     0     0 17307 33100  3  4 25 69
 2 306      0 127760904  16364  74840    0    0     0     0 17300 32947  3  4 24 69
 0 306      0 127760768  16364  74840    0    0     0     0 17189 32953  2  3 25 70
 1 306      0 127760752  16364  74840    0    0     0     0 17228 32966  2  3 21 74
 0 306      0 127760752  16364  74840    0    0     0     0 17138 32938  1  4 24 72
 0 306      0 127760736  16364  74840    0    0     0     0 17159 32954  1  4 22 73
 1 306      0 127760736  16364  74840    0    0     0     0 17210 33011  1  4 23 73
 0 306      0 127760736  16364  74840    0    0     0     0 17167 32980  0  4 22 74
 0 306      0 127760736  16364  74840    0    0     0     0 17137 32930  0  4 21 75
 1 306      0 127760352  16364  74840    0    0     0     0 17165 32933  0  2 23 75
 1 306      0 127760472  16364  74840    0    0     0     0 17079 32928  0  0 25 74
 0 306      0 127760560  16364  74840    0    0     0     0 17188 32950  1  2 22 75
 2 306      0 127760560  16364  74840    0    0     0     6 17142 32838  1  4 21 74
 0 306      0 127760560  16364  74840    0    0     0    16 17213 32896  1  4 21 74
 0 306      0 127760560  16364  74840    0    0     0     0 17198 32899  2  4 21 74
 0 306      0 127760560  16364  74840    0    0     0    12 17218 32910  1  4 21 74
 1 306      0 127760544  16364  74840    0    0     0     0 17203 32884  2  3 21 74
 1 306      0 127760552  16364  74840    0    0     0     0 17199 32904  2  4 21 74
 0 306      0 127760560  16364  74840    0    0     0     0 17230 32871  2  4 21 74
 0 306      0 127760560  16364  74840    0    0     0     0 17177 32860  2  4 21 74
 0 306      0 127760640  16364  74840    0    0     0     0 17185 32888  1  4 21 73
 0 306      0 127760640  16364  74840    0    0     0     0 17207 32851  2  4 25 70
 1 306      0 127760640  16364  74840    0    0     0     0 17023 32914  0  4 25 70
 block/blk-core.c    |    1 +
 block/blk.h         |    8 +++++++-
 block/cfq-iosched.c |   11 ++++++++---
 3 files changed, 16 insertions(+), 4 deletions(-)

Index: linux-2.6/block/cfq-iosched.c
===================================================================
--- linux-2.6.orig/block/cfq-iosched.c	2011-02-22 13:23:25.000000000 -0500
+++ linux-2.6/block/cfq-iosched.c	2011-02-23 11:54:49.488446104 -0500
@@ -498,7 +498,7 @@ static inline bool cfq_bio_sync(struct b
 static inline void cfq_schedule_dispatch(struct cfq_data *cfqd)
 {
 	if (cfqd->busy_queues) {
-		cfq_log(cfqd, "schedule dispatch");
+		cfq_log(cfqd, "schedule dispatch: busy_queues=%d rq_queued=%d rq_in_driver=%d q=%p", cfqd->busy_queues, cfqd->rq_queued, cfqd->rq_in_driver, cfqd->queue);
 		kblockd_schedule_work(cfqd->queue, &cfqd->unplug_work);
 	}
 }
@@ -2229,6 +2229,8 @@ static struct cfq_queue *cfq_select_queu
 {
 	struct cfq_queue *cfqq, *new_cfqq = NULL;
 
+	cfq_log(cfqd, "select_queue: busy_queues=%d rq_queued=%d rq_in_driver=%d", cfqd->busy_queues, cfqd->rq_queued, cfqd->rq_in_driver);
+
 	cfqq = cfqd->active_queue;
 	if (!cfqq)
 		goto new_queue;
@@ -2499,8 +2501,10 @@ static int cfq_dispatch_requests(struct 
 		return cfq_forced_dispatch(cfqd);
 
 	cfqq = cfq_select_queue(cfqd);
-	if (!cfqq)
+	if (!cfqq) {
+		cfq_log(cfqd, "select: no cfqq selected");
 		return 0;
+	}
 
 	/*
 	 * Dispatch a request from this cfqq, if it is allowed
@@ -3359,7 +3363,7 @@ static void cfq_insert_request(struct re
 	struct cfq_data *cfqd = q->elevator->elevator_data;
 	struct cfq_queue *cfqq = RQ_CFQQ(rq);
 
-	cfq_log_cfqq(cfqd, cfqq, "insert_request");
+	cfq_log_cfqq(cfqd, cfqq, "insert_request: busy_queues=%d rq_queued=%d rq_in_driver=%d", cfqd->busy_queues, cfqd->rq_queued, cfqd->rq_in_driver);
 	cfq_init_prio_data(cfqq, RQ_CIC(rq)->ioc);
 
 	rq_set_fifo_time(rq, jiffies + cfqd->cfq_fifo_expire[rq_is_sync(rq)]);
@@ -3707,6 +3711,7 @@ static void cfq_kick_queue(struct work_s
 	struct request_queue *q = cfqd->queue;
 
 	spin_lock_irq(q->queue_lock);
+	cfq_log(cfqd, "cfq_kick_queue called. busy_queues=%d rq_queued=%d rq_in_driver=%d", cfqd->busy_queues, cfqd->rq_queued, cfqd->rq_in_driver);
 	__blk_run_queue(cfqd->queue);
 	spin_unlock_irq(q->queue_lock);
 }
Index: linux-2.6/block/blk-core.c
===================================================================
--- linux-2.6.orig/block/blk-core.c	2011-02-22 13:23:25.000000000 -0500
+++ linux-2.6/block/blk-core.c	2011-02-23 11:50:34.406700216 -0500
@@ -413,6 +413,7 @@ void __blk_run_queue(struct request_queu
 		queue_flag_clear(QUEUE_FLAG_REENTER, q);
 	} else {
 		queue_flag_set(QUEUE_FLAG_PLUGGED, q);
+		trace_printk("not recursing. Scheduling another dispatch\n");
 		kblockd_schedule_work(q, &q->unplug_work);
 	}
 }
Index: linux-2.6/block/blk.h
===================================================================
--- linux-2.6.orig/block/blk.h	2011-02-18 13:39:47.000000000 -0500
+++ linux-2.6/block/blk.h	2011-02-23 11:53:37.844304978 -0500
@@ -57,6 +57,8 @@ static inline struct request *__elv_next
 {
 	struct request *rq;
 
+	trace_printk("elv_next_request called. q=%p\n", q);
+
 	while (1) {
 		while (!list_empty(&q->queue_head)) {
 			rq = list_entry_rq(q->queue_head.next);
@@ -68,8 +70,12 @@ static inline struct request *__elv_next
 				return rq;
 		}
 
-		if (!q->elevator->ops->elevator_dispatch_fn(q, 0))
+		trace_printk("No requests in q->queue_head. Will call elevator. q=%p\n", q);
+
+		if (!q->elevator->ops->elevator_dispatch_fn(q, 0)) {
+			trace_printk("No requests in elevator either q=%p\n", q);
 			return NULL;
+		}
 	}
 }

Attachment: trace_pipe.gz
Description: application/gzip

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list

[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]