On Tue, 2013-02-26 at 15:25 +0000, Benjamin ESTRABAUD wrote:
> Hi Nicholas, all,
>
> Here is a recap of our issue:
>
> * Running iSCSI read IOs over LIO with IOmeter (version 2006.07.27) on
>   Windows and a queue depth (IOs/target) of 10 or above causes memory
>   usage to grow nearly as large and as fast as the read IOs received,
>   with IO performance degrading in proportion to the amount of extra
>   memory used (especially visible when using a fast 10GbE link). The
>   extra memory used rarely exceeds 1 to 3GB before suddenly dropping
>   back to its original level, at which point the cycle restarts.
>
> This is probably not very clear, so here are a few more details:
>
> Free memory: 3.8GB.
> Running read IOs over a GbE link at 100MB/sec: free memory decreases by
> ~100MB per second.
> Seconds later, free memory reaches 2.8GB.
> The next second, free memory has gone back to 3.8GB (recovered).
> The above cycle then restarts continuously.
>
> Some more detailed information about the conditions needed to reproduce
> the issue:
>
> * The issue only happens on 3.5+ kernels. Works fine on 3.4 kernels.

Benjamin,

First, thank you for the detailed bug report. You're actually hitting a TX
thread regression bug in the iscsi-target code that starves the
per-connection immediate queue of CPU, which prevents StatSN-acknowledged
iscsi_cmd descriptors within that queue from releasing memory back into the
lio_qr_cache + lio_cmd_cache slabs. As you've observed, this happens when a
heavy per-connection read workload causes the response queue to run for
long periods of time without checking immediate queue status.
So the v3.4 code checks for a break from execution after each
MaxBurstLength-sized DataIN sequence is sent, and this logic managed to get
broken in the following upstream commit:

commit 6f3c0e69a9c20441bdc6d3b2d18b83b244384ec6
Author: Andy Grover <agrover@xxxxxxxxxx>
Date:   Tue Apr 3 15:51:09 2012 -0700

    target/iscsi: Refactor target_tx_thread immediate+response queue loops

(Grover CC'ed)

The patch below fixes the regression by following the pre-v3.5 logic to
check the TX immediate queue bit + break after each DataIN sequence has
been sent. This fixes the bug on my side with v3.8-rc7 code, and is
getting pushed to target-pending/for-next now.

Please verify on your end.

Thank you,

--nab

diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
index 23a98e6..af77396 100644
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -3583,6 +3583,10 @@ check_rsp_state:
 			spin_lock_bh(&cmd->istate_lock);
 			cmd->i_state = ISTATE_SENT_STATUS;
 			spin_unlock_bh(&cmd->istate_lock);
+
+			if (atomic_read(&conn->check_immediate_queue))
+				return 1;
+
 			continue;
 		} else if (ret == 2) {
 			/* Still must send status,
@@ -3672,7 +3676,7 @@ check_rsp_state:
 		}
 
 		if (atomic_read(&conn->check_immediate_queue))
-			break;
+			return 1;
 	}
 
 	return 0;
@@ -3716,12 +3720,15 @@ restart:
 				signal_pending(current))
 			goto transport_err;
 
+get_immediate:
 		ret = handle_immediate_queue(conn);
 		if (ret < 0)
 			goto transport_err;
 
 		ret = handle_response_queue(conn);
-		if (ret == -EAGAIN)
+		if (ret == 1)
+			goto get_immediate;
+		else if (ret == -EAGAIN)
 			goto restart;
 		else if (ret < 0)
 			goto transport_err;
--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html