The AoE initiator (the side using the storage), the "aoe" driver, does
retransmit AoE write commands for aoe_deadsecs seconds. The
aoe_deadsecs module parameter is configurable. The virtual memory
subsystem does buffer writes to filesystems.

An issue that is possibly related to your problems is briefly
described below.

Often the problem is not too little buffering of writes but too much
of it. For writes to a filesystem, the data is first modified in RAM,
and at some later point the dirty data in RAM is flushed out to
persistent storage. If the system waits too long before flushing, the
backlog can cause things to get clogged up.

In a nutshell, the virtual memory subsystem's defaults were created
before 64-bit systems and large amounts of RAM were common.

You can use some VM settings to encourage dirty pages to be written
out more quickly by the process generating the writes, so that
performance is more consistent.

Some example settings are in the EtherDrive HOWTO FAQ:

  http://support.coraid.com/support/linux/EtherDrive-2.6-HOWTO-5.html#ss5.19

Linux Weekly News article about this problem:

  http://lwn.net/Articles/572911/

On 1/15/14, 11:14 AM, James R. Leu wrote:
We see a similar issue with vblade when it becomes CPU starved due to
resource contention on our AOE server. It would be nice if in these
situations the AOE client would queue write blocks and resend unack'd
writes.

On Wed, Jan 15, 2014 at 04:52:36PM +0100, Lars Täuber wrote:

Hi,

I am experiencing some problems with the latest ggaoed version and a
fresh ubuntu 14.04 aoe client (from the daily snapshots).

  http://code.google.com/p/ggaoed/source/list

The kernel version on the client side is 3.13.0-3-generic

  # modinfo aoe
  filename:       /lib/modules/3.13.0-3-generic/kernel/drivers/block/aoe/aoe.ko
  version:        85
  description:    AoE block/char driver for 2.6.2 and newer 2.6 kernels
  author:         Sam Hopkins <sah@xxxxxxxxxx>
  license:        GPL
  srcversion:     5F0AC5D858A1164C5170585

The client is a testing box, but the server has been in production for
years, so I can't change the server config.

I did a tcpdump and see that the server stops sending a response to
the last write request of a series of write requests. After the client
has waited 9 seconds for responses without receiving any packet from
the target, it issues a "Query Config Information Request" and marks
the device as read only. This results in a read-only filesystem.
The responses to the "Query Config Information Requests" can be seen
right after the requests.

I can "repair" this with an aoe-revalidate and remounting rw. But it
appears to happen again with the next longer write operation.

I'm stuck here. It seems the client doesn't resend unanswered
requests. Is this on purpose?

Thanks
Lars

_______________________________________________
Aoetools-discuss mailing list
Aoetools-discuss@xxxxxxxxxxxxxxxxxxxxx
https://lists.sourceforge.net/lists/listinfo/aoetools-discuss
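
The reply at the top of this thread says that the VM defaults predate
machines with large amounts of RAM. The sketch below is only rough
arithmetic to illustrate that point; it is not taken from the thread or
from the linked FAQ. It assumes a percentage-style limit such as
vm.dirty_ratio, and the 20 percent figure and the 100 MiB/s storage
throughput are assumed example values. The function names are
hypothetical. The kernel applies the percentage to "dirtyable" memory
rather than total RAM, so real thresholds are somewhat lower.

```python
# Rough arithmetic only: shows how a percentage-based dirty-page limit
# grows with RAM size. All numeric inputs below are assumed examples.

GIB = 1024 ** 3
MIB = 1024 ** 2


def dirty_threshold_bytes(ram_bytes, ratio_percent):
    """Approximate bytes of dirty page cache allowed at a given percentage."""
    return ram_bytes * ratio_percent // 100


def seconds_to_flush(dirty_bytes, throughput_bytes_per_sec):
    """Time needed to write out dirty_bytes at a sustained throughput."""
    return dirty_bytes / throughput_bytes_per_sec


if __name__ == "__main__":
    assumed_dirty_ratio = 20          # percent, assumed example value
    assumed_throughput = 100 * MIB    # bytes per second, assumed example value

    for ram_gib in (1, 8, 64):
        limit = dirty_threshold_bytes(ram_gib * GIB, assumed_dirty_ratio)
        secs = seconds_to_flush(limit, assumed_throughput)
        print(f"{ram_gib:3d} GiB RAM: up to {limit / GIB:.1f} GiB dirty, "
              f"about {secs:.0f} s to flush")
```

The same percentage that allows a modest backlog on a small machine
allows a backlog of many gigabytes on a large one. Linux also provides
vm.dirty_bytes and vm.dirty_background_bytes, which express the limits
as absolute sizes instead of percentages.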