On 08/17/2017 10:39 PM, Nivin Lawrence wrote:
...
1) Is all the code for AoE initiator/target located @
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/block/aoe?
No, that's the initiator for the Linux kernel, the most popular initiator.
The most popular open source target is vblade. You can find it in
the OpenAoE project on GitHub.
2) Is it possible to cleanly stop the AoE target, ensuring the cache
is written to disk and any cleanup needed is done?
Yes, the steps would be:
* On the initiator, safely shut down each layer using
  the storage, from highest to lowest level.
    * E.g., umount, vgchange -a n ...
* Run "sync" on the initiator.
  There are some kernels (old, probably) that do not
  automatically sync on unmount.
* Interrupt the networking that supports the AoE command stream.
  This is just a paranoia step. No AoE commands should be going
  out after you stop using the AoE target.
* Run "sync" on the host running the target.
* Kill vblade (assuming that's what it is) with SIGTERM.
* Run "sync" on the target host again, for paranoia.
If the target is exporting a file on underlying storage that itself
needs to be quiesced, you'd have to quiesce that as well, but that
seems unlikely.
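
To make the ordering of those steps concrete, here is a rough Python sketch that only builds the list of commands and runs nothing. The mount point, volume group, interface name, and PID arguments are hypothetical placeholders, and taking the interface down on the initiator side is my assumption about where you would interrupt the AoE traffic.

```python
def shutdown_plan(mountpoints, volume_groups, aoe_ifaces, vblade_pids):
    """Return (host, argv) pairs in the order of the steps above.

    Nothing is executed; all argument values are hypothetical examples.
    """
    plan = []
    # Initiator: shut down layers from highest to lowest level.
    for mp in mountpoints:
        plan.append(("initiator", ["umount", mp]))
    for vg in volume_groups:
        plan.append(("initiator", ["vgchange", "-a", "n", vg]))
    # Explicit sync, in case the kernel does not sync on unmount.
    plan.append(("initiator", ["sync"]))
    # Paranoia step: stop the AoE command stream (assumed: on the initiator).
    for iface in aoe_ifaces:
        plan.append(("initiator", ["ip", "link", "set", "dev", iface, "down"]))
    # Target host: sync, stop vblade with SIGTERM, sync again.
    plan.append(("target", ["sync"]))
    for pid in vblade_pids:
        plan.append(("target", ["kill", "-TERM", str(pid)]))
    plan.append(("target", ["sync"]))
    return plan
```

Calling shutdown_plan(["/mnt/aoe"], ["vg_aoe"], ["eth1"], [1234]) gives a list you can review, or feed to whatever you use to run commands on each host.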
More comments follow quotes below.
3) I have a use case where I need to use AoE over network initially,
but later stop the remote target, restart target locally on loopback
interface. The storage that is accessed in both cases is the same, so no
issues with data being in sync. I have a few clarification requests in
this regard:
That's confusing me. Can you be really concrete, like with hostnames,
network interface names, and network switch names and ports, etc.?
a. Wanted to understand if the AoE model has specific optimizations
done to make this as close as possible to a regular local disk access.
The biggest optimization you can take advantage of is fast networking
with jumbo frames (e.g., an MTU of 9000 at all network endpoints).
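
A back-of-the-envelope Python sketch shows why the MTU matters. It assumes the AoE header layout as I read the protocol (a 10-byte AoE header plus a 12-byte ATA header inside the Ethernet payload) and 512-byte sectors; a real initiator or target may cap the sectors per frame differently.

```python
AOE_HEADER_BYTES = 10  # assumed: AoE common header after the Ethernet header
ATA_HEADER_BYTES = 12  # assumed: AoE ATA command header
SECTOR_BYTES = 512

def sectors_per_frame(mtu):
    """Whole 512-byte sectors that fit in one AoE frame for a given MTU."""
    payload = mtu - AOE_HEADER_BYTES - ATA_HEADER_BYTES
    return max(payload // SECTOR_BYTES, 0)

def frames_needed(io_bytes, mtu):
    """Data frames needed to move io_bytes, rounding up."""
    per_frame = sectors_per_frame(mtu) * SECTOR_BYTES
    if per_frame == 0:
        raise ValueError("MTU too small to carry a sector")
    return -(-io_bytes // per_frame)  # ceiling division

print(sectors_per_frame(1500), sectors_per_frame(9000))
print(frames_needed(1 << 20, 1500), frames_needed(1 << 20, 9000))
```

Under these assumptions a 1 MiB transfer needs roughly an eighth as many frames at MTU 9000 as at MTU 1500, which means fewer packets and less per-packet overhead on both ends.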
...
c. Do you think further optimizations are theoretically possible to
take care of this specific scenario to make it just like a regular local
disk access?
If you search the archives you can find repeated but infrequent posts
about a performance problem that can occur when the alignment of I/O
requests on the initiator doesn't match the backing storage. Then you
wind up doing extra I/O. You could look into that problem. It would be
easier to use 40 GbE, though.
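
To illustrate the alignment problem, here is a small Python sketch that counts how many backing-store blocks a request touches. The 4096-byte block size and the offsets are hypothetical; this only models the arithmetic, not what any particular storage stack does.

```python
def backing_blocks_touched(offset, length, block_size):
    """Number of backing-store blocks overlapped by [offset, offset + length)."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return last - first + 1

def is_aligned(offset, length, block_size):
    """True if the request starts and ends on backing block boundaries."""
    return offset % block_size == 0 and length % block_size == 0

def extra_bytes(offset, length, block_size):
    """Bytes the backing store must handle beyond what was requested."""
    return backing_blocks_touched(offset, length, block_size) * block_size - length
```

For example, a 4096-byte write at offset 0 touches one 4096-byte block, while the same write at offset 512 straddles two blocks, so the backing storage may have to read and rewrite both.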
4) Is there a way to pause the initiator from trying to communicate
with the target once it is decided that the target has to be shut
and restarted locally?
You can pause the initiator by interrupting networking, e.g., by setting
the AoE network interface(s) to a "down" state. The aoe initiator will
retransmit commands until it gets a response. If aoe_deadsecs seconds
elapse, though, the aoe driver will mark the device as down and fail all
outstanding I/O requests, so you can set that to a large enough number
of seconds to support the interruption.
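
As a sketch of how you might size and set that timeout from Python: I am assuming the parameter is exposed as a writable module parameter at /sys/module/aoe/parameters/aoe_deadsecs on your kernel, so check that path before relying on it. The safety_factor margin is a hypothetical knob of mine, not something the driver defines.

```python
import math
from pathlib import Path

# Assumed location of the module parameter; verify it on your system.
AOE_DEADSECS_PATH = Path("/sys/module/aoe/parameters/aoe_deadsecs")

def deadsecs_for_outage(expected_outage_secs, safety_factor=2.0):
    """Pick a timeout comfortably longer than the planned interruption."""
    if expected_outage_secs <= 0 or safety_factor < 1:
        raise ValueError("need a positive outage and a factor of at least 1")
    return math.ceil(expected_outage_secs * safety_factor)

def set_deadsecs(seconds, path=AOE_DEADSECS_PATH):
    """Write the timeout (usually requires root) and return the value read back."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    path = Path(path)
    path.write_text(f"{seconds}\n")
    return int(path.read_text().strip())
```

For a planned two-minute switchover, set_deadsecs(deadsecs_for_outage(120)) would ask for 240 seconds before bringing the interface down.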
...
5) For use of AoE in production, I was wondering if there would be
any kind of metric which would help with details on the stability and
availability expectations and any feedback from current usage in
production.
The code itself is stable and mature. Storage over networking has its
pitfalls, whether it's enterprise stuff or something you put together
with open source components. E.g., if your network switch has buggy
firmware, you could have problems. The best way to gain confidence is
to use it in systems where you can make mistakes and learn a use pattern
that works for you.
--
Ed
_______________________________________________
Aoetools-discuss mailing list
Aoetools-discuss@xxxxxxxxxxxxxxxxxxxxx
https://lists.sourceforge.net/lists/listinfo/aoetools-discuss