Hi, Nivin Lawrence. Answers appear between selected quotes below.

On 03/27/2017 11:12 AM, Nivin Lawrence wrote:
> Hi experts,
> I am trying at understanding the implications of any network drops
> and AOE/vblade server restart.
>
> 1) Is it expected to not see any loss in the Ethernet network where the
> client/server resides or is the protocol capable of handling drops in
> the network?

For storage, the "target" is the place where the data is persisted, and
the "initiator" is the software that wants to read and write the data.
I think you mean initiator by client, but I'll keep using the
storage-oriented words, because they match the documentation.

No, a loss-free network is not expected: Ethernet is expected to drop
packets, and the initiator is expected to retransmit commands. The tag
field in the AoE header identifies a command, and the initiator gets to
pick it. The target copies that tag into the response to the command,
so that the initiator can match the response to the waiting request.

> 2) If the server restarts and remains unavailable for say 5 seconds,
> what happens to the clients?

That's up to the initiator. For example, the aoe driver in Linux has a
module parameter, aoe_deadsecs, that controls this behavior. If it's
set to 4, then it will fail all the AoE commands (reads and writes)
that have been waiting for a response for 4 seconds. But the default is
higher: three minutes' worth of seconds (180).

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/block/aoe/aoecmd.c#n29

...
> - Would be good to understand if there is an option where client
> wont need to do a mount again once the server becomes available and if
> the transactions could continue just as if there was a network
> connectivity issue for 5 seconds.

From the initiator's perspective, that's already the case: some people
plug the initiator and target into a new network switch without even
unmounting. That's not recommended, but it has happened a lot.

Now there's also the issue of the target.
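
Before getting to the target, here is a rough sketch of the initiator
behavior described above. It is a toy model in Python, not the Linux
aoe driver: the Initiator class, the fixed rexmit_secs resend interval,
and the simulate helper are all made up for illustration, and the toy
resends under the same tag purely for simplicity (a real initiator is
free to choose its tags differently). Only the deadsecs idea
corresponds to the aoe_deadsecs parameter mentioned earlier.

```python
import itertools


class Initiator:
    """Toy table of outstanding commands, keyed by tag (hypothetical model)."""

    def __init__(self, deadsecs=180, rexmit_secs=1):
        self.deadsecs = deadsecs          # give up on a command after this long
        self.rexmit_secs = rexmit_secs    # assumed fixed resend interval
        self._tags = itertools.count(1)   # the initiator picks the tags
        self.pending = {}                 # tag -> timestamps of the waiting command
        self.frames_sent = 0
        self.completed = []
        self.failed = []

    def submit(self, now):
        tag = next(self._tags)
        self.pending[tag] = {"first_sent": now, "last_sent": now}
        self.frames_sent += 1
        return tag

    def on_response(self, tag):
        # The target copied the tag into its response; use it to find the request.
        if self.pending.pop(tag, None) is None:
            return False                  # unknown or already handled: ignore
        self.completed.append(tag)
        return True

    def tick(self, now):
        for tag, cmd in list(self.pending.items()):
            if now - cmd["first_sent"] >= self.deadsecs:
                del self.pending[tag]     # waited too long: fail the command
                self.failed.append(tag)
            elif now - cmd["last_sent"] >= self.rexmit_secs:
                cmd["last_sent"] = now    # no response yet: send it again
                self.frames_sent += 1


def simulate(deadsecs, outage_secs=5):
    """One command issued at t=0 to a target that is silent until t=outage_secs."""
    ini = Initiator(deadsecs=deadsecs)
    tag = ini.submit(now=0)
    for now in range(1, outage_secs + 5):
        ini.tick(now)
        if now >= outage_secs and tag in ini.pending:
            ini.on_response(tag)          # target is back and answers
    return "completed" if tag in ini.completed else "failed"


if __name__ == "__main__":
    print(simulate(deadsecs=180))  # outage shorter than the timeout
    print(simulate(deadsecs=4))    # timeout shorter than the outage
```

With a 5-second outage, the first call keeps resending until the target
answers again, while the second fails the command at the 4-second mark.
That is the difference between the default and aoe_deadsecs=4.
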
Nothing I've said addresses the possibility of write caching. Without
write caching, there is no problem with the target power cycling or
something like that. But you mentioned the server restarting. If the
server is running a vblade using regular I/O, you can't just restart
that system. You'd have to make sure that incoming AoE commands stop,
all write operations finish, buffers are synced from RAM to disk, and
*then* the server restarts.

When AoE resumes, some of the commands may be performed a second time:
the ones that were performed without the response reaching the
initiator. Usually that will mean the same data gets written to the
same place as before.

There are ways to avoid write caching on the storage target, but they
often entail a significant performance hit. There are ways to make the
whole write cache path keep the data safe in case of power
interruption, but the ones I'm aware of use special hardware.

-- 
Ed

_______________________________________________
Aoetools-discuss mailing list
https://lists.sourceforge.net/lists/listinfo/aoetools-discuss