Re: Query on AoE initiator accessing target started locally over lo

Ed Cashin <ed.cashin@xxxxxxx> · Sun, 27 Aug 2017 21:10:05 -0400

Hi.  I have some comments between quotes below.

On 08/24/2017 01:18 AM, Nivin Lawrence wrote:
...
The system can be considered as one with a dual CPU complex but a single 
SSD. There are 2 operating systems, OS-1 and OS-2, one running on each 
CPU complex. OS-1 boots first, takes control of the SSD, and starts the 
target. OS-2, which boots later, uses the initiator to connect to the 
target over an interface connecting the 2 CPU complexes. There is no 
switch in between; it's a back-to-back Ethernet link over which OS-2 
uses AoE for all disk accesses.

Now, at some point in time, OS-1 has to be shut down, which means the 
target is not available anymore. OS-2 will then start the target on the 
loopback interface using "vbladed 0 0 lo /dev/sda".

In general, when you have two different hosts using a shared read-write 
resource like this, you have to prevent a split-brain situation by using 
fencing.  I think that if you're sure that OS-1 only ever uses the SSD 
at the request of OS-2 and you have fully shut down the whole path from 
the user on OS-2 through the storage layers on OS-1 before OS-2 begins 
using the SSD directly, it's OK, but you'd have to use two-phase commit 
or something to be sure it's always going to be 100% safe.  (Otherwise 
you run the risk of one of them booting during a communication failure 
and using the SSD inappropriately while the other is using it.)
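
As a rough illustration of the ordering I mean, here is a minimal 
in-memory sketch. The names (Owner, Handoff, request_release, 
confirm_release, claim) are made up for this example, and a real system 
would need the state to survive crashes and lost messages on both 
sides, which this model does not attempt.

```python
from enum import Enum


class Owner(Enum):
    OS1 = "OS-1"
    OS2 = "OS-2"
    NONE = "none"


class Handoff:
    """Toy model of who may touch the SSD during an OS-1 -> OS-2 handoff."""

    def __init__(self):
        self.owner = Owner.OS1
        self.release_requested = False

    def request_release(self):
        # Phase 1: OS-2 stops issuing I/O and asks OS-1 to quiesce.
        self.release_requested = True

    def confirm_release(self):
        # Phase 2: OS-1 reports that its target is stopped and flushed.
        if not self.release_requested:
            raise RuntimeError("release confirmed without a request")
        self.owner = Owner.NONE

    def claim(self, who):
        # Nobody may start using the SSD while another owner is recorded.
        if self.owner is not Owner.NONE:
            raise RuntimeError(f"SSD still owned by {self.owner.value}")
        self.owner = who


if __name__ == "__main__":
    h = Handoff()
    h.request_release()
    h.confirm_release()
    h.claim(Owner.OS2)
    print("owner is now", h.owner.value)
```

The point is only that OS-2 refuses to start its local target until it 
has a positive confirmation from OS-1, rather than assuming OS-1 is gone 
because it stopped responding.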

More below.

As this system is going to 
run for a long time, I want to understand the overhead in disk access 
from the POV of applications running on OS-2. I want to compare the 
AoE-based access mechanism in the above scenario with the case where 
OS-2 applications access the SSD directly.

Now, all the queries below are for the case where OS-2 runs the 
initiator, which connects to a target that is also running in OS-2, 
with no network in between. I am not sure your previous responses are 
still valid for this specific case; for example, MTU might not be a 
consideration from the actual data transfer POV.

    a. Wanted to understand if the AoE model has specific optimizations
    done to make this as close as possible to a regular local disk access.

There are a lot of decisions that were made with performance in mind, 
but a loopback-only network was never the primary deployment 
environment being considered.  You should try it and see, though.  
Performance is always best measured and analyzed.  Sometimes there will 
be one or two bottlenecks that are easy to identify and eliminate.
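
A minimal sketch of the kind of measurement I mean is below.  It times 
sequential reads of a given block size from any path.  The function 
name time_reads is just for this example, and the reads go through the 
page cache, so repeated runs will look faster than the device really 
is unless you account for that.

```python
import os
import time


def time_reads(path, block_size, count, offset=0):
    """Read up to `count` blocks of `block_size` bytes from `path`.

    Returns (bytes_read, elapsed_seconds).  Reads are buffered, so the
    page cache can inflate the numbers on repeated runs.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        total = 0
        start = time.perf_counter()
        for i in range(count):
            data = os.pread(fd, block_size, offset + i * block_size)
            if not data:  # reached the end of the file or device
                break
            total += len(data)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return total, elapsed


if __name__ == "__main__":
    import sys

    nbytes, secs = time_reads(sys.argv[1], 4096, 1024)
    print(f"{nbytes} bytes in {secs:.6f} s")
```

Running it once against the AoE block device (I would expect 
/dev/etherd/e0.0 for shelf 0, slot 0, but check what your system 
shows) and once against /dev/sda, with a few different block sizes, 
should give a first idea of where the overhead is.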

The Linux kernel already has optimizations for loopback networking.

I think you might find more optimizations for your situation if you 
think carefully about I/O operations, their sizes and offsets.  You'll 
find some discussions of that periodically here in the list archives.
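
To make the sizes-and-offsets point concrete, here is a small sketch 
that shows how a request is cut up when it crosses fixed-size 
boundaries.  The helper names and the 4096-byte chunk are assumptions 
for illustration only; substitute whatever unit matters in your stack 
(the SSD's block size, the per-frame payload, and so on).

```python
def split_at_boundaries(offset, size, chunk):
    """Split the byte range [offset, offset + size) into pieces that
    never cross a multiple of `chunk`.  Returns (offset, length) pairs."""
    if offset < 0 or size < 0 or chunk <= 0:
        raise ValueError("offset and size must be >= 0, chunk must be > 0")
    pieces = []
    end = offset + size
    while offset < end:
        next_boundary = (offset // chunk + 1) * chunk
        length = min(end, next_boundary) - offset
        pieces.append((offset, length))
        offset += length
    return pieces


def is_aligned(offset, size, chunk):
    """True if the request starts and ends on `chunk` boundaries."""
    return offset % chunk == 0 and size % chunk == 0


if __name__ == "__main__":
    print(split_at_boundaries(0, 4096, 4096))    # one whole chunk
    print(split_at_boundaries(512, 4096, 4096))  # straddles two chunks
```

A 4096-byte request at offset 0 stays in one piece, while the same 
request at offset 512 becomes two partial pieces, which is the sort of 
thing that can turn one write into extra work lower down.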

--
  Ed

_______________________________________________
Aoetools-discuss mailing list
Aoetools-discuss@xxxxxxxxxxxxxxxxxxxxx
https://lists.sourceforge.net/lists/listinfo/aoetools-discuss