Re: vblade-22-rc1 is first release candidate for version 22

Hi again!
I like my long emails, don't I?

On 15/06/2014 4:22 PM, Killer{R} wrote:
> Hello Catalin,
>
> Sunday, June 15, 2014, 4:08:15 PM, you wrote:
>
>>>>>> I would like to request two changes before release.
>>>>>> - An option to restrict the size of packets over automatic detection of
>>>>>> MTU.
>>>>> You mean like if the MTU is 9000, you want the ability to tell the
>>>>> vblade to act like it's smaller, right?
>>> CS> Yes. That's the gist of it.
>>> CS> I believe there is some value in the ability to manually tweak the
>>> CS> maximum packet size used by vblade.
>>> But that's all on the initiator's side. For example, WinAoE (and its
>>> forks ;) ) does MTU 'autodetection' instead of using Conf::scnt.
>>>
> CS> That's not entirely correct.
> CS> WinAoE indeed does a form of negotiation there - it will start at
> CS> (MTU/sector size) and will do reads of decreasing size, until it
> CS> receives a valid packet.
> CS> However! If you would kindly check ata.c:157 (on v22-rc1), you'll
> CS> see that any ATA request for more than the supported packet size
> CS> is refused.
>
> That's also not entirely correct :) It increases the sector count from
> 1 until it hits either the MTU limit or any kind of error from the
> target, including a timeout.
You're probably right there; I haven't looked at it recently. In any 
event, the observation stands: changing the supported MTU in vblade 
will limit packets to that size (I wouldn't have bothered with the 
FreeBSD MTU detection code if that weren't the case).
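To make that concrete, here is roughly what I mean by capping the 
packet size (a sketch only: the mtu_cap override is hypothetical, and 
the 10 + 12 bytes are the AoE + ATA header overhead inside the Ethernet 
payload per the spec, not vblade's actual structs):

    /* Sketch: derive the advertised max sector count per frame
     * from a capped MTU. */
    #include <stdio.h>

    enum { AOE_HDR = 10, ATA_HDR = 12, SECTOR = 512 };

    static int maxscnt(int ifmtu, int mtu_cap)
    {
        /* use the override only if it's set and smaller than the
         * interface MTU */
        int mtu = (mtu_cap > 0 && mtu_cap < ifmtu) ? mtu_cap : ifmtu;
        return (mtu - AOE_HDR - ATA_HDR) / SECTOR;
    }

    int main(void)
    {
        /* a 9000-MTU interface told to act like 1500: 2 sectors/frame */
        printf("%d\n", maxscnt(9000, 1500));
        return 0;
    }

Any ATA request asking for more sectors than that would then be 
refused, as in ata.c.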
> However, in my investigation I found it useful for the initiator to
> also know the value vblade calls 'buffer count', i.e. how many packets
> the initiator can send to the target knowing it will likely process
> them all, because sending more requests than this value as
> 'outstanding' sharply increases the drop (and resend) rate.
> I also implemented a kind of negotiation to detect this, by sending a
> 'congestion' extension command that does usleep(500000) and then
> responds to all commands received in the buffer. Compared with
> directly asking the target for its buffer count, this approach also
> detects any implicit buffering between the initiator and the target.
>
As per the AoE spec, messages in excess of Buffer Count are dropped.
Since vblade processes these synchronously, this happens at the network 
buffer level. If using async I/O, you're responsible for that, in theory.
As far as I remember, WinAoE not only doesn't care about that, but 
doesn't even request this information from the target.
Should WinAoE limit the number of in-flight packets, as the target says 
it should, we wouldn't actually be having this conversation. But that 
would probably cost latency, since the initiator would have to wait for 
confirmation of at least one packet before sending another one in 
excess of bufcnt (and as I remember, WinAoE applies no limit at all 
when sending packets).
This would probably reduce throughput and increase average response time 
under even moderate load, but it would decrease the drop rate.
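For illustration, an initiator that honours bufcnt would look roughly 
like this (a sketch: send_frame() and recv_response() are hypothetical 
stand-ins for the driver's real TX/RX paths):

    #include <stdio.h>

    static void send_frame(void)    { /* tx one AoE request */ }
    static void recv_response(void) { /* block until one reply arrives */ }

    int main(void)
    {
        int bufcnt = 16;   /* from the target's Query Config response */
        int total = 64, sent = 0, done = 0, inflight = 0;

        while (done < total) {
            /* fill the window up to bufcnt, never beyond */
            while (inflight < bufcnt && sent < total) {
                send_frame();
                sent++;
                inflight++;
            }
            recv_response();   /* one completion frees one slot */
            done++;
            inflight--;
        }
        return 0;
    }

The latency cost is visible right in the structure: once the window is 
full, every new send is gated on a completed receive.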
I'm not actually sure the drop/resend rate is a number worth optimising 
for its own sake. It's clearly desirable to minimise drops and resends, 
but not just to make the figure look better.

Regarding your proposed extension, I could see something like this being 
valuable if the target could detect an increased drop rate and tell the 
initiator to ease off, or to resend packets faster than the default 
timeout. But since the target is not allowed to send unsolicited packets 
to the initiator, a specific request would be needed (say, when a large 
number of packets are outstanding), and that raises the question: if 
those packets are being dropped, what is there to stop the target's 
network stack from dropping our congestion-detection packet as well?
On that note, vblade could be taught to broadcast its load status 
periodically or when the drop rate gets high, and initiators would 
notice that and adapt. But I believe this raises some security concerns, 
and it would also slightly slow the target, since it would need to yield 
to the kernel for drop-rate statistics every few requests.
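For what it's worth, here is how I read your probe on the initiator 
side (a rough sketch only: send_probe() and recv_reply_before() are 
hypothetical stand-ins, and the 'congestion' command is your proposed 
extension, not anything in the AoE spec):

    #include <stdio.h>

    enum { NPROBE = 64 };

    static void send_probe(int tag) { (void)tag; /* tx one probe frame */ }
    /* returns 1 while replies keep arriving before the deadline */
    static int  recv_reply_before(long deadline_ms)
    { (void)deadline_ms; return 0; }

    int main(void)
    {
        int got = 0;

        for (int tag = 0; tag < NPROBE; tag++)
            send_probe(tag);            /* burst, no pacing */

        /* target usleep()s ~500 ms, then answers whatever survived
         * in its buffers; count those replies */
        while (recv_reply_before(1000))
            got++;

        printf("effective buffer depth ~ %d frames\n", got);
        return 0;
    }

The same weakness applies here too, of course: the probe burst itself 
is subject to drops.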

@Ed
Now, thanks to Killer's reference to the Buffer Count, I remember that 
the FreeBSD code does not actually use it to allocate the network buffers.
Under Linux, following setsockopt with the default bufcnt of 16, the 
receive buffer ends up 24000 bytes long for an MTU of 1500 bytes, and 
144000 bytes for a 9K MTU.
Under FreeBSD the code defaults to a 64K buffer. That makes the 
effective bufcnt 43 on a 1500-byte MTU, but only 7 on a 9K MTU.
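The arithmetic, spelled out (a sketch; the bufcnt of 16 is inferred 
from the figures above, and size_rcvbuf() is illustrative, not vblade's 
actual code; only setsockopt() with SO_RCVBUF is standard):

    #include <stdio.h>
    #include <sys/socket.h>

    /* size the receive buffer from bufcnt instead of a fixed 64K,
     * so a 9K MTU doesn't silently shrink the effective window */
    static int size_rcvbuf(int fd, int bufcnt, int mtu)
    {
        int bytes = bufcnt * mtu;   /* 16*1500 = 24000, 16*9000 = 144000 */
        return setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof bytes);
    }

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);

        if (fd >= 0)
            size_rcvbuf(fd, 16, 9000);   /* 144000 bytes, not 64K */

        /* the fixed 64K buffer, read the other way around: */
        printf("effective bufcnt at 64K: mtu 1500 -> %d, mtu 9000 -> %d\n",
               65536 / 1500, 65536 / 9000);   /* 43 and 7 */
        return 0;
    }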
This could cause an increase in dropped packets and explain the decrease 
in throughput I mentioned in a previous mail. I did not check for this 
when testing.
I was not concerned because multiple instances of vblade on the same 
interface would saturate the channel anyway, but now I'm starting to 
worry :)

