Hello David,

Monday, July 14, 2014, 1:23:56 AM, you wrote:

IMHO the problem is caused not only by the page size, but also by the HDD's
sector size. Modern HDDs have a 4K physical sector size. They still support
512-byte access for compatibility, but this is inefficient: every unaligned
read that doesn't fit into one 4K sector turns into a full 4K read, and every
unaligned write forces the disk to read the sector, modify it internally in
its buffers, and then write it back. Sure, the firmware tries to do this as
fast as it can, but my tests show about 20..30% sequential write speed
degradation (with O_DIRECT) when writing 4K blocks whose start is not also
4K-aligned. So simply using jumbo frames is not enough to make the hardware
work as fast as it can.

The AoE protocol doesn't support 4K sectors directly, since it has to work
with the normal MTU, not only jumbo frames. However, it is theoretically
possible to have the initiator report the device to the OS as a 4K-sector
drive; a proper ('4K-sector-aware' :) ) OS would then access it in
4K-aligned portions, which, together with some buffering on the target's
side, should make it all work faster :). But it all looks like a tricky
workaround.

DL> So I do find it interesting to have a configuration to limit the size of
DL> the read/write request, but it seems like it would be useful to understand
DL> the side effects and why someone would want to do this. Catalin suggested
DL> that reducing the size of the jumbo frames decreases latency and improves
DL> boot times and said that the system "feels more responsive". This is where I
DL> have a problem, though, because something "feeling" more responsive is not
DL> very satisfying. It would be better to have some hard numbers behind what
DL> this change does.

DL> AoE using normal Ethernet frames ends up having a protocol efficiency of
DL> only 89.82%, which on 1Gb Ethernet would give you a theoretical maximum
DL> throughput of ~112 MB/s.
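The read-modify-write penalty described above can be sketched with some
simple arithmetic (a minimal illustration only, not tied to any real driver;
the 4096-byte physical sector size is the assumption here):

```python
PHYS_SECTOR = 4096  # assumed physical sector size of a 4K-native drive

def sectors_touched(offset: int, length: int) -> int:
    """Number of physical sectors a request at byte `offset` spans."""
    first = offset // PHYS_SECTOR
    last = (offset + length - 1) // PHYS_SECTOR
    return last - first + 1

def needs_rmw(offset: int, length: int) -> bool:
    """A write needs read-modify-write if either edge is not sector-aligned."""
    return offset % PHYS_SECTOR != 0 or (offset + length) % PHYS_SECTOR != 0

# A 4K write aligned to a 4K boundary: one sector, no read-modify-write.
print(sectors_touched(0, 4096), needs_rmw(0, 4096))      # 1 False
# The same 4K write shifted by 512 bytes: it now straddles two physical
# sectors, and the drive must read both, patch them, and write both back.
print(sectors_touched(512, 4096), needs_rmw(512, 4096))  # 2 True
```

This doubling of touched sectors, plus the extra read before each write, is
consistent with the 20..30% degradation mentioned above.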
DL> Going up to a 9000-byte frame bumps the efficiency
DL> to 98.68% and a theoretical max throughput of ~123 MB/s. Something
DL> interesting about jumbo frames, though, is that they end up being able to
DL> request 17 sectors of data per request.

DL> Why is this interesting? Because on some Linux systems a page is 4096 bytes,
DL> or 8 sectors, so 17 sectors works out to 2 full pages plus part of
DL> another page. If you are not using direct I/O but instead letting Linux
DL> manage the underlying file system, then it would seem you will end up
DL> making unaligned I/O requests of the system, causing additional I/Os to be
DL> issued. This might be the reason for the latency effects, and it would be
DL> interesting to get the numbers that Catalin may have from his tests... I
DL> wouldn't mind seeing results for 17-, 16-, and 8-sector-count requests.

DL> But what I don't understand is that if the throughput is 80 MB/s and drops
DL> to 60 MB/s, as Catalin suggests, then I don't get how a 20 MB/s drop in
DL> throughput would make the system more responsive... I also don't
DL> understand what the test setup would be to even measure the effects of
DL> latency and throughput and have them correlate to responsiveness?

DL> David

-- 
Best regards,
 Killer{R}                            mailto:support@xxxxxxxxxxxx

_______________________________________________
Aoetools-discuss mailing list
Aoetools-discuss@xxxxxxxxxxxxxxxxxxxxx
https://lists.sourceforge.net/lists/listinfo/aoetools-discuss
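P.S. The sector-count arithmetic David quotes can be reproduced with a small
script (a sketch; the 22-byte AoE-plus-ATA header inside the Ethernet payload
is my assumption about where the 17-sector figure comes from, and the exact
efficiency percentages depend on which per-frame overheads one counts):

```python
SECTOR = 512
AOE_HDR = 22   # assumed: 10-byte AoE header + 12-byte ATA command header
PAGE = 4096

def sectors_per_frame(mtu: int) -> int:
    """Whole 512-byte sectors that fit in one AoE frame at the given MTU."""
    return (mtu - AOE_HDR) // SECTOR

print(sectors_per_frame(1500))   # 2  (standard Ethernet)
print(sectors_per_frame(9000))   # 17 (9000-byte jumbo frame)

# 17 sectors against an 8-sector (4096-byte) page: 2 full pages plus one
# sector spilling into a third page -- hence the unaligned-I/O concern.
pages, spill = divmod(sectors_per_frame(9000), PAGE // SECTOR)
print(pages, spill)              # 2 1
```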