Hello David,

Monday, July 14, 2014, 1:23:56 AM, you wrote:

IMHO the problem is caused not only by the page size, but also by the HDD's
sector size. Modern HDDs have a 4K physical sector size. They still support
512-byte access for compatibility, but this is inefficient: every unaligned
read that doesn't fit into one 4K sector turns into a full 4K read, and every
unaligned write forces the disk to read the sector, modify it internally in
its buffers, and then write it back. Sure, the firmware tries to do this as
fast as it can, but my tests show about 20..30% sequential write speed
degradation (with O_DIRECT) when writing 4K blocks whose start is not also
4K-aligned. So simply using jumbo frames is not enough to make the hardware
work as fast as it can.

The AoE protocol doesn't support 4K sectors directly, since it has to work
with the normal MTU, not only jumbo frames. However, it is theoretically
possible to have the initiator report the device to the OS as a 4K-sector
drive; a proper ('4K-sector-aware' :) ) OS would then access it in
4K-aligned portions, which, together with some buffering on the target's
side, should make it all work faster :). But it all looks like a tricky
workaround.

DL> So I do find it interesting to have a configuration to limit the size of
DL> the read/write request, but it seems like it would be useful to understand
DL> the side effects and why someone would want to do this. Catalin suggested
DL> that reducing the size of the jumbo frames decreases latency and improves
DL> boot times and said that the system "feels more responsive". This is where I
DL> have a problem, though, because something "feeling" more responsive is not
DL> very satisfying. It would be better to have some hard numbers behind what
DL> this change does.

DL> AoE using normal Ethernet frames ends up having a protocol efficiency of
DL> only 89.82%, which on 1Gb Ethernet would give you a theoretical maximum
DL> throughput of ~112 MB/s.
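The read-modify-write penalty described above can be sketched with some
simple arithmetic (a minimal illustration only, not tied to any real driver;
the 4096-byte physical sector size is the assumption here):

```python
PHYS_SECTOR = 4096  # assumed physical sector size of a 4K-native drive

def sectors_touched(offset: int, length: int) -> int:
    """Number of physical sectors a request at byte `offset` spans."""
    first = offset // PHYS_SECTOR
    last = (offset + length - 1) // PHYS_SECTOR
    return last - first + 1

def needs_rmw(offset: int, length: int) -> bool:
    """A write needs read-modify-write if either edge is not sector-aligned."""
    return offset % PHYS_SECTOR != 0 or (offset + length) % PHYS_SECTOR != 0

# A 4K write aligned to a 4K boundary: one sector, no read-modify-write.
print(sectors_touched(0, 4096), needs_rmw(0, 4096))      # 1 False
# The same 4K write shifted by 512 bytes: it now straddles two physical
# sectors, and the drive must read both, patch them, and write both back.
print(sectors_touched(512, 4096), needs_rmw(512, 4096))  # 2 True
```

This doubling of touched sectors, plus the extra read before each write, is
consistent with the 20..30% degradation mentioned above.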
DL> Going up to a 9000-byte frame bumps the efficiency
DL> to 98.68% and a theoretical max throughput of ~123 MB/s. Something
DL> interesting about jumbo frames, though, is that they end up being able to
DL> request 17 sectors of data per request.

DL> Why is this interesting? Because on some Linux systems a page is 4096 bytes,
DL> or 8 sectors, so 17 sectors works out to 2 full pages plus part of
DL> another page. If you are not using direct I/O but instead letting Linux
DL> manage the underlying file system, then it would seem you will end up
DL> making unaligned I/O requests of the system, causing additional I/Os to be
DL> issued. This might be the reason for the latency effects, and it would be
DL> interesting to get the numbers that Catalin may have from his tests... I
DL> wouldn't mind seeing results for 17-, 16-, and 8-sector-count requests.

DL> But what I don't understand is that if the throughput is 80 MB/s and drops
DL> to 60 MB/s, as Catalin suggests, then I don't get how a 20 MB/s drop in
DL> throughput would make the system more responsive... I also don't
DL> understand what the test setup would be to even measure the effects of
DL> latency and throughput and have them correlate to responsiveness?

DL> David

-- 
Best regards,
 Killer{R}                            mailto:support@xxxxxxxxxxxx

_______________________________________________
Aoetools-discuss mailing list
Aoetools-discuss@xxxxxxxxxxxxxxxxxxxxx
https://lists.sourceforge.net/lists/listinfo/aoetools-discuss
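P.S. The sector-count arithmetic David quotes can be reproduced with a small
script (a sketch; the 22-byte AoE-plus-ATA header inside the Ethernet payload
is my assumption about where the 17-sector figure comes from, and the exact
efficiency percentages depend on which per-frame overheads one counts):

```python
SECTOR = 512
AOE_HDR = 22   # assumed: 10-byte AoE header + 12-byte ATA command header
PAGE = 4096

def sectors_per_frame(mtu: int) -> int:
    """Whole 512-byte sectors that fit in one AoE frame at the given MTU."""
    return (mtu - AOE_HDR) // SECTOR

print(sectors_per_frame(1500))   # 2  (standard Ethernet)
print(sectors_per_frame(9000))   # 17 (9000-byte jumbo frame)

# 17 sectors against an 8-sector (4096-byte) page: 2 full pages plus one
# sector spilling into a third page -- hence the unaligned-I/O concern.
pages, spill = divmod(sectors_per_frame(9000), PAGE // SECTOR)
print(pages, spill)              # 2 1
```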