On Fri, 2018-12-07 at 19:51 +0100, Emiliano Ingrassia wrote:
> Hi Carlo,

Hi Emiliano,

> tests[0] conducted on an Odroid-C1+ board equipped with a Meson8b SoC
> have shown an high packet loss (90% and more) during a simple ping
> test from a laptop to the board.
> Testing the two patches separately clearly showed that this depends
> on the removal of the "eee-broken-1000t" flag from the board PHY
> description in the relative device tree.
>
> About the first patch (MAC IRQ type), no tests have shown an evidence
> that it is needed. I suggest you to conduct some test on real
> hardware as I do to confirm or disprove my tests.

Let's try to step back a bit and see what we can do to clarify this
situation.

For arm64 we are pretty sure that both patches are needed, because we
ran extensive and lengthy tests, especially regarding the change in the
IRQ trigger type. For arm things are not so clear, so for now we
decided to merge the arm64 patch and wait on the arm one.

Let's focus first on the patch that changes the IRQ type.

On the arm64 boards we tested, the problem with the IRQ type is
triggered by the script in [0]. If we run this stress test on the arm64
boards without the patch that changes the trigger type, after a few
hours (anywhere from 2h to 6h, sometimes more) we can see the
connection dropping from ~1Gbps to <30Mbps.

Jerome gave a nice explanation of why this happens; in any case, after
changing the IRQ trigger type we couldn't see the issue anymore. This
was confirmed not just by BayLibre but also by other independent
sources, so we are pretty confident in this solution.

So my first two questions for you are:

1) Can you reproduce this problem on your board without the patches
   when running this script?
2) If yes, does the first patch alone solve the problem?

This brings us to the second issue, the one regarding the
'eee-broken-1000t' quirk.
Since the two issues are strictly related, we are confident that the
change in the IRQ type solves this problem as well (Jerome confirmed
this too on the arm64 boards).

For this case I cannot provide a real reproducer, so the only option is
to stress test the network with iperf3 and try to reproduce the issue.
This is also because we think that your approach of using UDP and your
packet generator is probably not the best way to test the patch, given
that (1) using UDP is not reliable according to our tests, (2) there is
an asymmetry in TX/RX, (3) the packet loss could be due to bandwidth
saturation, etc.

So AFAIK the best way to test this problem is using iperf3, the same
way it is done in the script in [0]. I was not involved with this issue
a year and a half ago, but AFAIK this is the way it was reproduced.

This brings me to more questions for you to answer:

3) Running iperf3 tests in TX / RX / TX+RX without the
   'eee-broken-1000t' quirk applied, are you able to reproduce the EEE
   problem?
4) Any change when the 'eee-broken-1000t' quirk is applied?

When testing (3) and (4), please also check the status of the EEE using
ethtool.

Hopefully this will bring a bit of clarity to the whole situation :)

Cheers,

[0] https://paste.fedoraproject.org/paste/GBFxjAQ0JULsYQlyYO2KOw

-- 
Carlo Caione
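
For reference, the kind of check described above (run iperf3 in each
direction, watch for throughput collapsing below ~30Mbps, and record the
EEE status) could be sketched roughly as follows. This is not the script
in [0], only an illustrative sketch. It assumes iperf3's JSON output
(-J) for a TCP run, a reachable iperf3 server, and an interface name
such as eth0; the helper names and the 30 Mbit/s default threshold
(taken from the figure quoted above) are assumptions made for this
example.

```python
import json
import subprocess


def iperf3_command(server, seconds=60, reverse=False):
    """Build an iperf3 client command line that emits a JSON report."""
    cmd = ["iperf3", "-c", server, "-t", str(seconds), "-J"]
    if reverse:
        # -R: the server sends and the client receives (RX test on the client)
        cmd.append("-R")
    return cmd


def received_mbps(report):
    """Receiver-side throughput in Mbit/s from a parsed iperf3 TCP report."""
    return report["end"]["sum_received"]["bits_per_second"] / 1e6


def looks_degraded(mbps, threshold_mbps=30.0):
    """True if throughput is below the (assumed) degradation threshold."""
    return mbps < threshold_mbps


def run_once(server, seconds=60, reverse=False):
    """Run one iperf3 test and return the measured throughput in Mbit/s."""
    result = subprocess.run(
        iperf3_command(server, seconds, reverse),
        capture_output=True, text=True, check=True,
    )
    return received_mbps(json.loads(result.stdout))


def eee_status(iface="eth0"):
    """Return the raw output of 'ethtool --show-eee' for the interface."""
    result = subprocess.run(
        ["ethtool", "--show-eee", iface],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling run_once() in a loop for several hours, once with reverse=False
and once with reverse=True, and logging eee_status() alongside each
result would cover the TX and RX cases; simultaneous TX+RX is not
covered by this sketch.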