On 16-11-23 02:29 PM, Mark Lord wrote:
On 16-11-23 10:12 AM, Hayes Wang wrote:
Mark Lord [mlord@xxxxxxxxx]
[...]
What does this code do:
static void r8153_set_rx_early_size(struct r8152 *tp)
{
u32 mtu = tp->netdev->mtu;
u32 ocp_data = (agg_buf_sz - mtu - VLAN_ETH_HLEN - VLAN_HLEN) / 4;
ocp_write_word(tp, MCU_TYPE_USB, USB_RX_EARLY_SIZE, ocp_data);
}
This only works for RTL8153. However, what you use is RTL8152.
It is like delay completion. It is used to reduce the loading of CPU
by letting a transfer contain more data to reduce the number of
transfers.
How is ocp_data used by the hardware?
Shouldn't the calculation also include sizeof(rx_desc) in there somewhere?
The algorithm is from our hw engineers, and it should be
(agg_buf_sz - packet size) / 8
You could refer to commit a59e6d815226 ("r8152: correct the rx early size").
Thanks.
Right now I am working quite hard trying to narrow things down exactly.
You are correct that the driver does appear to be careful about accesses
beyond the filled portion of a URB buffer -- for some reason I thought
the original driver had issues there, but looking again it does not seem to.
One idea that is now looking more likely:
Things could be suffering from speculative CPU accesses to RAM
(the system here has non-coherent d-cache/RAM).
This could incorrectly pre-load data from adjacent URB buffers
into the d-cache, creating coherency issues. I am testing now
with cacheline-sized guard zones between the buffers to see if
that is the issue or not.
Nope. Guard zones did not fix it, so it's probably not a prefetch issue.
Oddly, adding a couple of memory barriers to specific places in the driver
does help, A LOT. Still not 100%, but it did pass 1800 reboot tests over night
with only three bad rx_desc's reported.
That's a new record here for the driver using kmalloc'd buffers,
and put reliability on par with using non-cacheable buffers.
Any way we look at it though, the chip/driver are simply unreliable,
and relying upon hardware checksums (which fail due to the driver
looking at garbage rather than the checksum bits) leads to data corruption.
Cheers
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html