Hi,

I'm not sure how on-topic this is on this list, but I have a question about a device driver design issue.

For our Bachelor's project, my team and I are tasked with optimizing an existing hardware solution. The design uses an FPGA for various tasks, including a Triple Speed Ethernet controller that is connected to the CPU via PCI Express. The current implementation is fairly naive: the driver does byte-by-byte reads directly from a FIFO on the FPGA. This is, of course, quite resource intensive and essentially hogs the CPU completely (throughput peaks at around 10 Mbit/s).

Our plan to solve this is as follows:

* Keep a buffer on the FPGA that accumulates a number of Ethernet packets.
* Once a certain threshold is reached (or a period of time, e.g. 5 ms, elapses), the buffer is flushed and transferred directly to RAM via DMA.
* When the buffer has been flushed and the data is in RAM and accessible to the CPU, the device raises an interrupt, signalling the CPU to read the data.
* In the interrupt handler, we memcpy the individual packets into another buffer and hand them to the upper layers of the network stack (rough sketch at the end of this mail).

Our rationale for keeping a buffer of packets rather than transmitting a single packet at a time is to maximize the amount of data sent with each PCIe transaction (and in turn minimize the overhead).

However, upon reading the relevant LDD chapter [1] (which, admittedly, we should have done in the first place), we found that the authors of the book take a different approach:

> The second case comes about when DMA is used asynchronously. This happens,
> for example, with data acquisition devices that go on pushing data even if
> nobody is reading them. In this case, the driver should maintain a buffer so
> that a subsequent read call will return all the accumulated data to user
> space. The steps involved in this kind of transfer are slightly different:
>
> 1. The hardware raises an interrupt to announce that new data has arrived.
> 2. The interrupt handler allocates a buffer and tells the hardware where to
>    transfer its data.
> 3. The peripheral device writes the data to the buffer and raises another
>    interrupt when it's done.
> 4. The handler dispatches the new data, wakes any relevant process, and takes
>    care of housekeeping.
>
> A variant of the asynchronous approach is often seen with network cards. These
> cards often expect to see a circular buffer (often called a DMA ring buffer)
> established in memory shared with the processor; each incoming packet is placed
> in the next available buffer in the ring, and an interrupt is signaled. The
> driver then passes the network packets to the rest of the kernel and places a
> new DMA buffer in the ring.

Now, there are some obvious advantages to this method (not least that it is much easier to implement), but I can't help feeling it would be a little inefficient. I have sketched how we read that ring variant at the end of this mail as well.

So here is my question: is our solution sane? Do you think it would be viable, or would it create more issues than it solves? Should we go the LDD route instead and allocate a new buffer every time an interrupt is raised?

Thanks for your help!

--
Regards,
Christoph

[1] https://static.lwn.net/images/pdf/LDD3/ch15.pdf (page 30)
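
P.S. To make our plan a bit more concrete, here is a rough sketch of what we imagine the RX interrupt handler doing. Everything in it is invented for illustration (the struct layout, the FPGA_RX_BYTES register offset, the 16-bit length prefix per packet in the flushed block); nothing is final:

#include <linux/interrupt.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <linux/skbuff.h>
#include <linux/io.h>
#include <asm/unaligned.h>

#define FPGA_RX_BYTES	0x10	/* hypothetical "bytes in last flush" register */

struct fpga_priv {
	void __iomem	  *regs;	/* BAR mapping of the FPGA's registers */
	void		  *rx_buf;	/* coherent buffer the FPGA DMAs the block into */
	struct net_device *ndev;
};

/* Runs when the FPGA signals "buffer flushed to RAM".  A real driver would
 * probably defer this work to a NAPI poll function rather than doing it all
 * in hard-irq context. */
static irqreturn_t fpga_rx_irq(int irq, void *dev_id)
{
	struct fpga_priv *priv = dev_id;
	u8 *p = priv->rx_buf;
	u32 remaining = readl(priv->regs + FPGA_RX_BYTES);

	while (remaining >= 2) {
		u16 len = get_unaligned_le16(p);	/* our invented framing */
		struct sk_buff *skb;

		p += 2;
		remaining -= 2;
		if (len == 0 || len > remaining)
			break;

		skb = netdev_alloc_skb(priv->ndev, len);
		if (!skb)
			break;

		skb_put_data(skb, p, len);		/* the memcpy step */
		skb->protocol = eth_type_trans(skb, priv->ndev);
		netif_rx(skb);				/* hand off to the upper layers */

		p += len;
		remaining -= len;
	}

	return IRQ_HANDLED;
}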
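
And this is roughly how we understand the LDD "DMA ring buffer" variant: the driver pre-posts a ring of per-packet buffers and the FPGA fills the next free one as each frame arrives. Again, the descriptor layout below is invented and would have to match whatever we implement on the FPGA side:

#include <linux/types.h>
#include <linux/bits.h>
#include <linux/skbuff.h>

#define RX_RING_SIZE	256
#define RX_BUF_SIZE	2048			/* >= max Ethernet frame */
#define DESC_DONE	BIT(0)			/* set by the FPGA once a frame has landed */

struct rx_desc {				/* shared with the FPGA via dma_alloc_coherent() */
	__le64 buf_addr;			/* bus address the FPGA writes the frame to */
	__le16 len;				/* filled in by the FPGA */
	__le16 flags;				/* DESC_DONE etc. */
};

struct rx_ring {
	struct rx_desc	*desc;			/* the descriptor array itself */
	dma_addr_t	 desc_dma;		/* its bus address, handed to the FPGA */
	struct sk_buff	*skb[RX_RING_SIZE];	/* one pre-allocated skb per slot */
	unsigned int	 next_to_clean;		/* next slot the CPU will look at */
};

Each interrupt would then just walk the ring from next_to_clean, hand every completed skb to eth_type_trans()/netif_rx(), put a fresh skb into the slot, clear DESC_DONE, and give the slot back to the FPGA. So there is no large memcpy of a whole block, at the cost of one descriptor per packet.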