On 1/12/2021 8:25 PM, Tomasz Figa wrote:
> On Wed, Jan 13, 2021 at 12:56 PM Florian Fainelli <f.fainelli@xxxxxxxxx> wrote:
>>
>> On 1/12/2021 6:29 PM, Tomasz Figa wrote:
>>> Hi Florian,
>>>
>>> On Wed, Jan 13, 2021 at 3:01 AM Florian Fainelli <f.fainelli@xxxxxxxxx> wrote:
>>>>
>>>> On 1/11/21 11:48 PM, Claire Chang wrote:
>>>>> On Fri, Jan 8, 2021 at 1:59 AM Florian Fainelli <f.fainelli@xxxxxxxxx> wrote:
>>>>>>
>>>>>> On 1/7/21 9:42 AM, Claire Chang wrote:
>>>>>>
>>>>>>>> Can you explain how ATF gets involved and to what extent it does help,
>>>>>>>> besides enforcing a secure region from the ARM CPU's perspective? Does
>>>>>>>> the PCIe root complex not have an IOMMU but can somehow be denied access
>>>>>>>> to a region that is marked NS=0 in the ARM CPU's MMU? If so, that is
>>>>>>>> still some sort of basic protection that the HW enforces, right?
>>>>>>>
>>>>>>> We need the ATF support for memory MPU (memory protection unit).
>>>>>>> Restricted DMA (with reserved-memory in dts) makes sure the predefined memory
>>>>>>> region is for PCIe DMA only, but we still need MPU to lock down PCIe access to
>>>>>>> that specific region.
>>>>>>
>>>>>> OK so you do have a protection unit of some sort to enforce which region
>>>>>> in DRAM the PCIE bridge is allowed to access, that makes sense,
>>>>>> otherwise the restricted DMA region would only be a hint but nothing you
>>>>>> can really enforce. This is almost entirely analogous to our systems then.
>>>>>
>>>>> Here is the example of setting the MPU:
>>>>> https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132
>>>>>
>>>>>>
>>>>>> There may be some value in standardizing on an ARM SMCCC call then since
>>>>>> you already support two different SoC vendors.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> On Broadcom STB SoCs we have had something similar for a while however
>>>>>>>> and while we don't have an IOMMU for the PCIe bridge, we do have a
>>>>>>>> basic protection mechanism whereby we can configure a region in DRAM to
>>>>>>>> be PCIe read/write and CPU read/write which then gets used as the PCIe
>>>>>>>> inbound region for the PCIe EP. By default the PCIe bridge is not
>>>>>>>> allowed access to DRAM so we must call into a security agent to allow
>>>>>>>> the PCIe bridge to access the designated DRAM region.
>>>>>>>>
>>>>>>>> We have done this using a private CMA area region assigned via Device
>>>>>>>> Tree, assigned with a and requiring the PCIe EP driver to use
>>>>>>>> dma_alloc_from_contiguous() in order to allocate from this device
>>>>>>>> private CMA area. The only drawback with that approach is that it
>>>>>>>> requires knowing how much memory you need up front for buffers and DMA
>>>>>>>> descriptors that the PCIe EP will need to process. The problem is that
>>>>>>>> it requires driver modifications and that does not scale over the number
>>>>>>>> of PCIe EP drivers, some we absolutely do not control, but there is no
>>>>>>>> need to bounce buffer. Your approach scales better across PCIe EP
>>>>>>>> drivers however it does require bounce buffering which could be a
>>>>>>>> performance hit.
>>>>>>>
>>>>>>> Only the streaming DMA (map/unmap) needs bounce buffering.
>>>>>>
>>>>>> True, and typically only on transmit since you don't really control
>>>>>> where the sk_buff are allocated from, right? On RX since you need to
>>>>>> hand buffer addresses to the WLAN chip prior to DMA, you can allocate
>>>>>> them from a pool that already falls within the restricted DMA region, right?
>>>>>>
>>>>>
>>>>> Right, but applying bounce buffering to RX will make it more secure.
>>>>> The device won't be able to modify the content after unmap. Just like what
>>>>> iommu_unmap does.
>>>>
>>>> Sure, however the goals of using bounce buffering equally apply to RX
>>>> and TX in that this is the only layer sitting between a stack (block,
>>>> networking, USB, etc.) and the underlying device driver that scales well
>>>> in order to massage a dma_addr_t to be within a particular physical range.
>>>>
>>>> There is however room for improvement if the drivers are willing to
>>>> change their buffer allocation strategy. When you receive Wi-Fi frames
>>>> you need to allocate buffers for the Wi-Fi device to DMA into, and that
>>>> happens ahead of the DMA transfers by the Wi-Fi device. At buffer
>>>> allocation time you could very well allocate these frames from the
>>>> restricted DMA region without having to bounce buffer them since the
>>>> host CPU is in control over where and when to DMA into.
>>>>
>>>
>>> That is, however, still a trade-off between saving that one copy and
>>> protection from the DMA tampering with the packet contents when the
>>> kernel is reading them. Notice how the copy effectively makes a
>>> snapshot of the contents, guaranteeing that the kernel has a
>>> consistent view of the packet, which is not true if the DMA could
>>> modify the buffer contents in the middle of CPU accesses.
>>
>> I would say that the window just became so much narrower for the PCIe
>> end-point to overwrite contents with the copy because it would have to
>> happen within the dma_unmap_{page,single} time and before the copy is
>> finished to the bounce buffer.
>
> Not only. Imagine this:
>
> a) Without bouncing:
>
> - RX interrupt
> - Pass the packet to the network stack
> - Network stack validates the packet
> - DMA overwrites the packet
> - Network stack goes boom, because the packet changed after validation
>
> b) With bouncing:
>
> - RX interrupt
> - Copy the packet to a DMA-inaccessible buffer
> - Network stack validates the packet
> - Network stack is happy, because the packet is guaranteed to stay the
>   same after validation

Yes, that's a much safer set of operations, thanks for walking through a
practical example.
-- 
Florian
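
For reference, the two RX sequences (a) and (b) can be sketched in plain
userspace C. This is only an illustration under stated assumptions, not
kernel code: struct pkt, device_overwrite_len(), pkt_valid(), rx_in_place()
and rx_bounced() are hypothetical names, the packet layout is made up, and
the late DMA write is simulated by an ordinary function call at a fixed
point rather than by real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PKT_MAX 64

/* Hypothetical packet layout: one length byte followed by the payload. */
struct pkt {
    uint8_t len;
    uint8_t payload[PKT_MAX];
};

/* Stand-in for a device writing into memory it can still reach via DMA. */
void device_overwrite_len(struct pkt *dma_buf, uint8_t new_len)
{
    dma_buf->len = new_len;
}

/* Validation step: the claimed length must fit inside the payload area. */
int pkt_valid(const struct pkt *p)
{
    return p->len <= PKT_MAX;
}

/*
 * (a) Without bouncing: validate and consume the packet in the buffer the
 * device can still write to. Returns the length the stack ends up acting
 * on, or -1 if validation rejects the packet.
 */
int rx_in_place(struct pkt *dma_buf, uint8_t late_len)
{
    assert(dma_buf != NULL);
    if (!pkt_valid(dma_buf))
        return -1;
    /* Simulated DMA write landing after validation. */
    device_overwrite_len(dma_buf, late_len);
    return dma_buf->len;
}

/*
 * (b) With bouncing: copy the packet into memory the device cannot reach,
 * then validate and consume only that snapshot.
 */
int rx_bounced(struct pkt *dma_buf, uint8_t late_len)
{
    struct pkt snapshot;

    assert(dma_buf != NULL);
    memcpy(&snapshot, dma_buf, sizeof(snapshot));
    if (!pkt_valid(&snapshot))
        return -1;
    /* The same late write only changes the device-visible buffer. */
    device_overwrite_len(dma_buf, late_len);
    return snapshot.len;
}
```

With a packet whose length passes validation and a late write of an
out-of-range length, rx_in_place() hands back the out-of-range value, while
rx_bounced() keeps returning the validated one even though the
device-visible buffer has changed.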