On Mon, Aug 3, 2020 at 4:06 AM Robin Murphy <robin.murphy@xxxxxxx> wrote:
>
> On 2020-07-29 06:16, John Stultz wrote:
> > This adds a heap that allocates non-contiguous buffers that are
> > marked as writecombined, so they are not cached by the CPU.
> > ...
> > + ret = sg_alloc_table(new_table, table->nents, GFP_KERNEL);
> > + if (ret) {
> > + kfree(new_table);
> > + return ERR_PTR(-ENOMEM);
> > + }
> > +
> > + new_sg = new_table->sgl;
> > + for_each_sg(table->sgl, sg, table->nents, i) {
>
> Consider using the new sgtable helpers that are just about to land - in
> this case, for_each_sgtable_sg().

Ack! Thanks for the suggestion!

> > + memcpy(new_sg, sg, sizeof(*sg));
> > + new_sg->dma_address = 0;
>
> This seems a little bit hairy, as in theory a consumer could still treat
> a nonzero DMA length as the address being valid. Rather than copying the
> whole entry then trying to undo parts of that, maybe just:
>
> sg_set_page(new_sg, sg_page(sg), sg->len, sg->offset);
>
> ?

Sounds good.

> > +static struct sg_table *dma_heap_map_dma_buf(struct dma_buf_attachment *attachment,
> > + enum dma_data_direction direction)
> > +{
> > + struct dma_heap_attachment *a = attachment->priv;
> > + struct sg_table *table = a->table;
> > +
> > + if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents, direction,
> > + DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WRITE_COMBINE))
>
> dma_map_sgtable()
>
> Also, DMA_ATTR_WRITE_COMBINE is meaningless for streaming DMA.
>

Hrm. Ok, my grasp of the "streaming" vs "consistent" definitions is maybe
slightly off. While we are mapping and unmapping buffers, the point of this
heap is that the allocated memory is uncached/coherent, so we avoid the
cache sync overhead on each mapping/unmapping, which I thought was closer
to the "consistent" definition. But maybe if the mapping and unmapping part
is really the key difference, then ok. Either way, from my testing you seem
to be right that ATTR_WRITE_COMBINE doesn't make any difference in behavior.

> > + pgprot = pgprot_writecombine(PAGE_KERNEL);
> > +
> > + for_each_sg(table->sgl, sg, table->nents, i) {
>
> for_each_sg_page()

Ack.

> > + /*
> > + * XXX This is hackish. While the buffer will be uncached, we need
> > + * to initially flush cpu cache, since the the __GFP_ZERO on the
> > + * allocation means the zeroing was done by the cpu and thus it is likely
> > + * cached. Map & flush it out now so we don't get corruption later on.
> > + *
> > + * Ideally we could do this without using the heap device as a dummy dev.
> > + */
> > + dma_map_sg_attrs(dma_heap_get_dev(heap), table->sgl, table->nents,
> > + DMA_BIDIRECTIONAL, DMA_ATTR_WRITE_COMBINE);
>
> Again, DMA_ATTR_WRITE_COMBINE is meaningless here.
>
> > + dma_sync_sg_for_device(dma_heap_get_dev(heap), table->sgl, table->nents,
> > + DMA_BIDIRECTIONAL);
>
> This doesn't do anything that the map hasn't already just done.

Good point!

> > + }
> > + dma_heap_get_dev(heap->heap)->dma_mask = &dummy_mask;
> > + dma_set_mask(dma_heap_get_dev(heap->heap), DMA_BIT_MASK(64));
>
> Much as I'd hate to encourage using dma_coerce_mask_and_coherent(), I'm
> not sure this is really any better :/

Sounds fair.

Thanks so much for the review, I really appreciate the feedback! (Sorry I
was a little slow to respond. The merge window has really been something
this cycle.. :P)

I'll get this all integrated and resend the patch here shortly.

thanks
-john
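
For reference, a minimal sketch of what the table-duplication path could
look like with for_each_sgtable_sg() and sg_set_page() folded in, per the
comments above. The dup_sg_table name and the surrounding allocation/error
handling are assumptions based on the quoted patch, not the eventual
reworked code (note the scatterlist field is sg->length rather than
sg->len):

#include <linux/err.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>

/* Hypothetical helper name; the quoted patch only shows part of the body. */
static struct sg_table *dup_sg_table(struct sg_table *table)
{
	struct sg_table *new_table;
	struct scatterlist *sg, *new_sg;
	int i, ret;

	new_table = kzalloc(sizeof(*new_table), GFP_KERNEL);
	if (!new_table)
		return ERR_PTR(-ENOMEM);

	/* orig_nents covers the full CPU-side entry list */
	ret = sg_alloc_table(new_table, table->orig_nents, GFP_KERNEL);
	if (ret) {
		kfree(new_table);
		return ERR_PTR(-ENOMEM);
	}

	new_sg = new_table->sgl;
	for_each_sgtable_sg(table, sg, i) {
		/* Copy only page/length/offset; leave the DMA fields untouched */
		sg_set_page(new_sg, sg_page(sg), sg->length, sg->offset);
		new_sg = sg_next(new_sg);
	}

	return new_table;
}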
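
Similarly, a sketch of the map_dma_buf callback reworked around
dma_map_sgtable() with DMA_ATTR_WRITE_COMBINE dropped, as suggested.
struct dma_heap_attachment comes from the quoted patch (shown here in
simplified form), and the error handling is an assumption:

#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>
#include <linux/err.h>
#include <linux/scatterlist.h>

/* From the quoted patch (simplified): per-attachment bookkeeping */
struct dma_heap_attachment {
	struct sg_table *table;
};

static struct sg_table *dma_heap_map_dma_buf(struct dma_buf_attachment *attachment,
					     enum dma_data_direction direction)
{
	struct dma_heap_attachment *a = attachment->priv;
	struct sg_table *table = a->table;
	int ret;

	/*
	 * The buffer is uncached, so skip the CPU cache sync on map;
	 * dma_map_sgtable() also keeps table->nents consistent for us.
	 */
	ret = dma_map_sgtable(attachment->dev, table, direction,
			      DMA_ATTR_SKIP_CPU_SYNC);
	if (ret)
		return ERR_PTR(ret);

	return table;
}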
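
And a rough sketch of how the vmap path could gather pages with
for_each_sg_page() and map them write-combined. The helper name and the
npages parameter are illustrative only; the real patch keeps this state in
its buffer structure:

#include <linux/mm.h>
#include <linux/scatterlist.h>
#include <linux/vmalloc.h>

static void *heap_buffer_vmap(struct sg_table *table, unsigned int npages)
{
	struct sg_page_iter piter;
	struct page **pages, **tmp;
	void *vaddr;

	pages = vmalloc(sizeof(*pages) * npages);
	if (!pages)
		return NULL;
	tmp = pages;

	/* Walk every backing page regardless of how the segments are laid out */
	for_each_sg_page(table->sgl, &piter, table->orig_nents, 0)
		*tmp++ = sg_page_iter_page(&piter);

	/* Match the uncached/write-combined userspace view of the buffer */
	vaddr = vmap(pages, npages, VM_MAP, pgprot_writecombine(PAGE_KERNEL));
	vfree(pages);

	return vaddr;
}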