Hi

On Mon, Nov 23, 2020 at 2:23 PM Takashi Iwai <tiwai@xxxxxxx> wrote:
>
> On Sat, 21 Nov 2020 10:40:04 +0100,
> Michael Nazzareno Trimarchi wrote:
> >
> > Hi all
> >
> > I'm trying to figure out how to increase performance on audio reading
> > using the mmap interface. Right now what I understand it's that
> > allocation comes from core/memalloc.c ops that allocate the memory for
> > dma under driver/dma.
> > The reference platform I have is an imx8mm and the allocation in arm64 is:
> >
> > 0xffff800011ff5000-0xffff800012005000 64K PTE RW NX SHD
> > AF UXN MEM/NORMAL-NC
> >
> > This is the reason that is allocated for dma interface.
> >
> > Now access linear on the multichannel interface the performance is bad
> > but worse if I try to access a channel a time on read.
> > So it looks like it is better to copy the block using memcpy on a
> > cached area and then operate on a single channel sample. If it's
> > correct what I'm saying the mmap_begin and mmap_commit
> > basically they don't do anything on cache level so the page mapping
> > and way is used is always the same. Can the interface be modified to
> > allow cache the area during read and restore in the commit
> > phase?
>
> The current API of the mmap for the sound ring-buffer is designed to
> allow concurrent accesses at any time in the minimalistic kernel-user
> context switching. So the whole buffer is allocated as coherent and
> mmapped in a shot. It's pretty efficient for architectures like x86,
> but has disadvantages on ARM, indeed.

Each platform and/or architecture can specialize the mmap and change how
the DMA-consistent area, normally mapped as non-cached, is mapped:

    vma->vm_page_prot = pgprot_cached(vma->vm_page_prot);
    return remap_pfn_range(vma, vma->vm_start,
                           vma->vm_end - vma->vm_start, vma->vm_page_prot);

I have done it for testing purposes.

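The memcpy approach from the original question (copy the block into a cached
area, then operate on a single channel) can be sketched roughly as follows.
This is only an illustration under assumptions: interleaved S16 samples, and
the names extract_channel_s16 and bounce are hypothetical, not alsa-lib API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not part of alsa-lib: copy one block of interleaved
 * S16 frames out of the (possibly non-cached) mmap area into an ordinary
 * cached buffer, then pull a single channel out of the cached copy.
 *
 * mmap_area: interleaved source, frames * channels samples
 * bounce:    cached scratch buffer, frames * channels samples
 * out:       destination for the selected channel, frames samples
 */
void extract_channel_s16(const int16_t *mmap_area, int16_t *bounce,
                         int16_t *out, size_t frames,
                         unsigned int channels, unsigned int channel)
{
    assert(channels > 0 && channel < channels);

    /* One linear pass over the slow mapping. */
    memcpy(bounce, mmap_area, frames * channels * sizeof(*bounce));

    /* The strided per-channel access now hits normal cached memory. */
    for (size_t i = 0; i < frames; i++)
        out[i] = bounce[i * channels + channel];
}
```

The point of the design is that the non-cached mapping is touched exactly
once and sequentially; all strided accesses go to the cached copy.
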
This gives an idea:
- reading multiple channels non-sequentially through the mmap interface
  took around 12% of the CPU
- reading multiple channels after a memcpy took around 6%
- reading from a cached area took around 3%

I'm trying to figure out how and when to invalidate the area. I have two
use cases:
- write on the channels (no performance issue)
- read on the channels

Before reading, I should only need to say that the cached area is not in
sync with memory. I think that supporting the write use case makes little
sense here.

>
> The mmap_begin and mmap_commit are the concepts in the alsa-lib side
> for supporting the plugins better, and they doesn't represent kernel
> ABI. So, this extension would be needed at first, and the memory
> allocation mechanism has to be changed as well. Last but not least,

Are you sure about memory allocation, or just memory mapping?

> the concurrency has to be reconsidered if this approach is taken.
>

Yes, I know that is a big problem anyway. I don't have a good idea of how
to solve it.

Michael

> That said, it's possible in theory, but practically no trivial task.
>
>
> thanks,
>
> Takashi

-- 
Michael Nazzareno Trimarchi
Amarula Solutions BV
COO Co-Founder
Cruquiuskade 47 Amsterdam 1018 AM NL
T. +31(0)851119172
M. +39(0)3479132170
https://www.amarulasolutions.com