On 4/22/21 10:06 AM, Mike Rapoport wrote:
> On Wed, Apr 21, 2021 at 05:35:57PM +0000, Peter.Enderborg@xxxxxxxx wrote:
>> On 4/21/21 5:31 PM, Mike Rapoport wrote:
>>> On Wed, Apr 21, 2021 at 10:37:11AM +0000, Peter.Enderborg@xxxxxxxx wrote:
>>>> On 4/21/21 11:15 AM, Daniel Vetter wrote:
>>>>> We need to understand what the "correct" value is. Not in terms of kernel
>>>>> code, but in terms of semantics. Like if userspace allocates a GL texture,
>>>>> is this supposed to show up in your metric or not. Stuff like that.
>>>> It is like having only one pointer type: you need to know what you are
>>>> pointing at to know what it is. It might be a hardware pointer or some
>>>> other pointer.
>>>>
>>>> If there is a limit on your pointers, it is a good metric to count them
>>>> even if you don't know what they are. The same goes for dma-buf: buffers
>>>> are generic, but they consume resources that are counted in pages.
>>>>
>>>> It would be very good if there were a subdivision where you could measure
>>>> all possible types separately. We have the details in debugfs, but nothing
>>>> for the user. A summary in meminfo seems to be the best place for such a
>>>> metric.
>>>
>>> Let me try to summarize my understanding of the problem, maybe it'll help
>>> others as well.
>> Thanks!
>>
>>
>>> A device driver allocates memory and exports this memory via dma-buf so
>>> that this memory will be accessible for userspace via a file descriptor.
>>>
>>> The allocated memory can be either allocated with alloc_page() from system
>>> RAM or by other means from dedicated VRAM (that is not managed by Linux mm)
>>> or even from on-device memory.
>>>
>>> The dma-buf driver tracks the amount of the memory it was requested to
>>> export, and the size it sees is available in debugfs and fdinfo.
>>>
>>> debugfs is not available to the user and may be entirely disabled in
>>> production systems.
>>>
>>> There could be quite a few open dma-bufs, so there is no overall summary;
>>> plus, the fdinfo you refer to is also unavailable to the user on
>>> production systems because of SELinux policy.
>>>
>>> And there are a few details that are not clear to me:
>>>
>>> * Since DRM device drivers seem to be the major user of dma-buf exports,
>>>   why can't we add information about their memory consumption to, say,
>>>   /sys/class/graphics/drm/cardX/memory-usage?
>> Android uses it for binder, which connects more or less everything
>> internally.
>
> Ok, then it rules out /sys/class/graphics indeed.
>
>>> * How exactly does a user generate reports that would include the new
>>>   counters? From my (mostly outdated) experience, Android users won't open
>>>   a terminal and type 'cat /proc/meminfo' there. I'd presume there is a
>>>   vendor agent that collects the data and sends it for analysis. In this
>>>   case, what is the reason the vendor is unable to adjust the SELinux
>>>   policy so that the agent will be able to access fdinfo?
>> When you turn on developer mode on Android, you can connect over
>> USB with a program called adb, and there you get a normal shell
>> (not root, though).
>>
>> There are applications that non-developers can use to get
>> information. It is very limited, though, and there are APIs
>> that provide it.
>>
>>
>>> * And, as others already mentioned, it is not clear what problems
>>>   can be detected by examining DmaBufTotal except saying "oh, there is
>>>   too much/too little memory exported via dma-buf". What would be the
>>>   user-visible effects of these problems? What are the next steps to
>>>   investigate them? What other data will probably be required to identify
>>>   the root cause?
>>>
>> When you debug thousands of devices, it is quite nice to have
>> ways to classify what the problem is not. The normal user does not
>> see any of this. However, they can generate bug reports that
>> collect as much information as they can.
>> Then the user has to provide this bug report to the manufacturer or, more
>> often, the application developer. And when the problem is
>> system related, we need to reproduce the issue on a fully
>> debug-enabled unit.
>
> So the flow is like this:
>
> * a user has a problem and reports it to an application developer; at best
>   the user runs a simple and limited app to collect some data
> * if the application developer considers the issue system related,
>   they can open adb and collect some more information about the system
>   using a non-root shell with SELinux policy restrictions, and send this
>   information to the device manufacturer.
> * the manufacturer continues to debug the issue, and at this point as much
>   information as possible would be useful.
>
> In this flow I still fail to understand why the manufacturer cannot provide
> userspace tools that will be able to collect the required information.
> These tools do not necessarily need to target the end user; they may be
> intended only for application developers, e.g. policy could allow such a
> tool to access some of the system data only when the system is in developer
> mode.
>

The manufacturer is trying to get the tool to work; that is what the patch is
about. Even for an application developer, a commercial phone is locked down.
Many vendors allow you to flash other software, such as AOSP, but that can be
very different. It is like installing Ubuntu on a PC to debug a Fedora issue.

And sure, we can pick up some of what is using dma-buf, but we cannot get the
total, and be sure that it is the total, without a proper counter.
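To illustrate that last point, here is a minimal userspace sketch (not part of the patch) of what a tool can do today without a kernel counter: walk the fdinfo files it is allowed to read and sum the dma-buf sizes it finds. It assumes dma-buf fdinfo entries carry `exp_name:` and `size:` lines, and that an `ino:` line identifies the underlying buffer so a buffer shared through several fds is counted once; whether `ino:` is present depends on the kernel version. The helper names `parse_fdinfo`, `dmabuf_total` and `read_visible_fdinfo` are hypothetical. Any process whose fdinfo is denied, for example by SELinux policy, is silently skipped.

```python
import os
from typing import Dict, Iterable, Iterator


def parse_fdinfo(text: str) -> Dict[str, str]:
    """Split fdinfo text ("key:<tab>value" per line) into a dict.

    Only the first ':' on a line separates key from value, so values that
    themselves contain ':' (e.g. a buffer name) are kept intact.
    """
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def dmabuf_total(fdinfo_texts: Iterable[str]) -> int:
    """Sum dma-buf sizes (bytes) seen in fdinfo, counting each buffer once.

    ASSUMPTIONS (hypothetical, not a documented contract): a dma-buf fd's
    fdinfo has "exp_name:" and "size:" lines, and "ino:" identifies the
    buffer. Entries without "ino:" cannot be deduplicated, so they are
    counted every time they appear.
    """
    seen = set()
    total = 0
    for text in fdinfo_texts:
        fields = parse_fdinfo(text)
        if "exp_name" not in fields or "size" not in fields:
            continue  # not a dma-buf fd
        ino = fields.get("ino")
        if ino is not None:
            if ino in seen:
                continue  # same buffer already counted via another fd
            seen.add(ino)
        total += int(fields["size"])
    return total


def read_visible_fdinfo(proc: str = "/proc") -> Iterator[str]:
    """Yield the contents of every fdinfo file this process may read.

    Unreadable entries (permission denied, process exited, fd closed) are
    skipped without any record, which is why the sum is only a lower bound.
    """
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue
        fdinfo_dir = os.path.join(proc, pid, "fdinfo")
        try:
            fds = os.listdir(fdinfo_dir)
        except OSError:
            continue
        for fd in fds:
            try:
                with open(os.path.join(fdinfo_dir, fd)) as f:
                    yield f.read()
            except OSError:
                continue
```

From an unprivileged shell, `dmabuf_total(read_visible_fdinfo())` returns a number, but nothing in it says how much was missed. That is the gap a kernel-maintained total such as the DmaBufTotal counter discussed above is meant to close.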