On Wed, Feb 1, 2023 at 6:52 AM Tvrtko Ursulin
<tvrtko.ursulin@xxxxxxxxxxxxxxx> wrote:
>
>
> On 01/02/2023 14:23, Tvrtko Ursulin wrote:
> >
> > On 01/02/2023 01:49, T.J. Mercier wrote:
> >> On Tue, Jan 31, 2023 at 6:01 AM Tvrtko Ursulin
> >> <tvrtko.ursulin@xxxxxxxxxxxxxxx> wrote:
> >>>
> >>>
> >>> On 25/01/2023 20:04, T.J. Mercier wrote:
> >>>> On Wed, Jan 25, 2023 at 9:31 AM Tvrtko Ursulin
> >>>> <tvrtko.ursulin@xxxxxxxxxxxxxxx> wrote:
> >>>>>
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> On 25/01/2023 11:52, Michal Hocko wrote:
> >>>>>> On Tue 24-01-23 19:46:28, Shakeel Butt wrote:
> >>>>>>> On Tue, Jan 24, 2023 at 03:59:58PM +0100, Michal Hocko wrote:
> >>>>>>>> On Mon 23-01-23 19:17:23, T.J. Mercier wrote:
> >>>>>>>>> When a buffer is exported to userspace, use memcg to attribute the
> >>>>>>>>> buffer to the allocating cgroup until all buffer references are
> >>>>>>>>> released.
> >>>>>>>>
> >>>>>>>> Is there any reason why this memory cannot be charged during the
> >>>>>>>> allocation (__GFP_ACCOUNT used)?
> >>>>>>>> Also you do charge and account the memory, but the underlying pages
> >>>>>>>> do not know about their memcg (this is normally done with
> >>>>>>>> commit_charge for user mapped pages). This would become a problem
> >>>>>>>> if the memory is migrated, for example.
> >>>>>>>
> >>>>>>> I don't think this is movable memory.
> >>>>>>>
> >>>>>>>> This also means that you have to maintain a memcg reference outside
> >>>>>>>> of the memcg proper, which is not really nice either. This mimics
> >>>>>>>> the tcp kmem limit implementation, which I really have to say I am
> >>>>>>>> not a great fan of, and this pattern shouldn't be copied.
> >>>>>>>>
> >>>>>>>
> >>>>>>> I think we should keep the discussion on technical merits instead of
> >>>>>>> personal preference. To me using an skmem-like interface is totally
> >>>>>>> fine, but the pros/cons need to be very explicit and the clear
> >>>>>>> reasons to select that option should be included.
> >>>>>>
> >>>>>> I do agree with that. I didn't want to sound personal wrt tcp kmem
> >>>>>> accounting, but the overall code maintenance cost is higher because
> >>>>>> of how tcp's take on accounting differs from anything else in the
> >>>>>> memcg proper. I would prefer not to grow another example like that.
> >>>>>>
> >>>>>>> To me there are two options:
> >>>>>>>
> >>>>>>> 1. Using an skmem-like interface as in this patch series:
> >>>>>>>
> >>>>>>> The main pro of this option is that it is very simple. Let me list
> >>>>>>> the cons of this approach:
> >>>>>>>
> >>>>>>> a. There is a time window between the actual memory allocation/free
> >>>>>>> and the charge/uncharge, and the [un]charge happens only when the
> >>>>>>> whole memory is allocated or freed. I think for the charge path that
> >>>>>>> might not be a big issue, but on the uncharge this can cause issues.
> >>>>>>> The application and the potential shrinkers may have freed some of
> >>>>>>> this dmabuf memory, but until the whole dmabuf is freed, the memcg
> >>>>>>> uncharge will not happen. This can have consequences on the reclaim
> >>>>>>> and oom behavior of the application.
> >>>>>>>
> >>>>>>> b. Due to the usage model, i.e. a central daemon allocating the
> >>>>>>> dmabuf memory upfront, there is a requirement to have a memcg charge
> >>>>>>> transfer functionality to transfer the charge from the central
> >>>>>>> daemon to the client applications.
> >>>>>>> This does introduce complexity and avenues of weird reclaim and
> >>>>>>> oom behavior.
> >>>>>>>
> >>>>>>>
> >>>>>>> 2. Allocate and charge the memory on page fault by the actual user
> >>>>>>>
> >>>>>>> In this approach, the memory is not allocated upfront by the central
> >>>>>>> daemon but rather on page fault by the client application, and the
> >>>>>>> memcg charge happens at the same time.
> >>>>>>>
> >>>>>>> The only con I can think of is that this approach is more involved
> >>>>>>> and may need some clever tricks to track the page on the free path,
> >>>>>>> i.e. we need to decrement the dmabuf memcg stat on the free path.
> >>>>>>> Maybe a page flag.
> >>>>>>>
> >>>>>>> The pros of this approach are that there is no need to have a charge
> >>>>>>> transfer functionality and that the charge/uncharge is closely tied
> >>>>>>> to the actual memory allocation and free.
> >>>>>>>
> >>>>>>> Personally I would prefer the second approach, but I don't want to
> >>>>>>> just block this work if the dmabuf folks are ok with the cons
> >>>>>>> mentioned for the first approach.
> >>>>>>
> >>>>>> I am not familiar enough with dmabuf internals to judge the
> >>>>>> complexity on their end, but I fully agree that charge-when-used is
> >>>>>> much easier to reason about and should have fewer subtle surprises.
> >>>>>
> >>>>> Disclaimer that I don't seem to see patches 3&4 on dri-devel, so maybe
> >>>>> I am missing something, but in principle yes, I agree that the 2nd
> >>>>> option (charge the user, not the exporter) should be preferred. The
> >>>>> thing is that at export time there may not be any backing store
> >>>>> allocated, plus if the series restricts the charge transfer to just
> >>>>> Android clients then it has the potential to miss many other use
> >>>>> cases. At the least it needs to outline how the feature will be
> >>>>> useful outside Android.
> >>>>>
> >>>> There is no restriction like that. It's available to anybody who wants
> >>>> to call dma_buf_charge_transfer if they actually have a need for that,
> >>>> which I don't really expect to be common since most users/owners of
> >>>> the buffers will be the ones causing the export in the first place.
> >>>> It's just not like that on Android with the extra allocator process in
> >>>> the middle most of the time.
> >>>
> >>> Yeah I used the wrong term "restrict", apologies. What I meant was, if
> >>> the idea was to allow spotting memory leaks, with the charge transfer
> >>> being optional and in the series only wired up for Android Binder, then
> >>> it obviously only fully works for that one case. So a step back..
> >>>
> >> Oh, spotting kernel memory leaks is a side benefit of accounting
> >> kernel-only buffers in the root cgroup. The primary goal is to
> >> attribute buffers to the applications that originated them (via
> >> per-application cgroups) simply for accounting purposes. Buffers are
> >> using memory on the system, and we want to know who created them and
> >> how much memory is used. That information is/will no longer be
> >> available with the recent deprecation of the dmabuf sysfs statistics.
> >>
> >>> .. For instance, is it not feasible to transfer the charge when the
> >>> dmabuf is attached, or imported? That would attribute the usage to the
> >>> user/importer and so give better visibility on who is actually causing
> >>> the memory leak.
> >>>
> >> Instead of accounting at export, we could account at attach.
> >> That just turns out not to be very useful when the majority of our
> >> heap-allocated buffers don't have attachments at any particular point
> >> in time. :\ But again it's less about leaks and more about knowing
> >> which buffers exist in the first place.
> >>
> >>> Furthermore, if the above is feasible, then could it also be
> >>> implemented in the common layer so it would automatically cover all
> >>> drivers?
> >>>
> >> Which common layer code specifically? The dmabuf interface appears to
> >> be the most central/common place to me.
> >
> > Yes, I meant dma_buf_attach / detach. More below.
> >>>>> Also stepping back for a moment - is a new memory category really
> >>>>> needed, versus perhaps attempting to charge the actual backing store
> >>>>> memory to the correct client? (There might have been many past
> >>>>> discussions on this so it's okay to point me towards something in the
> >>>>> archives.)
> >>>>>
> >>>> Well the dmabuf counter for the stat file is really just a subcategory
> >>>> of memory that is charged. Its existence is not related to getting the
> >>>> charge attributed to the right process/cgroup. We do want to know how
> >>>> much of the memory attributed to a process is for dmabufs, which is
> >>>> the main point of this series.
> >>>
> >>> Then I am probably missing something, because the statement that the
> >>> proposal is not intended to charge the right process, but wants to know
> >>> how much dmabuf "size" is attributed to a process, confuses me due to a
> >>> seeming contradiction. And there is the fact that it would not be
> >>> externally observable how much of the stats is accurate and how much is
> >>> not (without knowing the implementation detail of which drivers
> >>> implement charge transfer and when). Maybe I completely misunderstood
> >>> the use case.
> >>>
> >> Hmm, did I clear this up above or no? The current proposal is for the
> >> process causing the export of a buffer to be charged for it,
> >> regardless of whatever happens afterwards. (Unless that process is
> >> like gralloc on Android, in which case the charge is transferred from
> >> gralloc to whoever called gralloc to allocate the buffer on their
> >> behalf.)
> >
> > The main problem for me is that charging at export time has no relation
> > to the memory used. But I am not familiar enough with the memcg counters
> > to know if any other counter sets that same precedent. If all others are
> > about real memory use then IMO this does not fit that well. I mean
> > specifically this:
> >
> > +  dmabuf (npn)
> > +    Amount of memory used for exported DMA buffers allocated by the
> > +    cgroup.
> > +    Stays with the allocating cgroup regardless of how the buffer
> > +    is shared.
> > +
> >
> > I think that "Amount of memory used for exported..." is not correct. As
> > implemented it is more akin to the virtual address space size in the CPU
> > space - it can have no relation to the actual usage since the backing
> > store is not allocated until the attachment is made.
> >
> > Then also this:
> >
> > @@ -446,6 +447,8 @@ struct dma_buf {
> >               struct dma_buf *dmabuf;
> >       } *sysfs_entry;
> >  #endif
> > +     /* The cgroup to which this buffer is currently attributed */
> > +     struct mem_cgroup *memcg;
> >  };
> >
> > This does not conceptually fit in my mind. Dmabufs are not associated
> > with one cgroup at a time.
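(For reference, the way the series uses that field is roughly like the
sketch below. This is only an illustration that borrows the existing
skmem-style memcg helpers; the actual helper names and details in the
series differ, and the memcg field is the one proposed in the hunk above.)

#include <linux/dma-buf.h>
#include <linux/memcontrol.h>
#include <linux/sched/mm.h>

/*
 * Illustrative only: charge the exporting task's memcg for the whole
 * buffer size at export time and keep a reference to that memcg in the
 * proposed dmabuf->memcg field until the last buffer reference is dropped.
 */
static int dma_buf_charge_export(struct dma_buf *dmabuf)
{
	struct mem_cgroup *memcg;
	unsigned int nr_pages = dmabuf->size >> PAGE_SHIFT;

	/* Attribute the buffer to whoever is doing the export. */
	memcg = get_mem_cgroup_from_mm(current->mm);
	if (!mem_cgroup_charge_skmem(memcg, nr_pages, GFP_KERNEL)) {
		mem_cgroup_put(memcg);
		return -ENOMEM;
	}

	dmabuf->memcg = memcg;	/* pinned until the buffer is released */
	return 0;
}

/* Runs from the buffer release path, when the last reference goes away. */
static void dma_buf_uncharge_export(struct dma_buf *dmabuf)
{
	mem_cgroup_uncharge_skmem(dmabuf->memcg,
				  dmabuf->size >> PAGE_SHIFT);
	mem_cgroup_put(dmabuf->memcg);
}

The delayed uncharge that Shakeel calls out in point (a) falls out of this
structure: nothing is given back to the memcg until the release path runs,
no matter how much of the backing store has already been freed.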
> >
> > So if you would place the tracking into dma_buf_attach/detach you would
> > be able to charge the correct cgroup regardless of the driver, and since
> > by contract at this stage there is a backing store, the reflected memory
> > usage counter would be truthful.
> >
> > But then you state a problem: that the majority of the time there are no
> > attachments in your setup, and you also say the proposal is not so much
> > about leaks but more about knowing what is exported.
> >
> > In this case you could additionally track that via dma_buf_getfile /
> > dma_buf_file_release as a separate category like dmabuf-exported? But
> > again, I personally don't know if such "may not really be using memory"
> > counters fit in memcg.
> >
> > (Hm, you'd probably still need dmabuf->export_memcg to store who the
> > original caller of dma_buf_getfile was, in case the last reference is
> > dropped from a different process/context. You'd even need
> > dmabuf->attach_memcg for attach/detach to work correctly, for the same
> > reason.)
>
> Or to work around the "may not really be using memory" problem with the
> exported tracking, perhaps you could record dmabuf->export_memcg at
> dma_buf_export time, but only charge against it at dma_buf_getfile time.
> Assuming it is possible to keep references to those memcgs over the
> dmabuf lifetime without any issues.
>
I don't follow here. dma_buf_export calls dma_buf_getfile. Did you mean
dma_buf_attach / dma_buf_mmap instead of dma_buf_getfile? If so, that's an
interesting idea, but I want to make sure I'm tracking correctly.

> That way we could have dmabuf-exported and dmabuf-imported memcg
> categories which would better correlate with real memory usage. I say
> better because I don't think it would be perfect even then, since
> individual drivers are allowed to hold onto the backing store post detach
> and that is invisible to the dmabuf API. But that is probably a different
> problem.
>
Oh, that sounds... broken.

> Regards,
>
> Tvrtko
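For completeness, the attach/detach-based tracking suggested above could
look roughly like the following. The memcg member on struct
dma_buf_attachment and both helpers are hypothetical names used only for
illustration, and the skmem-style memcg helpers are again borrowed purely
as a stand-in for whatever charging primitive would actually be used:

#include <linux/dma-buf.h>
#include <linux/memcontrol.h>
#include <linux/sched/mm.h>

/*
 * Hypothetical: would run from dma_buf_attach() after the attachment has
 * been set up, so the charge only covers buffers that actually have (or
 * are about to get) backing storage, attributed to the importer's memcg.
 */
static int dmabuf_charge_attach(struct dma_buf_attachment *attach)
{
	struct mem_cgroup *memcg = get_mem_cgroup_from_mm(current->mm);
	unsigned int nr_pages = attach->dmabuf->size >> PAGE_SHIFT;

	if (!mem_cgroup_charge_skmem(memcg, nr_pages, GFP_KERNEL)) {
		mem_cgroup_put(memcg);
		return -ENOMEM;
	}

	attach->memcg = memcg;	/* hypothetical field, not in mainline */
	return 0;
}

/* Hypothetical: would run from dma_buf_detach() before the attachment is
 * freed, dropping the charge and the memcg reference taken above. */
static void dmabuf_uncharge_detach(struct dma_buf_attachment *attach)
{
	mem_cgroup_uncharge_skmem(attach->memcg,
				  attach->dmabuf->size >> PAGE_SHIFT);
	mem_cgroup_put(attach->memcg);
}

That would bound the charge to the window where an attachment (and, by
contract, a backing store) exists, at the cost of missing buffers with no
attachments and, as noted above, anything a driver keeps around after
detach.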