On Wed, Jan 11, 2023 at 2:56 PM Daniel Vetter <daniel@xxxxxxxx> wrote:
>
> On Mon, Jan 09, 2023 at 04:18:12PM -0800, Shakeel Butt wrote:
> > Hi T.J.,
> >
> > On Mon, Jan 9, 2023 at 1:38 PM T.J. Mercier <tjmercier@xxxxxxxxxx> wrote:
> > >
> > > Based on discussions at LPC, this series adds a memory.stat counter for
> > > exported dmabufs. This counter allows us to continue tracking
> > > system-wide total exported buffer sizes, which there is no longer any
> > > way to get without DMABUF_SYSFS_STATS, and adds a new capability to
> > > track per-cgroup exported buffer sizes. The total (root counter) is
> > > helpful for accounting in-kernel dmabuf use (by comparing with the sum
> > > of child nodes, or with the sum of sizes of mapped buffers or FD
> > > references in procfs), in addition to helping identify driver memory
> > > leaks when in-kernel use continually increases over time. With
> > > per-application cgroups, the per-cgroup counter allows us to quickly
> > > see how much dma-buf memory an application has caused to be allocated.
> > > This avoids the need to read through all of procfs, which can be a
> > > lengthy process, and causes the charge to "stick" to the allocating
> > > process/cgroup as long as the buffer is alive, regardless of how the
> > > buffer is shared (unless the charge is transferred).
> > >
> > > The first patch adds the counter to memcg. The next two patches allow
> > > the charge for a buffer to be transferred across cgroups, which is
> > > necessary because of the way most dmabufs are allocated from a central
> > > process on Android. The fourth patch adds a SELinux hook to binder in
> > > order to control who is allowed to transfer buffer charges.
> > >
> > > [1] https://lore.kernel.org/all/20220617085702.4298-1-christian.koenig@xxxxxxx/
> > >
> >
> > I am a bit confused by the term "charge" used in this patch series.
> > From the patches, it seems like only a memcg stat is added and nothing
> > is actually charged to the memcg.
> >
> > This leads me to the question: why add this stat to memcg if the
> > underlying memory is not charged to the memcg and we don't really
> > want to limit the usage?
> >
> > I see two ways forward:
> >
> > 1. Instead of memcg, use the bpf-rstat [1] infra to implement the
> > per-cgroup stat for dmabuf. (You may need an additional hook for the
> > stat transfer.)
> >
> > 2. Charge the actual memory to the memcg. Since the size of a dmabuf
> > is immutable across its lifetime, you will not need to do accounting
> > at the page level and can instead use something similar to the network
> > memory accounting interface/mechanism (or something even simpler).
> > However, you would need to handle the reclaim, OOM, and charge context
> > and failure cases. If you are not looking to limit the usage of
> > dmabufs, then this option is overkill.
>
> I think eventually, at least for the other "account gpu stuff in cgroups"
> use cases, we do want to actually charge the memory.
>
Yes, I've been looking at this today.

> The problem is a bit that with gpu allocations reclaim is essentially "we
> pass the error to userspace and they get to sort the mess out". There are
> some exceptions (some gpu drivers do have shrinkers).

Would we need to make sure these shrinkers are tied into the cgroup
stuff before we could enable charging for them?

I'm also not sure that we can depend on the dmabuf being backed at
export time 100% of the time? (They are for dmabuf heaps.) If not,
that'd make calling the existing memcg folio-based functions a bit
difficult.
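To make the size-based option concrete, here is roughly the kind of
charging I have in mind (just a sketch, not what's in the posted
patches: the dmabuf_*_memcg helpers and the memcg field on struct
dma_buf are made up for illustration, and a real version would want
its own stat item rather than piggybacking on the sock counter):

#include <linux/dma-buf.h>
#include <linux/memcontrol.h>
#include <linux/mm.h>

/*
 * Sketch: charge a dma-buf to the exporting task's memcg by size at
 * export time, in the style of the network memory accounting
 * (mem_cgroup_charge_skmem). No folios involved, so it would work
 * even when the buffer isn't backed yet at export.
 */
static int dmabuf_charge_memcg(struct dma_buf *dmabuf)
{
	unsigned int nr_pages = PAGE_ALIGN(dmabuf->size) >> PAGE_SHIFT;
	struct mem_cgroup *memcg;

	memcg = get_mem_cgroup_from_mm(current->mm);
	if (!mem_cgroup_charge_skmem(memcg, nr_pages, GFP_KERNEL)) {
		mem_cgroup_put(memcg);
		return -ENOMEM;
	}
	dmabuf->memcg = memcg;	/* hypothetical field, kept for uncharge */
	return 0;
}

/* Sketch: undo the charge when the buffer is released. */
static void dmabuf_uncharge_memcg(struct dma_buf *dmabuf)
{
	unsigned int nr_pages = PAGE_ALIGN(dmabuf->size) >> PAGE_SHIFT;

	mem_cgroup_uncharge_skmem(dmabuf->memcg, nr_pages);
	mem_cgroup_put(dmabuf->memcg);
}

Since the size is fixed for the lifetime of the buffer, charge and
uncharge stay a single pair of calls; the hard parts are the failure
and OOM handling Shakeel mentions, plus transferring the charge across
cgroups for the Android central-allocator case.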
> Also note that at least from the gpu driver side this is all a huge
> endeavour, so if we can split up the steps as much as possible (and get
> something interim usable that doesn't break stuff ofc), that is
> practically needed to make headway here. TJ has been trying out various
> approaches for quite some time now already :-/
> -Daniel
>
> > Please let me know if I misunderstood something.
> >
> > [1] https://lore.kernel.org/all/20220824233117.1312810-1-haoluo@xxxxxxxxxx/
> >
> > thanks,
> > Shakeel
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch