Re: Proposal to report GPU private memory allocations with sysfs nodes [plain text version]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Jerome and all folks,

In addition to my last reply, I just wanna get some more information regarding this on the upstream side.

1. Do you think this(standardize a way to report GPU private allocations) is going to be a useful thing on the upstream as well? It grants a lot benefits for Android, but I'd like to get an idea for the non-Android world.

2. There might be some worries that upstream kernel driver has no idea regarding the API. However, to achieve good fidelity around memory reporting, we'd have to pass down certain metadata which is known only by the userland. Consider this use case: on the upstream side, freedreno for example, some memory buffer object(BO) during its own lifecycle could represent totally different things, and kmd is not aware of that. When we'd like to take memory snapshots at certain granularity, we have to know what that buffer represents so that the snapshot can be meaningful and useful.

If we just keep this Android specific, I'd worry some day the upstream has standardized a way to report this and Android vendors have to take extra efforts to migrate over. This is one of the main reasons we'd like to do this on the upstream side. 

Timeline wise, Android has explicit deadlines for the next release and we have to push hard towards those. Any prompt responses are very much appreciated!

Best regards,
Yiwei

On Mon, Oct 28, 2019 at 11:33 AM Yiwei Zhang <zzyiwei@xxxxxxxxxx> wrote:
On Mon, Oct 28, 2019 at 8:26 AM Jerome Glisse <jglisse@xxxxxxxxxx> wrote:
On Fri, Oct 25, 2019 at 11:35:32AM -0700, Yiwei Zhang wrote:
> Hi folks,
>
> This is the plain text version of the previous email in case that was
> considered as spam.
>
> --- Background ---
> On the downstream Android, vendors used to report GPU private memory
> allocations with debugfs nodes in their own formats. However, debugfs nodes
> are getting deprecated in the next Android release.

Maybe explain why it is useful first ?

Memory is precious on Android mobile platforms. Apps using a large amount of
memory, games, tend to maintain a table for the memory on different devices with
different prediction models. Private gpu memory allocations is currently semi-blind
to the apps and the platform as well.

By having the data, the platform can do:
(1) GPU memory profiling as part of the huge Android profiler in progress.
(2) Android system health team can enrich the performance test coverage.
(3) We can collect filed metrics to detect any regression on the gpu private memory
allocations in the production population.
(4) Shell user can easily dump the allocations in a uniform way across vendors.
(5) Platform can feed the data to the apps so that apps can do memory allocations
in a more predictable way.
 
>
> --- Proposal ---
> We are taking the chance to unify all the vendors to migrate their existing
> debugfs nodes into a standardized sysfs node structure. Then the platform
> is able to do a bunch of useful things: memory profiling, system health
> coverage, field metrics, local shell dump, in-app api, etc. This proposal
> is better served upstream as all GPU vendors can standardize a gpu memory
> structure and reduce fragmentation across Android and Linux that clients
> can rely on.
>
> --- Detailed design ---
> The sysfs node structure looks like below:
> /sys/devices/<ro.gfx.sysfs.0>/<pid>/<type_name>
> e.g. "/sys/devices/mali0/gpu_mem/606/gl_buffer" and the gl_buffer is a node
> having the comma separated size values: "4096,81920,...,4096".

How does kernel knows what API the allocation is use for ? With the
open source driver you never specify what API is creating a gem object
(opengl, vulkan, ...) nor what purpose (transient, shader, ...).

Oh, is this a hard requirement for the open source drivers to not bookkeep any
data from userland? I think the API is just some additional metadata passed down.
 

> For the top level root, vendors can choose their own names based on the
> value of ro.gfx.sysfs.0 the vendors set. (1) For the multiple gpu driver
> cases, we can use ro.gfx.sysfs.1, ro.gfx.sysfs.2 for the 2nd and 3rd KMDs.
> (2) It's also allowed to put some sub-dir for example "kgsl/gpu_mem" or
> "mali0/gpu_mem" in the ro.gfx.sysfs.<channel> property if the root name
> under /sys/devices/ is already created and used for other purposes.

On one side you want to standardize on the other you want to give
complete freedom on the top level naming scheme. I would rather see a
consistent naming scheme (ie something more restraint and with little
place for interpration by individual driver)

Thanks for commenting on this. We definitely need some suggestions on the root
directory. In the multi-gpu case on desktop, is there some existing consumer to
query "some data" from all the GPUs? How does the tool find all GPUs and
differentiate between them? Is this already standardized?

> For the 2nd level "pid", there are usually just a couple of them per
> snapshot, since we only takes snapshot for the active ones.

? Do not understand here, you can have any number of applications with
GPU objects ? And thus there is no bound on the number of PID. Please
consider desktop too, i do not know what kind of limitation android
impose.

We are only interested in tracking *active* GPU private allocations. So yes, any
application currently holding an active GPU context will probably has a node here.
Since we want to do profiling for specific apps, the data has to be per application
based. I don't get your concerns here. If it's about the tracking overhead, it's rare
to see tons of application doing private gpu allocations at the same time. Could
you help elaborate a bit?

> For the 3rd level "type_name", the type name will be one of the GPU memory
> object types in lower case, and the value will be a comma separated
> sequence of size values for all the allocations under that specific type.
>
> We especially would like some comments on this part. For the GPU memory
> object types, we defined 9 different types for Android:
> (1) UNKNOWN // not accounted for in any other category
> (2) SHADER // shader binaries
> (3) COMMAND // allocations which have a lifetime similar to a
> VkCommandBuffer
> (4) VULKAN // backing for VkDeviceMemory
> (5) GL_TEXTURE // GL Texture and RenderBuffer
> (6) GL_BUFFER // GL Buffer
> (7) QUERY // backing for query
> (8) DESCRIPTOR // allocations which have a lifetime similar to a
> VkDescriptorSet
> (9) TRANSIENT // random transient things that the driver needs
>
> We are wondering if those type enumerations make sense to the upstream side
> as well, or maybe we just deal with our own different type sets. Cuz on the
> Android side, we'll just read those nodes named after the types we defined
> in the sysfs node structure.

See my above point of open source driver and kernel being unaware
of the allocation purpose and use.

Cheers,
Jérôme


Many thanks for the reply!
Yiwei
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel

[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux