On 6/26/23 16:52, Matthew Wilcox wrote:
On Mon, Jun 26, 2023 at 04:27:54PM +0200, Danilo Krummrich wrote:
On 6/26/23 15:19, Matthew Wilcox wrote:
On Mon, Jun 26, 2023 at 02:38:06AM +0200, Danilo Krummrich wrote:
On the other hand, unless I'm missing something (and if so, please let me know),
something is bogus with the API then.
While the documentation of the Advanced API of the maple tree explicitly
claims that the user of the API is responsible for locking, this should be
limited to the bounds set by the maple tree implementation. Which means, the
user must choose either the internal (spin-)lock or an external lock
(which may go away in the future) and acquire and release it
according to the rules the maple tree enforces through lockdep checks.
Let's say one picks the internal lock. How is one supposed to ensure, using
the internal lock, that the tree isn't modified between mas_preallocate()
and the subsequent store?
Besides that, I think the documentation should definitely mention this
limitation and give some guidance for the locking.
Currently, from an API perspective, I can't see how anyone not familiar with
the implementation details would be able to recognize this limitation.
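To make that concrete, here's roughly the sequence I mean (my own sketch, not
taken from the docs; exact signatures differ between kernel versions, and
mt, first, last and entry are just placeholders):

    MA_STATE(mas, &mt, first, last);

    /* The pre-allocation may sleep, so it can't happen under mas_lock(). */
    if (mas_preallocate(&mas, GFP_KERNEL))
            return -ENOMEM;

    /*
     * Window: another path may take the internal lock and modify the
     * tree here, invalidating the node count the pre-allocation was
     * sized for.
     */

    mas_lock(&mas);
    mas_store_prealloc(&mas, entry);
    mas_unlock(&mas);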
For the GPUVA manager, unfortunately, it seems like I need to drop
the maple tree and go back to using an rb-tree, since there appears to be no
sane way of doing a worst-case pre-allocation that does not suffer from this
limitation.
I haven't been paying much attention here (too many other things going
on), but something's wrong.
First, you shouldn't need to preallocate. Preallocation is only there
Unfortunately, I think we really have a case where we have to. Typically GPU
mappings are created in a dma-fence signalling critical path and that is
where such mappings need to be added to the maple tree. Hence, we can't do
any sleeping allocations there.
OK, so there are various ways to handle this, depending on what's
appropriate for your case.
The simplest is to use GFP_ATOMIC. Essentially, you're saying to the MM
layer "This is too hard, let me tap into the emergency reserves". It's
mildly frowned upon, so let's see if we can do better.
If you know where the allocation needs to be stored, but want it to act as
NULL until the time is right, you can store a ZERO entry. That will read
as NULL until you store to it. A pure overwriting store will not cause
any memory allocation since all the implementation has to do is change
a pointer. The XArray wraps this up nicely behind an xa_reserve() API.
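Something like this (untested, off the top of my head):

    /* Where sleeping is still allowed: reserve the slot up front. */
    if (xa_reserve(&xa, index, GFP_KERNEL))
            return -ENOMEM;

    /* The entry reads as NULL until the real store happens. */

    /* Later, in the fence signalling path: the slot already exists,
     * so the store is just a pointer update and needs no allocation. */
    xa_store(&xa, index, entry, GFP_ATOMIC);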
As you're discovering, the Maple Tree API isn't fully baked yet.
Unfortunately, GFP_ATOMIC seems to be the only option. I think storing
entries in advance would not work. Typically userspace submits a job to
the kernel issuing one or multiple requests to map and unmap memory in
an ioctl. Such a job is then put into a queue and processed
asynchronously in a dma-fence signalling critical section. Hence, at the
time we'd store entries in advance, we could have an arbitrary number of
pending jobs potentially still messing with the same address space region.
So, the only way to go seems to be to use mas_store_gfp() with
GFP_ATOMIC directly in the fence signalling critical path. I guess
mas_store_gfp() does not BUG_ON() if it can't get atomic pages?
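I.e. something along these lines (just a sketch; mt, first, last and entry
are placeholders, and I'm assuming mas_store_gfp() returns an errno on
allocation failure rather than crashing):

    /* Fence signalling critical path: must not sleep. */
    MA_STATE(mas, &mt, first, last);
    int ret;

    mas_lock(&mas);
    ret = mas_store_gfp(&mas, entry, GFP_ATOMIC);
    mas_unlock(&mas);
    if (ret)
            return ret;   /* presumably -ENOMEM when atomic reserves run out */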
Also, I just saw that the tree is limited in its height
(MAPLE_HEIGHT_MAX). Do you think it could be a sane alternative to
pre-allocate with MAPLE_HEIGHT_MAX rather than rely on atomic pages?
Or maybe a compromise of pre-allocating just a couple of nodes and then
relying on atomic pages for the rest?
FYI, we're talking about on the order of hundreds of thousands of entries
to be stored in the tree.
- Danilo