Hello!
I'm working on the OneSparse Postgres extension that wraps the GraphBLAS API with a SQL interface for doing graph analytics and other sparse linear algebra operations:
OneSparse wraps the GraphBLAS opaque handles in Expanded Object Header structs that register ExpandedObjectMethods for flattening and expanding objects. The "live" handle is what gets passed to the SuiteSparse API, while the "flat" representation is de/serialized and written as a TOAST value. This works perfectly.
However, during some single-source shortest path (SSSP) benchmarking I was getting good numbers, but not as good as I expected, and I noticed some sublinear scaling as the problems got bigger. It seems my objects are constantly being flattened and expanded from plpgsql during the iterative phases of an algorithm. As the solution grows, the result vector gets bigger and the expand/flatten cost increases on each iteration.
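To put a rough shape on that cost, here is a toy model in Python. It is not OneSparse or Postgres code: the function name, the linear growth of the vector, and the "one flatten plus one expand per iteration" accounting are all assumptions made for illustration.

```python
def bytes_copied(iterations, growth, keep_expanded):
    """Count bytes moved by flatten/expand over an iterative algorithm.

    Hypothetical model: the result vector gains `growth` bytes per
    iteration, and each flatten or expand copies the whole vector once.
    """
    size = 0
    copied = 0
    for _ in range(iterations):
        size += growth
        if not keep_expanded:
            # One flatten plus one expand of the full vector per iteration.
            copied += 2 * size
    if keep_expanded:
        # The object stays expanded; only a single flatten at the end.
        copied = size
    return copied


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, bytes_copied(n, 8, False), bytes_copied(n, 8, True))
```

Under these assumptions the per-iteration round trip copies growth * n * (n + 1) bytes in total, which is quadratic in the number of iterations, while keeping the object expanded copies only growth * n bytes.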
I found this thread from the original patch implementation by Tom Lane in 2015:
In this initial implementation, a few heuristics have been hard-wired
into plpgsql to improve performance for arrays that are stored in
plpgsql variables. We would like to generalize those hacks so that
other datatypes can obtain similar improvements, but figuring out some
appropriate APIs is left as a task for future work.
Sure enough, looking at the code I see this condition:
This is a showstopper for me, as I can't see a good way around it. I tried to "fake" an array but didn't get very far with that approach. I may still pull it off, since GraphBLAS objects are very much array-like, but I figured I'd also open the discussion on how we can fix this permanently so that future extensions don't run into this penalty.
My first thought was to add a flag to CREATE TYPE like "EXPANDED = true" (or some better name) indicating that plpgsql can safely take ownership of the object in its expanded state instead of copying it. The GraphBLAS API is specific in that the holder of an object handle is the owner of the reference, so that would work fine for me. Another option, I guess, is some kind of whitelist or blacklist telling plpgsql which types can be kept expanded.
And then there is the option of simply removing the existing arrays-only restriction. Is any other expanded object out there really interested in being flattened/expanded over and over again?
Thanks,
-Michel