On Wed, 16 Jan 2019 22:08:35 +0000, Sam Patterson <katoriasdev@xxxxxxxxx> wrote: >Hi all, > >I've recently started developing an extension for Postgres for which I'll >need to create a new variable-length base type. The type will require a >tree-like structure in order to parse sufficiently, which of course >probably means having some sort of recursive data structure, like a struct >that has members which are pointers to itself for child nodes. After doing >some research, specifically looking at how other variable-length data types >store their data, it seems almost all of them store the data in a binary >representation, using bit masks and offsets etc in order to store/access >the data whilst having an in-memory representation that's used to >manipulate the data. > >I presume the purpose for using this approach is because all the data in a >varlena type has to be contiguous, and the moment you start using pointers >this is no longer possible. So my question is, given a structure that looks >something like this, > >typedef struct Node >{ > char *data; > Node *left; > Node *right; >} Node; > >am I right in saying that I wouldn't be able to store that representation >on-disk, but instead I'd have to transform it into some binary >representation and back again when writing/reading respectively, are there >any alternatives? > >Regards, > >Karl You might want to consider using a linear tree representation that can be searched directly. See: https://en.wikipedia.org/w/index.php?title=Binary_tree§ion=9#Arrays E.g., Make the tree nodes: { unsigned data, left, right } Store the nodes contiguously as an array, and represent the left/right branches using simple array indexing. Write the node data contiguously into a separate array/stream, keeping an offset [or index] to each data element in its correponding tree node. And then write the node array and the data array/stream contiguously onto the disk. For reading/searching this structure can be used directly: read in the homogenous node array, and suck the variable length data stream into a separate buffer. If it isn't important that the tree be optimally ordered, then insert is simple: just append a new node to the array, append its data to the data stream, and update branch indexes to link the new node at the correct spot in the tree. If you want the data to be as small as possible on disk, then update or delete of a node requires repacking the data and adjusting the data offsets of the "unchanged" nodes. https://en.wikipedia.org/wiki/Mark-compact_algorithm [Obviously, you can delete without repacking by just unlinking the tree node and rewriting the node array. The "deleted" node data is left in place on disk but it becomes unreachable.] If you can afford to (temporarily) sort the tree nodes by their data offset, repacking the data can be done in the same buffer by sliding each element forward to be contiguous with the previous. If you need to take the nodes in whatever order they occur, then to pack the data you need to copy it to a new buffer. Hope this helps. George