On Mon, 2010-04-12 at 13:15 +0800, Craig Ringer wrote:
> On 12/04/10 11:34, Ben Martin wrote:
>
> > In my SQLite vtable both of the above restrictions don't apply. To
> > mitigate (1) I was hopeful that there was a way to get the connection to
> > PG on the server to fork/setuid
>
> The pg backends are designed to run as one user. You might have issues
> setting it up to do otherwise.

I suspected this would be the case. With apache it's a somewhat
different situation, where setting the uid and then accessing the
filesystem works out fine; having multiple users and processes
accessing the pg data files would be a much messier thing.

> Consider talking to your storage backend over a unix socket, pipe,
> tcp/ip socket, shared memory, or other suitable IPC mechanism from Pg.
> This also protects Pg from any issues caused by the storage backend
> doing things like spawning unexpected threads, corrupting the
> stack/heap, outright crashing, etc.

Hmm, this might be a good way to go anyway, just for the isolation. And
actually, fork/execing "ferrisls --xml" and massaging that into the
return value might be a fairly quick and robust initial implementation.

Thinking more broadly, being able to run "a program" and massage
xml/json/yaml back as the return value of a custom function might be a
handy generic thing to have (a rough sketch is at the end of this
mail). I'll have a quick search around to make sure such a thing
doesn't already exist...

> > I think using a function restricts the results to being read only.
>
> If you're planning on writing a C-level UDF loaded via 'CREATE
> FUNCTION', then yes it'll have to return a materialized data set and
> there's no way to UPDATE/INSERT/DELETE against the function's results.
>
> > Perhaps having a second function that allows updates to be processed
> > separately.
>
> Yep, though that's clumsy I expect it'd work, and honestly it's
> probably your best bet.

OK.

> > Any pointers / suggestions on how to do this sort of thing with
> > postgresql would be wonderful. Please CC any such replies as I'm not
> > currently on list.
>
> AFAIK Pg's innards aren't currently very friendly toward extension
> with random user-provided table/database storage backends. Unlike
> (say) MySQL, there's no pluggable storage engine system.
>
> That means Pg's planner/executor can make a lot of decisions based on
> things it "knows" about how Pg's data storage works, speeding and
> simplifying it and letting it be much smarter. Pg can enforce strict
> rules about transactional scope, ACID, error handling, etc based on
> knowledge of the data storage's capabilities and expectations. On the
> other hand, it means it's hard to plug in your own storage system.

This all sounds very reasonable. Things also become a quagmire when the
database needs to ask the table implementation about virtual indexes,
their selectivity, and so on. Once again, that's not too much of a
problem for smallish data sets in SQLite, where a full virtual table
scan is acceptable for performance.

> After all, how would your external system handle something like this:
>
> BEGIN;
> INSERT INTO my_external_storage(a, b) VALUES (1,2);
> SAVEPOINT sp_a;
> INSERT INTO my_external_storage(a, b) VALUES (3,4);
> ROLLBACK TO SAVEPOINT sp_a;
> INSERT INTO my_external_storage(a, b) VALUES (5,6);
> COMMIT;
>
> How would it honour SERIALIZABLE isolation if it doesn't itself
> maintain MVCC snapshots and have some way of mapping pg's xids to its
> internal versions? Etc.

This indeed gets very hairy.
I have thought about trying to implement transactions in the
filesystem, but for some virtual filesystems it is not really possible,
especially when a rollback itself could fail.

> Pg would have to be able to error out on many commands with "not
> supported by underlying storage system" or the like. Yet sometimes it
> won't even know that until it's well into executing something.
>
> SQLite has the advantage of simplicity. It doesn't have to worry about
> complicated concurrency rules, versioning, user-defined data type
> storage, etc etc etc. And MySQL was designed to handle pluggable
> backends, so it's prepared for storage backends not to be able to
> perform certain operations, and it makes some big sacrifices in terms
> of its broader capabilities to permit pluggable backends.
>
> I guess at this point I have to ask "why do you want to do this? What
> will you gain?"

A few things I do with the SQLite engine are analysing apache log files
or other separated files (like running awk with SQL as the query
language), and interacting with web services as a table, e.g. selecting
data from a google spreadsheet and a local sqlite database in a single
query.

As a fair number of my use cases simply involve introducing and joining
data from other sources into queries, I think starting out with just a
record-set-returning function would be a good thing. That nicely avoids
the issue that the filesystem can't really do MVCC or transactions
properly, while still giving access to more diverse data in a query for
folks who want to perform such queries.
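
For concreteness, here is the rough sketch I mentioned above of the
"run a program and massage its xml" idea as a record-set-returning
function. It's a minimal sketch, not a finished design: it assumes
plpythonu is installed, and the <context name=... size=...> element
layout is only my guess at what "ferrisls --xml" emits, so adjust the
parsing to the real output:

  CREATE TYPE fs_entry AS (name text, size bigint);

  CREATE OR REPLACE FUNCTION ferris_ls(path text)
  RETURNS SETOF fs_entry AS $$
      # Run the external program and capture its XML output.
      import subprocess
      from xml.dom import minidom
      xml = subprocess.Popen(['ferrisls', '--xml', path],
                             stdout=subprocess.PIPE).communicate()[0]
      # Massage each directory entry into a (name, size) row;
      # PL/Python lets a SETOF function return a list of mappings.
      doc = minidom.parseString(xml)
      return [{'name': e.getAttribute('name'),
               'size': int(e.getAttribute('size') or 0)}
              for e in doc.getElementsByTagName('context')]
  $$ LANGUAGE plpythonu;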
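
With something like that in place, the "join external data into a
query" use case above becomes an ordinary join; file_notes here is a
made-up local table, purely for illustration:

  SELECT f.name, f.size, n.comment
    FROM ferris_ls('/var/log/apache2') f
    JOIN file_notes n ON n.filename = f.name
   WHERE f.size > 1000000;

Being read-only, this also sidesteps all of the transaction and MVCC
questions raised above.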