Re: Reg: Help to understand the source code

On 4/23/20 8:44 AM, Preethi S wrote:
> I am fairly new to postgres and I am trying to understand how the data is processed during the insert from buffer to the disk. Can someone help me with that? Also, I would like to see source code workflow. Can someone help me with finding the source code for the data insertion/modification workflow.

I'm also a Postgres hacker newbie, but I've spent some time adding SQL:2011 FOR PORTION OF support to UPDATE/DELETE, so I've gone through that learning process. (I should say "going through". :-)

I'd say be prepared to spend a *lot* of time reading the code. Personally I use `grep -r` a lot and just read and read. For specifics you can use a debugger or insert `ereport(NOTICE, (errmsg("something %s", foo)))` and run queries (or the test suite). Also many subfolders have an extensive README that will guide you. Some of the READMEs may take an hour or more to get through and understand, but reading them is worth it.

It helped me a lot to spend several years writing occasional Postgres C extensions before really doing anything in the core codebase. There are lots of basics you learn that way. There are a bunch of articles and presentations out there about writing extensions that you might find helpful.

Postgres processes queries in several steps:

- parse
- analyze
- rewrite
- plan
- optimize
- execute
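
As a rough map of the list above (a sketch, not Postgres code), here is a small C table pairing each step with a directory to start grepping in and a function to set a breakpoint on. The struct and `find_stage` are made up for illustration. The directory and function names are as they stood in a 2020-era source tree and may have moved or been renamed since, so check them with `grep -r` first. Note that "plan" and "optimize" land in the same place: both are handled by the planner code under src/backend/optimizer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row per step named in the list above (hypothetical helper type). */
typedef struct StageInfo
{
    const char *stage;      /* step name as used in this email */
    const char *source_dir; /* directory to start reading in */
    const char *entry;      /* a function worth a breakpoint */
} StageInfo;

/* Assumed 2020-era locations; verify against your own checkout. */
static const StageInfo stage_map[] = {
    {"parse",    "src/backend/parser",    "raw_parser"},
    {"analyze",  "src/backend/parser",    "parse_analyze"},
    {"rewrite",  "src/backend/rewrite",   "QueryRewrite"},
    {"plan",     "src/backend/optimizer", "planner"},
    {"optimize", "src/backend/optimizer", "planner"},
    {"execute",  "src/backend/executor",  "ExecutorRun"},
};

#define N_STAGES (sizeof(stage_map) / sizeof(stage_map[0]))

/* Return the row for a step name, or NULL if the name is unknown. */
const StageInfo *
find_stage(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < N_STAGES; i++)
    {
        if (strcmp(stage_map[i].stage, name) == 0)
            return &stage_map[i];
    }
    return NULL;
}
```

For example, `find_stage("execute")->source_dir` gives the directory that holds nodeModifyTable.c.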

The parse step is a bison grammar (look for gram.y under src/backend/parser). Basically it fills in structs that cut up what the user typed, without checking yet whether any of it refers to something real.

The analyze step starts to make sense of the parse results. Look at src/backend/parser/analyze.c. It maps input strings to database objects: for example, it looks up table and column names (and makes sure they really exist). Here you're sort of just copying things from the parse structs to different structs. You're building up Node trees that later steps can use. I think the analyze step is often considered to be still part of the parse phase.

It seems like each SQL "clause" has its own transformFoo function, so probably you'll want to add your own (transformMyAwesomeFeatureClause) and then call it from its "parent" (e.g. transformUpdateStmt).
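
To make the parent/child transform pattern concrete, here is a toy sketch in plain C. None of it is Postgres code: every `Toy...`/`toy_...` name, the two-row "catalog", and the id numbers are invented for illustration. It only shows the shape: the grammar hands over strings, the parent transform resolves the table name to an object, and then calls the transform for the new clause.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a system catalog: table name -> numeric id. */
typedef struct ToyCatalogEntry
{
    const char *relname;
    unsigned    relid;
} ToyCatalogEntry;

static const ToyCatalogEntry toy_catalog[] = {
    {"accounts", 16384},
    {"orders", 16390},
};

/* What the grammar produces: strings cut out of the query text. */
typedef struct ToyRawUpdate
{
    const char *relname;
    const char *feature_arg;    /* NULL when the clause is absent */
} ToyRawUpdate;

/* What analysis produces: names resolved to database objects. */
typedef struct ToyQuery
{
    unsigned    relid;
    bool        has_feature;
    const char *feature_arg;
} ToyQuery;

/* Look a table name up in the toy catalog; false if it does not exist. */
bool
toy_lookup_relation(const char *relname, unsigned *relid)
{
    for (size_t i = 0; i < sizeof(toy_catalog) / sizeof(toy_catalog[0]); i++)
    {
        if (strcmp(toy_catalog[i].relname, relname) == 0)
        {
            *relid = toy_catalog[i].relid;
            return true;
        }
    }
    return false;
}

/* Child transform: handles only the new clause. */
void
toy_transform_feature_clause(const ToyRawUpdate *raw, ToyQuery *qry)
{
    qry->has_feature = (raw->feature_arg != NULL);
    qry->feature_arg = raw->feature_arg;
}

/* Parent transform: resolves the table, then delegates to the child. */
bool
toy_transform_update_stmt(const ToyRawUpdate *raw, ToyQuery *qry)
{
    assert(raw != NULL && qry != NULL);
    if (!toy_lookup_relation(raw->relname, &qry->relid))
        return false;           /* real code would raise an error here */
    toy_transform_feature_clause(raw, qry);
    return true;
}
```

The real functions work on Node trees and a parse state rather than flat structs, but the calling structure is the part worth recognizing when you read analyze.c.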

If you add new Node types you'll need to edit nodes/*funcs.c and also probably teach some switch statements how to handle them. If you are filling in a struct but then later in the pipeline find that what you wrote isn't there anymore, you probably forgot to implement a copy function.
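
The "forgot the copy function" trap is easier to see in miniature. The following is a toy model in plain C, not the real Node machinery: the `Toy...` types and `toy_copy_node` are invented here, and the real copy code is organized differently. The point is just that a tagged-node copy is a switch over node types that copies fields one by one, so a field nobody copies comes out empty on the other side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef enum ToyNodeTag
{
    T_ToyUpdate,
    T_ToyFeatureClause
} ToyNodeTag;

/* Common header: every toy node starts with one of these. */
typedef struct ToyNode
{
    ToyNodeTag  type;
} ToyNode;

typedef struct ToyFeatureClause
{
    ToyNode     node;
    int         lower;
    int         upper;
} ToyFeatureClause;

typedef struct ToyUpdate
{
    ToyNode     node;
    int         relid;
    ToyNode    *feature;        /* the newly added field */
} ToyUpdate;

/* Deep-copy a toy node tree; returns NULL for NULL or unknown input. */
ToyNode *
toy_copy_node(const ToyNode *from)
{
    if (from == NULL)
        return NULL;

    switch (from->type)
    {
        case T_ToyFeatureClause:
            {
                const ToyFeatureClause *src = (const ToyFeatureClause *) from;
                ToyFeatureClause *dst = calloc(1, sizeof(*dst));

                assert(dst != NULL);
                dst->node.type = T_ToyFeatureClause;
                dst->lower = src->lower;
                dst->upper = src->upper;
                return &dst->node;
            }
        case T_ToyUpdate:
            {
                const ToyUpdate *src = (const ToyUpdate *) from;
                ToyUpdate  *dst = calloc(1, sizeof(*dst));

                assert(dst != NULL);
                dst->node.type = T_ToyUpdate;
                dst->relid = src->relid;
                /* Leave this line out and the clause is NULL after a copy. */
                dst->feature = toy_copy_node(src->feature);
                return &dst->node;
            }
    }
    return NULL;                /* node type the switch was never taught */
}
```

Both failure modes from the paragraph above show up here: a missing `case` makes the whole node disappear, and a missing field assignment makes just that field disappear.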

The rewrite/plan/optimize steps aren't things you need to worry about too much if you're interested in DML, but you can read more about them in the source code. Rewrite in particular is pretty niche (views and RULEs).

The execute step is the most challenging, I think. It has its own Node trees and also keeps an execution state. Probably you'll need to look at src/backend/executor/nodeModifyTable.c among others. You'll also need to learn about TupleTableSlots. (If anyone here has a good learning resource for TTS I would also be glad to read it.)
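
The split between "its own Node trees" and "an execution state" can be sketched with a toy pull-style iterator in plain C. This is an illustration under heavy simplification, not executor code: the `Toy...` types and `toy_exec_*` functions are invented, rows are bare ints instead of TupleTableSlots, and there is only one kind of node. What it keeps is the idea that the plan is read-only, a separate state struct tracks progress, and a parent node pulls rows from its child one at a time until the child returns nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Read-only plan: describes what to scan. */
typedef struct ToyScanPlan
{
    const int  *rows;
    size_t      nrows;
} ToyScanPlan;

/* Mutable execution state: remembers how far the scan has got. */
typedef struct ToyScanState
{
    const ToyScanPlan *plan;
    size_t      next;
} ToyScanState;

/* Set up fresh state for a plan; the plan itself is never modified. */
void
toy_exec_init(ToyScanState *state, const ToyScanPlan *plan)
{
    state->plan = plan;
    state->next = 0;
}

/* Return a pointer to the next row, or NULL when the scan is exhausted. */
const int *
toy_exec_next(ToyScanState *state)
{
    assert(state != NULL && state->plan != NULL);
    if (state->next >= state->plan->nrows)
        return NULL;
    return &state->plan->rows[state->next++];
}

/* A modify-style parent: pulls every remaining row and counts them. */
size_t
toy_exec_modify(ToyScanState *child)
{
    size_t      processed = 0;

    while (toy_exec_next(child) != NULL)
        processed++;
    return processed;
}
```

Because the state lives apart from the plan, the same plan can be run again just by calling `toy_exec_init` on a new state struct.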

I'm afraid this description is comically dumbed down, but hopefully it can be something like a map. I'd probably just take an UPDATE statement and try to trace it through the pipeline, and maybe experiment with small changes along the way. You can add things to src/test/regress as you go.

And the mailing list is a very friendly place to ask questions.

Yours,

--
Paul              ~{:-)
pj@xxxxxxxxxxxxxxxxxxxxxxxx




