Re: How can i get record by data block not by sql?

Craig Ringer <ringerc@xxxxxxxxxxxxx> · Tue, 04 Oct 2011 21:04:21 +0800

Joining several reply threads. Replies inline below.

On 10/04/2011 05:07 PM, 姜头 wrote:
> I found GiST difficult to understand. :)
> I will try my best to read it.

I find GiST hard to understand too. It's probably the easiest way to add 
a custom index type, though.

I strongly recommend that you start reading here if you want to develop 
additional functionality for PostgreSQL's backends:

http://developer.postgresql.org/pgdocs/postgres/internals.html

particularly:

http://developer.postgresql.org/pgdocs/postgres/indexam.html
http://developer.postgresql.org/pgdocs/postgres/gist.html
http://developer.postgresql.org/pgdocs/postgres/storage.html

On 10/04/2011 05:01 PM, 姜头 wrote:
> I am sorry for my poor English; I come from a non-English-speaking country.
> The original paper cannot be downloaded at the moment because the IEEE
> service is unavailable, so I found another paper, which I have sent to
> you as an attachment. Section 2.1.1 describes the I-Tree, which the
> researchers implemented in PostgreSQL using *blocks.*

My Chinese [?] isn't so great either ;-) so there's no need to apologise.

A *very* quick look at the paper you sent suggests that they might've 
been working on an index-organized table ("covering index") structure for 
data mining. That's interesting. They don't talk much about their 
implementation or publish source code, though :-(

The paper is talking about PostgreSQL blocks, i.e. what PostgreSQL's 
block size (BLCKSZ) refers to. These are (usually) 8 kB chunks of files on a 
regular file system, stored within the datadir, and are accessed via 
pread() and pwrite() by the PostgreSQL backends and managed in the 
buffer cache.
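
As a rough illustration of what "block N of a file" means at the system-call
level, here is a minimal sketch in Python. It assumes an 8 kB block size and
uses a scratch file, not a real PostgreSQL data file; the names read_block,
write_block and demo are made up for this example. Real backends go through
the buffer cache rather than calling pread() like this.

```python
import os
import tempfile

BLOCK_SIZE = 8192  # assumed default; PostgreSQL can be built with other sizes


def read_block(fd, block_number, block_size=BLOCK_SIZE):
    """Return the bytes of one fixed-size block, addressed by its number."""
    return os.pread(fd, block_size, block_number * block_size)


def write_block(fd, block_number, data, block_size=BLOCK_SIZE):
    """Overwrite one whole block; data must be exactly block_size bytes."""
    if len(data) != block_size:
        raise ValueError("a block must be written whole")
    os.pwrite(fd, data, block_number * block_size)


def demo():
    # A scratch file stands in for a table file; it is NOT a PostgreSQL file.
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        for n in range(3):
            write_block(fd, n, bytes([n]) * BLOCK_SIZE)
        block = read_block(fd, 2)
        return len(block), block[0]
```

The only point is that a block number times the block size gives the byte
offset in the file. Do not try this against files in a live datadir.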

(For other readers: the paper is T.SUNITHA, G.SRUJANA & P.V.RAVIKANTH, 
"IMine: Index Support for Item Set Mining". International Journal of 
Computer Trends and Technology- July to Aug Issue 2011, pp255-261. ISSN: 
2231-2803. )

On 10/04/2011 04:39 PM, 姜头 wrote:
> Thank you very much.
> I read the paper http://dbdmg.polito.it/twiki/bin/view/Public/IMine
> again carefully and found that they don't explain it clearly. I think
> that by 'blocks' they mean 'blocks of the database (DBMS)'. We know that
> a DBMS forms its own blocks, which are not file blocks.

Yep, I'd say so.

> Actually, they and I want to record the physical address of data

The offset of data within a PostgreSQL database file, yes.

> and
> then we can form a disk-resident tree (like a tree in memory using
> pointers, but this time it resides on disk).

Yep.
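
To make the "tree with block numbers instead of pointers" idea concrete, here
is a toy sketch in Python. It is not how the paper's I-Tree or any PostgreSQL
index works; it is a plain binary search tree that stores one node per block
in a scratch file, and all names (NODE, insert, contains, demo) and the 8 kB
block size are assumptions made for the example.

```python
import os
import struct
import tempfile

BLOCK_SIZE = 8192              # assumed block size for this sketch
NODE = struct.Struct("<qqq")   # key, left child block, right child block
NO_CHILD = -1                  # marks a missing child


def read_node(fd, block_no):
    raw = os.pread(fd, NODE.size, block_no * BLOCK_SIZE)
    return NODE.unpack(raw)


def write_node(fd, block_no, key, left, right):
    os.pwrite(fd, NODE.pack(key, left, right), block_no * BLOCK_SIZE)


def insert(fd, key, block_count):
    """Insert key into the on-disk tree; return the new number of blocks."""
    new_block = block_count
    write_node(fd, new_block, key, NO_CHILD, NO_CHILD)
    if new_block == 0:
        return 1  # the first node becomes the root in block 0
    current = 0
    while True:
        node_key, left, right = read_node(fd, current)
        if key < node_key:
            if left == NO_CHILD:
                # Link the parent to the new block instead of a memory pointer.
                write_node(fd, current, node_key, new_block, right)
                return block_count + 1
            current = left
        else:
            if right == NO_CHILD:
                write_node(fd, current, node_key, left, new_block)
                return block_count + 1
            current = right


def contains(fd, key, block_count):
    current = 0 if block_count else NO_CHILD
    while current != NO_CHILD:
        node_key, left, right = read_node(fd, current)
        if key == node_key:
            return True
        current = left if key < node_key else right
    return False


def demo():
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        blocks = 0
        for key in (50, 20, 70, 60):
            blocks = insert(fd, key, blocks)
        return blocks, contains(fd, 60, blocks), contains(fd, 99, blocks)
```

One node per block wastes almost the whole block; a real index packs many
entries into each block, which is a large part of what the btree code does.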

> I know accessing blocks is hard, as you say. I don't know how yet. Could
> something like the rowid of a record in Oracle be used? It sounds more
> like a physical address.

You'd want to use the block index then the offset within the block, like 
the btree index already does. Have a look at how the btree index code works.
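
For what it's worth, PostgreSQL already exposes this kind of address as the
ctid system column, shown as a (block, item) pair. The sketch below, in
Python, shows the general idea of a compact "block number plus item number"
address. The 6-byte encoding and the names TupleAddress, pack_address,
unpack_address and block_file_offset are assumptions for illustration only,
not PostgreSQL's actual structures, and the 8 kB block size is assumed too.

```python
import struct
from typing import NamedTuple

BLOCK_SIZE = 8192  # assumed block size


class TupleAddress(NamedTuple):
    block: int  # 0-based block number within the file
    item: int   # 1-based item slot within that block


# Hypothetical on-disk encoding: 4-byte block number + 2-byte item number.
ADDRESS = struct.Struct("<IH")


def pack_address(addr):
    """Encode an address into 6 bytes so it can be stored inside a block."""
    return ADDRESS.pack(addr.block, addr.item)


def unpack_address(raw):
    """Decode 6 bytes back into a TupleAddress."""
    return TupleAddress(*ADDRESS.unpack(raw))


def block_file_offset(addr):
    """Byte offset in the file where the addressed block starts."""
    return addr.block * BLOCK_SIZE
```

The item number is an index into the block's own item table, not a byte
offset, so rows can be moved around inside a block without changing their
address.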

> I don't know much about pread(), so I will study it now.

It turns out you don't want pread() etc anyway. You probably want to use 
PostgreSQL's own data access functions.

I was talking about pread() because that's the low level system call 
PostgreSQL uses to read its data, and it is one of the system calls you 
can use for raw I/O on device nodes. It turns out that's not what you 
want to do at all. You want to do I/O on PostgreSQL database files; you 
just want to define your own storage structure and index structure.

The first step to doing this will probably be to read the source code of 
the btree index, and read the documentation I linked to.

The single most helpful thing will probably be to get the source code of 
the sample implementation made by one of the authors of the papers 
you're interested in. I don't know if that'll be possible, but I'm sure 
it'd help a lot if you could do it.

> Thank you very very much.
> Best wishes.
------------------ Original Message ------------------
*From:* "Craig Ringer"<ringerc@xxxxxxxxxxxxx>;
*Sent:* Tuesday, 4 October 2011, 3:15 PM
*To:* "姜头"<104186179@xxxxxx>;
*Subject:* Re: re: How can i get record by data block not by sql?

"Data block" isn't a term with one fixed meaning. You could be referring
to Pg blocks, file system blocks, disk sectors, or all sorts of other
things.

Do you actually mean raw disk sectors? If so: on Linux and most other
UNIXes you can use block I/O calls to access them just like files on a
file system by opening the device node. pread and friends should be just
fine. You sound like you might want direct I/O, in which case look at
the O_DIRECT flag. There is also async I/O.

You still don't really explain WHY you want this or what you're trying
to achieve. It sounds like you're trying to avoid perceived
inefficiencies in the use of a file system, but I'm not sure. If that is
the case, beware. Going without a file system is HARD, way harder than
you think, and you will land up needing to re-invent many of the same
features. Getting allocation, readahead, caching, fragmentation, etc
right is not easy.

What might be rather interesting is a specialized file system for
databases that traded flexibility for simplicity and possibly speed.
That said, most interesting options like big allocations, big blocks,
small inode tables, cache control etc are already offered by mkfs
options for most major filesystems like xfs, ext4, ufs and even NTFS to
an extent.

I think you will find that in the real world, raw disk I/O is way harder
than it is worth. Even Oracle seems to be moving away from it AFAIK.

On Oct 4, 2011 12:35 PM, "姜头" <104186179@xxxxxx> wrote:
 > The functions pread() and pwrite() actually operate on files through
 > the file system, not on data blocks directly.
 > Here are the researchers who have implemented it in PostgreSQL:
 > http://dbdmg.polito.it/twiki/bin/view/Public/IMine
 > They did it in PostgreSQL and I want to improve it, but I don't know
 > how yet. :(
 >
 > I think they store data in DB blocks just for high efficiency. If they
 > can store some data in special blocks, then they can reduce I/O.
 >
 >
 >
 >
 > ------------------ Original Message ------------------
 > From: "Craig Ringer"<ringerc@xxxxxxxxxxxxx>;
 > Sent: Tuesday, 4 October 2011, 10:57 AM
 > To: "104186179"<104186179@xxxxxx>;
 > Cc: "pgsql-general"<pgsql-general@xxxxxxxxxxxxxx>;
 > Subject: Re:  How can i get record by data block not by sql?
 >
 >
 > On 03/10/11 17:03, 姜头 wrote:
 > > How can I get records by data block, not by SQL?
 > >
 > > I want to read and write lots of data by data blocks, so I can form a
 > > disk-resident tree by recording the block addresses. But I don't know
 > > how to implement this in PostgreSQL.
 > > Is there a system function that can do this?
 >
 >
 > It might be a good idea to take a step or three back and ask: "Why?"
 >
 > What are you trying to achieve? What is the goal?
 >
 > Is PostgreSQL the right choice? Have you looked at lower-level
 > databases like Berkeley DB, various raw ISAM engines, etc? For that
 > matter, if you want block-level operation, don't you really just want
 > pread() and pwrite()?
 >
 > If you want to do something within the PostgreSQL engine using your
 > own custom files to store data, you would have to do it by writing C
 > functions as server-side extensions and calling those via SQL to access
 > and manage your data. These functions would have to use their own
 > separate data; they could **NOT** safely use existing PostgreSQL data
 > files in any way.
 >
 > --
 > Craig Ringer
