Re: fast read of binary data

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mo, 2012-11-12 at 12:18 +0100, Albe Laurenz wrote:
> Eildert Groeneveld wrote:
> > I am currently implementing using a compressed binary storage scheme
> > genotyping data. These are basically vectors of binary data which may be
> > megabytes in size.
> > 
> > Our current implementation uses the data type bit varying.
> > 
> > What we want to do is very simple: we want to retrieve such records from
> > the database and transfer it unaltered to the client which will do
> > something (uncompressing) with it. As massive amounts of data are to be
> > moved, speed is of great importance, precluding any to and fro
> > conversions.
> > 
> > Our current implementation uses Perl DBI; we can retrieve the data ok,
> > but apparently there is some converting going on.
> > 
> > Further, we would like to use ODBC from Fortran90 (wrapping the
> > C-library)  for such transfers. However, all sorts funny things happen
> > here which look like conversion issues.
> > 
> > In old fashioned network database some decade ago (in pre SQL times)
> > this was no problem. Maybe there is someone here who knows the PG
> > internals sufficiently well to give advice on how big blocks of memory
> > (i.e. bit varying records) can between transferred UNALTERED between
> > backend and clients.
> 
> Using the C API you can specify binary mode for your data, which
> meand that they won't be converted.
> 
> I don't think you will be able to use this with DBI or ODBC,
> but maybe binary corsors can help
> (http://www.postgresql.org/docs/current/static/sql-declare.html),
> but I don't know if DBI or ODBC handles them well.
> 
> If you can avoid DBI or ODBC, that would be best.
ok, I did have a look at the libpq librar, and you are right, there is a
way to obtain binary data from the backend through the  PQexecParams

    res = PQexecParams(conn,
                       "DECLARE myportal CURSOR FOR select genotype_bits
from v_genotype_data",
                       0,       /* zero param */
                       NULL,    /* let the backend deduce param type */
                       paramValues,
                       NULL,    /* don't need param lengths since text*/
                       NULL,    /* default to all text params */
                       1);      /* ask for binary results */

genotype_bits is defined as bit varying in the backend. When writing the
results:
    for (i = 0; i < PQntuples(res); i++)
    {
        for (j = 0; j < nFields; j++)
      fwrite(PQgetvalue(res, i, j),100000,1,f);
    }

it is clear that the results are NOT in binary format:
eg(eno,snp): od -b junk |head
0000000 061 060 061 060 061 060 061 060 061 060 061 060 061 060 061 060

clearly, these are nice 0 and 1 in ASCII and not as I need it as a bit
stream. 

Also, (and in line with this) PQgetvalue(res, i, j) seems to be of type
text.

What am I missing?






-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux