Re: PostgreSQL's XML support comparison against other

SCassidy@xxxxxxxxxxxxxxxxxxx · Wed, 29 Mar 2006 12:34:53 -0800

If you have just a column containing a chunk of xml, and want to transform
it into individual columns, such that each element/attribute is its own
column, it should be pretty easy.  Here's a simple perl example:

                            Table "public.testxml1"
 Column  |  Type   |                         Modifiers
---------+---------+-----------------------------------------------------------
 id1     | integer | not null default
nextval('public.testxml1_id1_seq'::text)
 xmltext | text    |

Contains:
 id1 |       ltext
-----+---------------------------------------------------
   1 | <animal>
    <type>dog</type>
    <breed>Cocker Spaniel</breed>
    <color>black</color>
    <sex>male</sex>
    <stock_number>123456</stock_number>
    <price currency="US dollars">100.00</price>
  </animal>
   2 | <animal>
    <type>dog</type>
    <breed>Toy poodle</breed>
    <color>white</color>
    <sex>female</sex>
    <stock_number>234567</stock_number>
    <price currency="US dollars">120.00</price>
  </animal>
   3 | <animal>
    <type>dog</type>
    <breed>Chihuahua</breed>
    <color>tan</color>
    <sex>male</sex>
    <stock_number>345678</stock_number>
    <price currency="US dollars">125.00</price>
  </animal>
(3 rows)

Create new table, with a column per element/attribute:

create table flatdata (
  id              serial primary key,
  type            varchar(50),
  breed           varchar(50),
  color           varchar(50),
  sex             varchar(10),
  stock_number    varchar(30),
  price           numeric (8,2),
  currency        varchar(30)
);

Run the perl program (below) against the original xml data, and get new
table:

 id | type |     breed      | color |  sex   | stock_number | price  |
currency
----+------+----------------+-------+--------+--------------+--------+------------
  7 | dog  | Cocker Spaniel | black | male   | 123456       | 100.00 | US
dollars
  8 | dog  | Toy poodle     | white | female | 234567       | 120.00 | US
dollars
  9 | dog  | Chihuahua      | tan   | male   | 345678       | 125.00 | US
dollars
(3 rows)

#=========================  perl program:  ====================
#!/usr/local/bin/perl

use DBI;
use XML::Twig;
our $dbh;
our $sth;
our $sth2;
$DEBUG=1;

my $dbname='mytestdb1';
my $dbport=5432;

$dbuser="testuser";

#=============== connect to database =============================
$dbh = DBI->connect("dbi:Pg:dbname=$dbname", $dbuser, "", {PrintError => 0,
PrintWarn => 0}) or
 errexit( "Unable to connect to dbname $dbname, err: $DBI::errstr");

print "Connect done\n";

my (@data);

my $twig=new XML::Twig(
  keep_spaces=> 1,
  twig_roots => {
         'animal' => \&process_entry }
);

my $stmt=qq(select xmltext from testxml1);
execute_db_statement($stmt, __LINE__);

my $insert=<<"EOF";
 INSERT into flatdata (type, breed, color, sex, stock_number, price,
currency)
  values (?, ?, ?, ?, ?, ?, ?)
EOF

our $cnt=0;

#prepare the insert statement, with placeholders:
$sth2=$dbh->prepare($insert) ||
    errexit("bad prepare for stmt $insert, error: $DBI::errstr");

#fetch each xml row, transform into individual columns:
while (@data = $sth->fetchrow_array) {
  foreach (@data) { $_='' unless defined}
  next if ($data[0] eq '');
  $twig->parse($data[0]);
}

#check for problems with premature termination
errexit($sth->errstr) if $sth->err;

sub process_entry {
  my ($t, $element)=@_;

  #process elements under the 'animal' element:

  my $type=$element->first_child('type');
  my $type_text;
  if ($type) {
    $type_text=$type->text();
  }

  my $breed=$element->first_child('breed');
  my $breed_text;
  if ($breed) {
    $breed_text=$breed->text();
  }

  my $color=$element->first_child('color');
  my $color_text;
  if ($color) {
    $color_text=$color->text();
  }

  my $sex=$element->first_child('sex');
  my $sex_text;
  if ($sex) {
    $sex_text=$sex->text();
  }

  my $stock_number=$element->first_child('stock_number');
  my $stock_number_text;
  if ($stock_number) {
    $stock_number_text=$stock_number->text();
  }

  my $price=$element->first_child('price');
  my ($price_text, $price_currency);
  if ($price) {
    $price_text=$price->text();
    $price_currency=$price->{'att'}->{'currency'}; #currency attribute
  }

  $twig->flush;

  my $rc2=$sth2->execute($type_text, $breed_text, $color_text, $sex_text,
$stock_number_text,
        $price_text, $price_currency) ||
    die("can't insert data for item: $type, $breed, $color, $stock_number,
DB error: $DBI::errstr");

} # end sub process_entry

sub errexit {
  my (@msg)=@_;
  print @msg,"\n";
  exit 1;
}

sub execute_db_statement {
  #this subroutine will prepare and execute a statement for the database,
  # and errexit if it fails either step
  my ($statement, $lineno)=@_;
  my ($rc);
  $sth=$dbh->prepare($statement) ||
    errexit("bad prepare for stmt $statement at line $lineno, error:
$DBI::errstr");
  $rc=$sth->execute() ||
    errexit("can't execute statement:\n$statement\n at line $lineno, ",
     "return code $rc: DB error: $DBI::errstr");
} # end sub execute_db_statement

#=======================================  end perl pgm
============================

Easy!  It could be fancied up, of course, this was quick and dirty.

Susan

                           "Ted Byers"                                                                                                    
                      <r.ted.byers@xxxxxxxxxx        To:       "pgsql general" <pgsql-general@xxxxxxxxxxxxxx>                             
                      >                              cc:                                                                                  
                           Sent by:                  Subject:  Re: [GENERAL] PostgreSQL's XML support comparison against other RDBMSes    

                                                      |-------------------|                                                               
                      pgsql-general-owner@pos         | [ ] Expand Groups |                                                               
                      tgresql.org                     |-------------------|                                                               

                           03/29/2006 10:24                                                                                               
                      AM                                                                                                                  

May I ask a question about this?

I will be working on an older database in which the original developer
stored XML documents as a single variable length text field.  To process
it,
it has to be retrieved in full and parsed.  But the structure of it is
simple in that it has an element for each field in the DB that replaced the

earily version.  But people are still using the earlier one because they
still need access to the old data and no tool has yet been written by my
predecessors to move the old data over to the new DB.  Does the XML support

you're all talking about make it less tedious to develop tools to parse
these old XML files and put their data into the right field of the right
table?  I can develop the tool our users need using the resources I have at

hand, but the proces of parsing these XML files is certain to be very
tedious; something I am not looking forward to.  There is a reason I don't
do much with XML even though I know how.

Ted
----- Original Message -----
From: "Steve Atkins" <steve@xxxxxxxxxxx>
To: "pgsql general" <pgsql-general@xxxxxxxxxxxxxx>
Sent: Wednesday, March 29, 2006 12:21 PM
Subject: Re: [GENERAL] PostgreSQL's XML support comparison against other
RDBMSes

>
> On Mar 29, 2006, at 2:57 AM, Ivan Zolotukhin wrote:
>
>> On 3/29/06, Peter Eisentraut <peter_e@xxxxxxx> wrote:
>>> Ivan Zolotukhin wrote:
>>>> BTW, are there any plans for its improvement? What are the first
>>>> steps for the people who want to contribute to PostgreSQL in this
>>>> field?
>>>
>>> The first step would be making a detailed analysis of what "XML
>>> support"
>>> is supposed to mean.
>>
>> So let's make such analysis here within PostgreSQL community! There
>> were a lot of talks about XML support in Postgres, but they did not
>> lead to any steady work in this direction. IMHO, it's time to start it
>> now, no?
>
> The people who want it should explain what it is, ideally by
> pointing at the relevant standards documents.
>
> Cheers,
>   Steve
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>       choose an index scan if your joining column's datatypes do not
>       match
>

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

----------------------------------------------------------------------------------------------
Simply protected storage solutions ensure that your information is
automatically safe, readily available and always there, visit us at http://www.overlandstorage.com
----------------------------------------------------------------------------------------------