RE: DomDocument - a parsing question [php can do it better than Perl]

Hello dear list,


I have a problem parsing an HTML document.
I tried to run the following Perl parser script on the HTML further below...

but I had no luck, so now I want to try it with PHP. I heard about DomDocument, which should save my backside,
so I have to get involved with DomDocument.


Here is the full story, and my attempts in Perl:


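#!/usr/bin/perl

use strict; use warnings;
use HTML::TableExtract;

# Slurp the HTML from the file named on the command line (or from STDIN).
my $html = do { local $/; <> };

# depth and count are zero-based: the data table is the first top-level table.
my $table = HTML::TableExtract->new(keep_html => 0, depth => 0, count => 0, br_translate => 0);

$table->parse($html);

# Trim leading/trailing whitespace and non-breaking spaces, collapse inner runs.
sub cleanup {
    for (@_) {
        next unless defined $_;    # cells covered by a colspan come back undefined
        s/^[\s\xa0]+//;
        s/[\s\xa0]+\z//;
        s/\s+/ /g;
    }
}

foreach my $row ($table->rows) {
    cleanup(@$row);    # @_ aliases the cells, so they are modified in place
    print join("\t", map { defined $_ ? $_ : '' } @$row), "\n";
}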


Well, friends, now I will try this with PHP. Any ideas or pointers you can share are very welcome!
And I heard about DomDocument.


I want to ask all the experts here: I need to switch this HTML parsing over to PHP.
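Following up on the DomDocument idea, here is a minimal PHP sketch, assuming PHP 7 or later with the bundled DOM extension. It reads the two-column label/value rows of the table into an associative array. The function names `clean_text` and `table_to_pairs` are made up for illustration, and the sample is a trimmed copy of the table further below:

```php
<?php
// Sketch: read the "label | value" rows of an HTML table into an array
// using PHP's built-in DOM extension (DOMDocument + DOMXPath).

/** Collapse whitespace and non-breaking spaces (U+00A0) into single spaces, then trim. */
function clean_text(string $s): string
{
    return trim(preg_replace('/[\s\x{00A0}]+/u', ' ', $s));
}

/** Return [label => value] for every <tr> that has exactly two <td> cells. */
function table_to_pairs(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // sloppy real-world HTML: collect errors instead of printing warnings
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $pairs = [];
    foreach ($xpath->query('//table//tr') as $tr) {
        $cells = $xpath->query('td', $tr);   // direct <td> children of this row
        if ($cells->length !== 2) {
            continue;                        // skip title rows that use colspan="2"
        }
        $label = clean_text($cells->item(0)->textContent);
        $value = clean_text($cells->item(1)->textContent);
        if ($label !== '') {                 // rows whose label is only &nbsp; are skipped
            $pairs[$label] = $value;
        }
    }
    return $pairs;
}

$sample = <<<'HTML'
<table>
 <tr><td colspan="2"><strong>data_one </strong></td></tr>
 <tr><td><strong>name of the street</strong></td><td>champs elysee</td></tr>
 <tr><td><strong>fax</strong></td><td>&nbsp;000241 4093287
 </td></tr>
</table>
HTML;

foreach (table_to_pairs($sample) as $label => $value) {
    echo $label, "\t", $value, "\n";
}
```

`textContent` turns `&nbsp;` into a real U+00A0 character, which PHP's `trim()` does not strip on its own. That is why the cleanup uses a `/u` regex that names U+00A0 explicitly, much like the `\xa0` in the Perl `cleanup` sub.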


Regarding the issue above: I can't figure out how to use the columns method on the HTML file below. My intuition says it should be something like the following (but my intuition is wrong):

foreach my $column ($table->columns) { print join("\t", @$column), "\n"; }

The HTML::TableExtract documentation doesn't shed much light (for me, anyway). I can see in the module's code that the columns method belongs to HTML::TableExtract::Table, but I can't figure out how to use it. I'd appreciate any help.
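For the column question, once this moves to PHP, one rough approach is to read one table into rows of cell text and then transpose it. This sketch assumes PHP 7+. The helpers `table_grid` and `grid_columns` are hypothetical names, and short rows (such as the colspan title rows) are padded with null:

```php
<?php
// Sketch: a column-wise view of an HTML table in PHP, built by
// reading the table row by row and then transposing the grid.

/** Read table number $tableIndex (0 = first) into a list of rows of cleaned cell text. */
function table_grid(string $html, int $tableIndex = 0): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $table = $xpath->query('//table')->item($tableIndex);
    if ($table === null) {
        return [];                                   // no such table in the document
    }
    $rows = [];
    foreach ($xpath->query('.//tr', $table) as $tr) {
        $row = [];
        foreach ($xpath->query('td|th', $tr) as $cell) {
            $row[] = trim(preg_replace('/[\s\x{00A0}]+/u', ' ', $cell->textContent));
        }
        $rows[] = $row;
    }
    return $rows;
}

/** Transpose rows into columns; cells missing from short rows become null. */
function grid_columns(array $rows): array
{
    $width = 0;
    foreach ($rows as $row) {
        $width = max($width, count($row));
    }
    $columns = [];
    for ($c = 0; $c < $width; $c++) {
        $columns[$c] = [];
        foreach ($rows as $row) {
            $columns[$c][] = $row[$c] ?? null;
        }
    }
    return $columns;
}

$sample = <<<'HTML'
<table>
 <tr><td><strong>telefon</strong></td><td>&nbsp;000241 49321</td></tr>
 <tr><td><strong>fax</strong></td><td>&nbsp;000241 4093287</td></tr>
</table>
HTML;

foreach (grid_columns(table_grid($sample)) as $column) {
    echo implode("\t", $column), "\n";   // first line: labels, second line: values
}
```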

Background:
I am trying to extract the table. I have a very, very small document of tables that I want to parse with the
HTML::TableExtract module.
I am trying to search for keywords in the HTML so that I can take them as the attributes.
I have to print only the necessary data.
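For the keyword search, one possible PHP approach is to let XPath find the row whose first cell reads the keyword and return that row's second cell. This is a sketch that assumes the page keeps its two-column label/value layout. `xpath_literal` and `lookup_values` are invented names, and a keyword that is not found maps to null:

```php
<?php
// Sketch: look up table values by their label text with DOMXPath.

/** Quote an arbitrary string as an XPath 1.0 string literal. */
function xpath_literal(string $s): string
{
    if (strpos($s, "'") === false) {
        return "'" . $s . "'";
    }
    if (strpos($s, '"') === false) {
        return '"' . $s . '"';
    }
    // Contains both quote kinds: glue the pieces together with concat().
    return "concat('" . str_replace("'", "',\"'\",'", $s) . "')";
}

/** Return [keyword => value or null] for each label keyword. */
function lookup_values(string $html, array $keywords): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $found = [];
    foreach ($keywords as $keyword) {
        // The value is the 2nd <td> of the row whose 1st <td> reads $keyword.
        $expr = '//tr[normalize-space(td[1]) = ' . xpath_literal($keyword) . ']/td[2]';
        $cell = $xpath->query($expr)->item(0);
        $found[$keyword] = $cell === null
            ? null
            : trim(preg_replace('/[\s\x{00A0}]+/u', ' ', $cell->textContent));
    }
    return $found;
}

$sample = <<<'HTML'
<table>
 <tr><td><strong>official_description</strong></td><td>the name </td></tr>
 <tr><td><strong>telefon</strong></td><td>&nbsp;000241 49321
 </td></tr>
</table>
HTML;

print_r(lookup_values($sample, ['telefon', 'fax']));
```

XPath's `normalize-space()` only strips ordinary spaces, tabs and line breaks, not U+00A0. Labels padded with `&nbsp;` would therefore need the PHP-side cleanup before comparing.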

I tried CPAN but could not really find out how to search through it for particular keywords.
One way to do it would be HTML::TableExtract; the other would be to parse with HTML::TokeParser.
I have very little experience with HTML::TokeParser.

Either way, I need to do this parsing:

I want to write the result of the parsed tables to a .txt file, or even better, store it in a database.
The problem is that I can't find any way to search through the resulting parsed table and get the necessary data.
Thanks for your replies, I appreciate it.
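For the text-file part, here is a small PHP sketch that writes label/value pairs as tab-separated lines. The `$pairs` array stands in for whatever the parsing step returns. `pairs_to_tsv` and the output path are hypothetical. A database version would follow the same loop, with one PDO prepared INSERT per pair:

```php
<?php
// Sketch: write extracted label/value pairs as tab-separated text, one pair per line.

function pairs_to_tsv(array $pairs): string
{
    $out = '';
    foreach ($pairs as $label => $value) {
        // A tab or line break inside a field would break the format, so flatten it.
        $fields = array_map(function ($field) {
            return preg_replace('/[\t\r\n]+/', ' ', (string) $field);
        }, [$label, $value]);
        $out .= implode("\t", $fields) . "\n";
    }
    return $out;
}

// Stand-in for the parser's output (values taken from the table below).
$pairs = [
    'name of the street' => 'champs elysee',
    'number and town'    => '75000 paris',
];

$file = sys_get_temp_dir() . '/school_info.tsv';   // hypothetical output path
file_put_contents($file, pairs_to_tsv($pairs));
echo file_get_contents($file);
```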







##### the HTML: #####

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<meta name="GENERATOR" content="Microsoft FrontPage 3.0">
<link rel="stylesheet" href="jspsrc/css/bp_style.css" type="text/css">
<title>Weitere Schulinformationen</title>
</head>
<body class="bodyclass">
<div style="text-align:center;"><center>
<!-- <fieldset><legend> general information  </legend>
-->
<br/>
<table border="1" cellspacing="0" bordercolordark="white" bordercolorlight="black" width="80%" class='bp_result_tab_info'>
<!-- <table border="0" cellspacing="0" bordercolordark="white" bordercolorlight="black" width="80%" class='bp_search_info'>
-->  
 <tr>
 <td width="100%" colspan="2" class="ldstabTitel"><strong>data_one </strong></td>
 </tr>
 <tr>
 <td width="27%"><strong>data_two</strong></td>
 <td width="73%">&nbsp;116439
 </td>
 </tr>
 <tr>
 <td width="27%"><strong>official_description</strong></td>
 <td width="73%">the name </td>
 </tr>
 <tr>
 <td width="27%"><strong>name of the street</strong></td>
 <td width="73%">champs elysee</td>
 </tr>
 <tr>
 <td width="27%"><strong>number and town</strong></td>
 <td width="73%"> 75000 paris </td>
 </tr>
 <tr>
 <td width="27%"><strong>telefon</strong></td>

 <td width="73%">&nbsp;000241 49321
</td>
 </tr>
 <tr>
 <td width="27%"><strong>fax</strong></td>
 <td width="73%">&nbsp;000241 4093287
</td>
 </tr>
 <tr>
 <td width="27%"><strong>e-mail-adresse</strong></td>
 <td width="73%">&nbsp;<a href=mailto:1111116439@xxxxxxxxxxxxx>1222216439@xxxxxxxx</a>
</td>
 </tr>
 <tr>
 <td width="27%"><strong>internet-site</strong></td>
 <td width="73%">&nbsp;<a href=http://www.thesite.org>http://www.thesite.org</td>
 </tr>
<!--  
<tr>
 <td width="27%">&nbsp;</td>
 <td width="73%" align="right"><a href="schule_aeinfo.php?SNR=<? print $SCHULNR ?>" target="_blank">
 [Schuldaten;</a>
</tr>
</td> -->
<tr>
 <td width="27%">&nbsp;</td>
 <td width="73%">the department</td>
 </tr>    

 <tr>
 <td width="100%" colspan=2><strong>&nbsp;</strong></td>
 </tr> 
 <tr>
 <td width="27%"><strong>number of indidviduals</strong></td>
 <td width="73%">&nbsp;1y92</td>
<tr>
 <td width="100%" colspan=2><strong>&nbsp;</strong></td>
 </tr>
 <!-- if (!fsp.isEmpty()){
 ztext = "&nbsp;";
 
 int i = 0;
 Iterator it = fsp.iterator();
 while (it.hasNext()){
 String[] zwert = new String[2];
 zwert = (String[])it.next();
 
 if (i==0){
 if (zwert[1].equals("0")){
 ztext = ztext+zwert[0];
 }else{
 ztext = ztext+zwert[0]+" mit "+zwert[1];
 if (zwert[1].equals("1")){
 ztext = ztext+" Sch&uuml;ler";
 }else{
 ztext = ztext+" Sch&uuml;lern";
 }
 }    
 i++;
 }else{
 if (zwert[1].equals("0")){
 ztext = ztext+"<br>&nbsp;"+zwert[0];
 }else{
 ztext = ztext+"<br>&nbsp;"+zwert[0]+" mit "+zwert[1];
 if (zwert[1].equals("1")){
 ztext = ztext+" Sch&uuml;ler";
 }else{
 ztext = ztext+" Sch&uuml;lern";
 }
 }    
 }        
 } 
-->
</table>
<!--  </fieldset>  -->
<br>

</body>
</html>

I look forward to hearing from you...

Regards, Martin

-- 
PHP General Mailing List (http://www.php.net/)



