On 2/12/07, jori.hamalainen@xxxxxxxxxxxxxxx <jori.hamalainen@xxxxxxxxxxxxxxx> wrote: > > I recently worked on xmltv2vdr.pl (version 1.0.6) and checked > > why it was so slow on my mighty Celeron 233. So I modified it > > a little to avoid reading all the xmltv file for each channel > > defined in the channels.conf. The result is good : I can process > > my 5Mo xmltv file in less than 10 minutes whereas it took at least > > 1 hour with vanilla 1.0.6 release. > > Something more what you can do (just by looking source you provided).. > > -- > > Caching of &xmltime2vdr, like > > return $timecache{$xmltime}->{$skew} if defined $timecache{$xmltime}->{$skew} > $secs = &Date::Manip::UnixDate($xmltime, "%s") + $skew*60; > $timecache{$xmltime}->{$skew} = $secs > return secs; > > But it depends on how much this function is called.. But hash lookup is probably > faster than running UnixDate from library. So it is a memory tradeoff. I still haven't tested it but I doubt it'll help ..... I'll check later. > I see that there is still some basic Perl based optimizations for this. > > For example there is browsing through @xmllines array, and every iteration > you recompile *ALL* regexp's. That is as many times as @xmllines has lines. > And if one recompile takes 1ms -> you waste time @xmllines * 1ms just for > compiling and not doing anything usefull. > > Perl switch "o" is recompile once flag, use that everywhere where it is > possible. Variable is not a problem unless variable changes in every iteration. > [....] I didn't know that (I'm not really a perl guru ... far from it). I'll update my version. But it didn't help at all with my benchmark. > > -- > > As there is many times $xmlline is matched with regexps etc. You should experiment > with "study $xmlline;" after chomp $xmlline. Study makes internal search tables > for string matches. So see which way the code is faster, with study or without > study. Use Unix shell's time-command for this. For extra boost with study you > probably would need to take away subroutine "xmltvtranslate" as for it $xmlline > is copied to subroutine's parameter space, and what is matched. And study would > not affect it. So instead of calling "$xmlline=xmltvtranslate($xmlline);" cut&paste > subroutines code here, and use $xmlline instead of $line. > > foreach $xmlline (@xmllines) > { > chomp $xmlline; > study $xmlline; > $xmlline=~s/und uuml;/?/go; > $xmlline... > > This isn't pretty but could probably help a bit. You save time for @xmllines times calling > subroutine, and study would help you a lot as you use the same string all the time. > I'll check that later. > > For constant string you could use ' ' instead of " ". " causes string to be > evaluated for variables > > if ( $chanCur eq "" ) --> if ( $chanCur eq '' ) > > But this would be very minor effect.. > I'll surely be too lazy to test that. sorry. > > Split is heavy operation because of creating arrays, but you can limit it. > > ( $null, $xmlst, $null, $xmlet, @null ) = split(/\"/, $xmlline); > > => ( $null, $xmlst, $null, $xmlet, $null ) = split(/\"/, $xmlline, 5); > > or even using regexp for this. I don't know input line for this, but if it is > foo,"something","something",... > > ($xmlst,$xmlet) = $xmlline =~ m:\"(.*?)\",\"(.*?)\":o; > > or probably combine 2 regexp to a single > > ($xmlst,$xmlet,$channel) = $xmlline =~ m:\"(.*?)\",\"(.*?)\".*?channel=\"(.*?)\":o; > > -- > > Again something very weird: > > if ( ($xmlline =~ /\<title/ ) ) > { > #print $xmlline . "\n"; > ( $null, $tmp ) = split(/\>/, $xmlline); > ( $vdrtitle, @null ) = split(/\</, $tmp); > > # Send VDR Title > > SVDRPsend("T $vdrtitle"); > } > > Why not? > > SVDRPsend("T $1") if $xmlline =~ m:\<title\>(.*?)\</title\>:o; > > Same for XML subtitle > SVDRPsend("T $1") if $xmlline =~ m:\<sub-title\>(.*?)\</sub-title\>:o; Yes I'll also prefer shorter code. I'll check further if something like <title lang="en"> is also allowed to adapt the regex. For information that change has no impact on my bench. > Generally > if ( ($xmlline =~ /\<desc/ ) && ( $desccount == $dc )) > { > ( $null, $tmp ) = split(/\>/, $xmlline); > ( $vdrdesc, @null ) = split(/\</, $tmp); > > this is not a clever way to parse XML data in Perl. Just us regexp's which > match strings with Boyer-Moore algorithm (same as Unix grep) and compile once. > Agree. I'll try to modify it. > > Some logical errors > > if ( ($xmlline =~ /\<programme/ ) && ( $xmlline !~ /clumpidx=\"1\/2\"/ ) && ( $chanevent == 0 ) ) > > => if ( ( $chanevent == 0 ) && ($xmlline =~ /\<programme/ ) && ( $xmlline !~ /clumpidx=\"1\/2\"/ ) ) > > so program execution can skip if $chanevent != 0 much faster. > So Regexp would not be ran. This is normal short circuit operation. > In fact the check $chanevent == 0 is only usefull if the xml is not well formed so it doesn't change anything. > Then > elsif ( $chanCur ne $chan ) > { > SVDRPsend("c"); > SVDRPsend("."); > SVDRPreceive(250); > > I think programmer wanted outout of "." -command, and see if it's status is 250? > But now I think code is checking status of "c" -command? As socket is not read > between calls, and there should be data in buffer for c-command. But I cannot be > sure as I don't know SVDRP command that well. > > > c > < 250 foo > < 250-foo > > . > < 354-not ok > > It could still succeed if from socket buffer "250-" is read. Also the 2 substr calls > in SVDRPreceive is a bit weird, but I am uncertain if regexp would help that. At least > change "-" to '-'. > I have no answer about this as I didn't modify that code ..... it seems to work so for now I don't touch it. > > There is a LOT to improve, but if you do these, you Celeron will fly. I hope you'll get > to minute scale (or even better). Look with "time xmltv2vdr" to see how much process > time is used for user code and how much for kernel code. And to see if optimizations help. > > Have fun.. :) Thanks a lot for your hints. Sebastien