xmltv2vdr speedup and modification

sebastien.lucas at gmail.com (Sebastien Lucas) · Tue Feb 13 13:32:21 2007

On 2/12/07, jori.hamalainen@xxxxxxxxxxxxxxx
<jori.hamalainen@xxxxxxxxxxxxxxx> wrote:
> > I recently worked on xmltv2vdr.pl (version 1.0.6) and checked
> > why it was so slow on my mighty Celeron 233. So I modified it
> > a little to avoid reading all the xmltv file for each channel
> > defined in the channels.conf. The result is good : I can process
> > my 5Mo xmltv file in less than 10 minutes whereas it took at least
> > 1 hour with vanilla 1.0.6 release.
>
> Something more what you can do (just by looking source you provided)..
>
> --
>
> Caching of &xmltime2vdr, like
>
>     return $timecache{$xmltime}->{$skew} if defined $timecache{$xmltime}->{$skew}
>     $secs = &Date::Manip::UnixDate($xmltime, "%s") + $skew*60;
>     $timecache{$xmltime}->{$skew} = $secs
>     return secs;
>
> But it depends on how much this function is called.. But hash lookup is probably
> faster than running UnixDate from library. So it is a memory tradeoff.

I still haven't tested it but I doubt it'll help ..... I'll check later.

> I see that there is still some basic Perl based optimizations for this.
>
> For example there is browsing through @xmllines array, and every iteration
> you recompile *ALL* regexp's. That is as many times as @xmllines has lines.
> And if one recompile takes 1ms -> you waste time @xmllines * 1ms just for
> compiling and not doing anything usefull.
>
> Perl switch "o" is recompile once flag, use that everywhere where it is
> possible. Variable is not a problem unless variable changes in every iteration.
>
[....]

I didn't know that (I'm not really a perl guru ... far from it). I'll
update my version. But it didn't help at all with my benchmark.

>
> --
>
> As there is many times $xmlline is matched with regexps etc. You should experiment
> with "study $xmlline;" after chomp $xmlline. Study makes internal search tables
> for string matches. So see which way the code is faster, with study or without
> study. Use Unix shell's time-command for this. For extra boost with study you
> probably would need to take away subroutine "xmltvtranslate" as for it $xmlline
> is copied to subroutine's parameter space, and what is matched. And study would
> not affect it. So instead of calling "$xmlline=xmltvtranslate($xmlline);" cut&paste
> subroutines code here, and use $xmlline instead of $line.
>
>     foreach $xmlline (@xmllines)
>     {
>         chomp $xmlline;
>         study $xmlline;
>         $xmlline=~s/und uuml;/?/go;
>         $xmlline...
>
> This isn't pretty but could probably help a bit. You save time for @xmllines times calling
> subroutine, and study would help you a lot as you use the same string all the time.
>

I'll check that later.

>
> For constant string you could use ' ' instead of " ". " causes string to be
> evaluated for variables
>
> if ( $chanCur eq "" ) --> if ( $chanCur eq '' )
>
> But this would be very minor effect..
>

I'll surely be too lazy to test that. sorry.

>
> Split is heavy operation because of creating arrays, but you can limit it.
>
> ( $null, $xmlst, $null, $xmlet, @null ) = split(/\"/, $xmlline);
>
> => ( $null, $xmlst, $null, $xmlet, $null ) = split(/\"/, $xmlline, 5);
>
> or even using regexp for this. I don't know input line for this, but if it is
> foo,"something","something",...
>
> ($xmlst,$xmlet) = $xmlline =~ m:\"(.*?)\",\"(.*?)\":o;
>
> or probably combine 2 regexp to a single
>
> ($xmlst,$xmlet,$channel) = $xmlline =~ m:\"(.*?)\",\"(.*?)\".*?channel=\"(.*?)\":o;
>
> --
>
> Again something very weird:
>
>         if ( ($xmlline =~ /\<title/ ) )
>         {
>             #print $xmlline . "\n";
>             ( $null, $tmp ) = split(/\>/, $xmlline);
>             ( $vdrtitle, @null ) = split(/\</, $tmp);
>
>             # Send VDR Title
>
>             SVDRPsend("T $vdrtitle");
>         }
>
> Why not?
>
>         SVDRPsend("T $1") if $xmlline =~ m:\<title\>(.*?)\</title\>:o;
>
> Same for XML subtitle
>         SVDRPsend("T $1") if $xmlline =~ m:\<sub-title\>(.*?)\</sub-title\>:o;

Yes I'll also prefer shorter code. I'll check further if something
like <title lang="en"> is also allowed to adapt the regex. For
information that change has no impact on my bench.

> Generally
>        if ( ($xmlline =~ /\<desc/ ) && ( $desccount == $dc ))
>         {
>             ( $null, $tmp ) = split(/\>/, $xmlline);
>             ( $vdrdesc, @null ) = split(/\</, $tmp);
>
> this is not a clever way to parse XML data in Perl. Just us regexp's which
> match strings with Boyer-Moore algorithm (same as Unix grep) and compile once.
>

Agree. I'll try to modify it.

>
> Some logical errors
>
> if ( ($xmlline =~ /\<programme/ ) && ( $xmlline !~ /clumpidx=\"1\/2\"/ ) && ( $chanevent == 0 ) )
>
> =>         if ( ( $chanevent == 0 ) && ($xmlline =~ /\<programme/ ) && ( $xmlline !~ /clumpidx=\"1\/2\"/ ) )
>
> so program execution can skip if $chanevent != 0 much faster.
> So Regexp would not be ran. This is normal short circuit operation.
>

In fact the check $chanevent == 0 is only usefull if the xml is not
well formed so it doesn't change anything.

> Then
>             elsif ( $chanCur ne $chan )
>             {
>                 SVDRPsend("c");
>                 SVDRPsend(".");
>                 SVDRPreceive(250);
>
> I think programmer wanted outout of "." -command, and see if it's status is 250?
> But now I think code is checking status of "c" -command? As socket is not read
> between calls, and there should be data in buffer for c-command. But I cannot be
> sure as I don't know SVDRP command that well.
>
> > c
> < 250 foo
> < 250-foo
> > .
> < 354-not ok
>
> It could still succeed if from socket buffer "250-" is read. Also the 2 substr calls
> in SVDRPreceive is a bit weird, but I am uncertain if regexp would help that. At least
> change "-" to '-'.
>
I have no answer about this as I didn't modify that code ..... it
seems to work so for now I don't touch it.
>
> There is a LOT to improve, but if you do these, you Celeron will fly. I hope you'll get
> to minute scale (or even better). Look with "time xmltv2vdr" to see how much process
> time is used for user code and how much for kernel code. And to see if optimizations help.
>
> Have fun.. :)

Thanks a lot for your hints.

Sebastien