Search Postgresql Archives

DBMirror.pl performance change

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I discovered a problem in DBMirror.pl, performance wise.

pending.c stores data in a way
very similar to the PgSQL input "\" escaped format.

When the field is of type bytea, and the source of data is binary, then
this produces 2 additional backslashes for every unprintable
char.

The performance in function extractData in DBMirror.pl, really suffers
from this condition, since it breaks data in chunks of "\" delimited
strings.

Informally speaking, performance tends to be O(n) where n is the size
of the data.

This can be remedied if we break data in chunks of "'" rather than "\".
"'" happens much more infrequently in common binary files (bz2, tiff, jpg, 
pdf etc..), and if we notice that odd number of contained "\", signals an
intermidiate "'", whereas even number of "\" signals the final "'", 
then we can make this routine run much faster.

I attach the new extractData function.

Now replicating a 400 k tiff takes 3 seconds instead of 12 minutes
it used to do.

I am wondering about the state of 
http://www.whitebeam.org/library/guide/TechNotes/replicate.rhtm

Please feel free for any comments.

Pete could you test this new DBMirror.pl, to see how it behaves
in comparison with your C++ solution?

-- 
-Achilleus
sub extractData($$) {
  my $pendingResult = $_[0];
  my $currentTuple = $_[1];
  my $fnumber;
  my %valuesHash;
  $fnumber = 4;
  my $dataField = $pendingResult->getvalue($currentTuple,$fnumber);
  my $numofbs;

  while(length($dataField)>0) {
    # Extract the field name that is surronded by double quotes
    $dataField =~ m/(\".*?\")/s;
    my $fieldName = $1;
    $dataField = substr $dataField ,length($fieldName);
    $fieldName =~ s/\"//g; #Remove the surronding " signs.

    if($dataField =~ m/(^= )/s) {
      #Matched null
	$dataField = substr $dataField , length($1);
      $valuesHash{$fieldName}=undef;
    }
    elsif ($dataField =~ m/(^=\')/s) {
      #Has data.
      my $value;
      $dataField = substr $dataField ,2; #Skip the ='
    LOOP: {  #This is to allow us to use last from a do loop.
	     #Recommended in perlsyn manpage.
      do {
	my $matchString;
	my $matchString2;
	#Find the substring ending with the first ' or first \
	$dataField =~ m/(.*?[\'])?/s; 
	$matchString = $1;

	$numofbs = ($matchString =~ tr/\\//) % 2;	

	if ($numofbs == 1) { #// odd number of \, i.e. intermediate '
		$matchString2 = substr $matchString,0, length($matchString)-2;
		$matchString2 =~ s/\\\\/\\/g;
		$value .= ($matchString2 . "\'");
		$dataField = substr $dataField,length($matchString);
	}
	else { #// even number of \, i.e. found end of data
		$matchString2 = substr $matchString,0, length($matchString)-1;
		$matchString2 =~ s/\\\\/\\/g;
		$value .= $matchString2;
		$dataField = substr $dataField,length($matchString)+1;
		last;
	}

	   
      } until(length($dataField)==0);
  }
      $valuesHash{$fieldName} = $value;
      
      
      }#else if 
	  else {
	    
	    logErrorMessage "Error in PendingData Sequence Id " .
		$pendingResult->getvalue($currentTuple,0);
	    die;
	  }
    
    
    
  } #while
  return %valuesHash;
    
}

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux