Re: Possible extensions to OUString class

Michael Stahl <mst@xxxxxxxxxxxxxxx> · Thu, 31 Jan 2019 11:00:46 +0100

On 31.01.19 08:04, Matteo Casalin wrote:
Hi Stephan,

On 1/30/19 10:40 PM, Stephan Bergmann wrote:
On 30/01/2019 22:17, Matteo Casalin wrote:
I'm working on improving code that calls getToken (e.g. using its
version with index, or using other OUString functions in its place
when possible).
One thing that I noticed is that there are a lot of calls in the form 
getToken().toInt# which require memory management just to obtain a 
value that could be generated by the original OUString. Similarly 
(but less frequently), some tokens are extracted just to compare them 
against a string, which again requires memory management that is 
really not needed.

I was wondering if extending O(U)String with functions like:

* getTokenAs[U]Int#(token, sep, index)
* matchToken(token, sep, index, string)

would be accepted/appreciated or not. At the moment I already 
submitted to gerrit a patch [1] which adds 
comphelper::string::matchToken but I think that adding such 
functionality to OUString directly would be nicer. Also, introducing 
getTokenAsInt in OUString would likely allow reusing its toInt code.
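
To make the proposal concrete, here is a minimal sketch of the idea in standard C++, using std::string_view in place of OUString. It is not the patch in gerrit and not an existing OUString API: the names tokenView, tokenAsInt32 and matchToken, and their signatures, are assumptions chosen for illustration. The in/out index parameter of the proposed functions is left out for brevity, and the integer parsing is simplified (no leading whitespace or '+' sign is accepted), so it is not equivalent to toInt32.

```cpp
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <system_error>

// Hypothetical helper: returns a view of the token at 0-based position
// `token` in `s`, split on `sep`. Nothing is allocated; the view points
// into `s`. An empty view is returned when the token does not exist.
std::string_view tokenView(std::string_view s, std::size_t token, char sep)
{
    std::size_t start = 0;
    for (std::size_t i = 0; i < token; ++i)
    {
        const std::size_t pos = s.find(sep, start);
        if (pos == std::string_view::npos)
            return {};
        start = pos + 1;
    }
    const std::size_t end = s.find(sep, start);
    return s.substr(start, end == std::string_view::npos ? end : end - start);
}

// Hypothetical counterpart of getToken(...).toInt32(): parses the token
// in place. Returns 0 when the token is missing or is not a number
// (an assumption made for this sketch).
std::int32_t tokenAsInt32(std::string_view s, std::size_t token, char sep)
{
    const std::string_view t = tokenView(s, token, sep);
    std::int32_t value = 0;
    const auto res = std::from_chars(t.data(), t.data() + t.size(), value);
    return res.ec == std::errc() ? value : 0;
}

// Hypothetical counterpart of getToken(...) == expected: compares the
// token against `expected` without creating a temporary string.
bool matchToken(std::string_view s, std::size_t token, char sep,
                std::string_view expected)
{
    return tokenView(s, token, sep) == expected;
}
```

With the input "10;abc;30" and ';' as separator, tokenAsInt32 with token 2 would give 30 and matchToken with token 1 and "abc" would be true, in both cases without building an intermediate string for the token.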

Sounds a bit too special-purpose to be worth adding, IMO.  Would those 
optimizations really make a measurable difference?

I don't have real numbers to provide, but a very rough check on getToken
gives the following counts:

git grep -w getToken > getToken.txt
grep -wc getToken getToken.txt ==> 1646
grep -wc toInt32 getToken.txt ==> 218
grep -wc toInt64 getToken.txt ==> 8
grep -wc toUInt32 getToken.txt ==> 0
grep -wc toUInt64 getToken.txt ==> 8

The number of getToken occurrences is higher than the number of real
OUString::getToken calls (it includes comments, header files,
definitions and getToken functions of other classes), and I am missing
places in which the conversion to integer is done on a following line.
As a result, this pattern accounts for more than 14.2% of all getToken
occurrences ((218 + 8 + 0 + 8) / 1646). I cannot say whether this is
frequently called code or not.

this is rather meaningless data, it could be that all of these calls are 
in UI code where performance is so irrelevant that it might as well be 
implemented in Python and the user couldn't tell the difference.

before you start micro-optimising things all over the place, please 
first get a callgrind profile of some actual usage scenario (file 
import/export maybe) where the function you want to optimise actually 
shows up on the profile; then you can be confident that you're actually 
making an improvement.