Re: Possible extensions to OUString class

Matteo Casalin <matteo.casalin@xxxxxxxxxxxxxx> · Thu, 31 Jan 2019 08:04:49 +0100

Hi Stephan,

On 1/30/19 10:40 PM, Stephan Bergmann wrote:
> On 30/01/2019 22:17, Matteo Casalin wrote:
>> I'm working on improving code that calls getToken (e.g. using its
>> version with index, or using other OUString functions in its place
>> when possible).
>> One thing that I noticed is that there are a lot of calls of the form
>> getToken().toInt#, which allocate a temporary OUString just to obtain a
>> value that could be computed directly from the original OUString.
>> Similarly (but less frequently), some tokens are extracted just to
>> compare them against a string, which again requires memory management
>> that is not really needed.
>>
>> I was wondering whether extending O(U)String with functions like:
>>
>> * getTokenAs[U]Int#(token, sep, index)
>> * matchToken(token, sep, index, string)
>>
>> would be accepted/appreciated or not. I have already submitted to
>> gerrit a patch [1] which adds comphelper::string::matchToken, but I
>> think that adding such functionality to OUString directly would be
>> nicer. Also, introducing getTokenAsInt in OUString would likely allow
>> reusing its toInt code.
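
A minimal sketch of what the two proposed helpers could do. It is written as free functions over std::u16string_view so that it compiles without the rtl headers. That is an assumption: in the proposal they would be OUString members. findToken and parseInt32 are hypothetical helpers. findToken follows the index convention of OUString::getToken(token, cTok, index), where index is advanced to the start of the next token, or set to -1 after the last one. parseInt32 is a simplified stand-in for toInt32 and not the real rtl conversion code. It accepts an optional sign and decimal digits only, stops at the first non-digit, and returns 0 when there is no number or the value does not fit.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string_view>

// Hypothetical helper: returns token number `token` (counted from `index`)
// as a view into `s`, without allocating. Mimics OUString::getToken's index
// convention: `index` becomes the start of the next token, or -1 if the
// returned token was the last one (or no such token exists).
std::u16string_view findToken(std::u16string_view s, std::int32_t token,
                              char16_t sep, std::int32_t& index)
{
    if (token < 0 || index < 0 || static_cast<std::size_t>(index) > s.size()) {
        index = -1;
        return {};
    }
    std::size_t begin = static_cast<std::size_t>(index);
    for (; token > 0; --token) { // skip the tokens before the requested one
        std::size_t pos = s.find(sep, begin);
        if (pos == std::u16string_view::npos) {
            index = -1;
            return {};
        }
        begin = pos + 1;
    }
    std::size_t end = s.find(sep, begin);
    if (end == std::u16string_view::npos) { // requested token is the last one
        index = -1;
        return s.substr(begin);
    }
    index = static_cast<std::int32_t>(end + 1);
    return s.substr(begin, end - begin);
}

// Simplified decimal conversion (assumption, not rtl's code): optional sign,
// digits, stop at the first non-digit; 0 if no digit or out of int32 range.
std::int32_t parseInt32(std::u16string_view v)
{
    std::size_t i = 0;
    bool negative = false;
    if (i < v.size() && (v[i] == u'+' || v[i] == u'-')) {
        negative = (v[i] == u'-');
        ++i;
    }
    const std::int64_t limit =
        static_cast<std::int64_t>(std::numeric_limits<std::int32_t>::max()) + 1;
    std::int64_t value = 0;
    bool anyDigit = false;
    for (; i < v.size() && v[i] >= u'0' && v[i] <= u'9'; ++i) {
        value = value * 10 + (v[i] - u'0');
        anyDigit = true;
        if (value > limit) // cannot fit even as a negative number
            return 0;
    }
    if (!anyDigit)
        return 0;
    if (negative)
        value = -value;
    if (value > std::numeric_limits<std::int32_t>::max())
        return 0;
    return static_cast<std::int32_t>(value);
}

// Proposed getTokenAsInt32: convert the token in place, no temporary string.
std::int32_t getTokenAsInt32(std::u16string_view s, std::int32_t token,
                             char16_t sep, std::int32_t& index)
{
    return parseInt32(findToken(s, token, sep, index));
}

// Proposed matchToken: compare the token with `str` without extracting it.
bool matchToken(std::u16string_view s, std::int32_t token, char16_t sep,
                std::int32_t& index, std::u16string_view str)
{
    return findToken(s, token, sep, index) == str;
}
```

Used as in a typical getToken loop: with idx = 0 and the string u"12;-7;abc", the calls getTokenAsInt32(s, 0, u';', idx), getTokenAsInt32(s, 0, u';', idx) and matchToken(s, 0, u';', idx, u"abc") yield 12, -7 and true, and idx ends at -1.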

> Sounds a bit too special-purpose to be worth adding, IMO.  Would those
> optimizations really make a measurable difference?

I don't have real numbers to provide, but a very rough check on getToken
gives the following counts:

git grep -w getToken > getToken.txt
grep -wc getToken getToken.txt ==> 1646
grep -wc toInt32 getToken.txt ==> 218
grep -wc toInt64 getToken.txt ==> 8
grep -wc toUInt32 getToken.txt ==> 0
grep -wc toUInt64 getToken.txt ==> 8

The number of getToken occurrences is higher than the number of real
OUString::getToken calls (it includes comments, header files, definitions
and also non-OUString getToken), and I am missing places in which the
conversion to integer is done on a following line. As a result, the real
share of this pattern among getToken calls should be above 14.2%. I
cannot say whether this is frequently executed code or not.
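
The 14.2% figure is the share of matching lines in getToken.txt (grep -c counts lines, not individual occurrences) that also mention one of the integer conversions:

```latex
\frac{218 + 8 + 0 + 8}{1646} = \frac{234}{1646} \approx 0.142 \quad (14.2\%)
```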

As for matchToken, this pattern seems to be much less frequent, and for
now the comphelper approach is a viable one, so I would go that way (and
will take care of reviewing some older getToken optimizations that I
implemented).

> Also, a better approach overall would probably be some string_view-based
> getToken functionality (converting from an OUString to a string_view is
> cheap), and then string_view-based toInt etc. functions.
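
The string_view alternative splits the work into two composable pieces. The first is a getToken that returns a view into the original string. The second is a conversion that works on any view. Below is a minimal sketch for the 8-bit (OString-like) case, using std::string_view as a stand-in. In C++17, std::from_chars already converts a character range to an integer without allocating, so only the view-returning tokenizer has to be written.

getTokenView and viewToInt32 are hypothetical names. std::from_chars only handles char, so a UTF-16 (OUString) variant would still need its own digit loop. Unlike toInt32, it also rejects a leading '+' and reports failure instead of returning 0.

```cpp
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical view-returning counterpart of getToken(token, sep, index).
// Same index convention: `index` becomes the start of the next token,
// or -1 once the last token has been returned (or none exists).
std::string_view getTokenView(std::string_view s, std::int32_t token,
                              char sep, std::int32_t& index)
{
    if (token < 0 || index < 0 || static_cast<std::size_t>(index) > s.size()) {
        index = -1;
        return {};
    }
    std::size_t begin = static_cast<std::size_t>(index);
    for (; token > 0; --token) { // skip the tokens before the requested one
        std::size_t pos = s.find(sep, begin);
        if (pos == std::string_view::npos) {
            index = -1;
            return {};
        }
        begin = pos + 1;
    }
    std::size_t end = s.find(sep, begin);
    if (end == std::string_view::npos) {
        index = -1;
        return s.substr(begin);
    }
    index = static_cast<std::int32_t>(end + 1);
    return s.substr(begin, end - begin);
}

// Conversion on a view: std::from_chars parses in place (optional '-',
// decimal digits, stops at the first non-digit) and reports errors.
std::optional<std::int32_t> viewToInt32(std::string_view v)
{
    std::int32_t value = 0;
    auto result = std::from_chars(v.data(), v.data() + v.size(), value);
    if (result.ec != std::errc())
        return std::nullopt; // no digits, or value out of range
    return value;
}
```

Composing the two keeps the API small. Any view-based conversion, or a plain == comparison in place of matchToken, applies to a token without a dedicated per-type getTokenAs... function. For example, with idx = 0 and "640;480;x", three calls of viewToInt32(getTokenView(s, 0, ';', idx)) give 640, 480 and no value.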

For now I plan to just go through all getToken uses and do some minor
local optimizations; then I might have a look at the string_view approach
(unless the numbers above make the OUString approach look less
specialised).

Many thanks for your comments
Kind regards
Matteo

_______________________________________________
LibreOffice mailing list
LibreOffice@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/libreoffice