On Mar 4, 2012, at 22:31, Chris Angelico <rosuav@xxxxxxxxx> wrote: > (Hoping you meant for that reply to be on-list as I'm here responding on-list.) > > On Mon, Mar 5, 2012 at 2:16 PM, A.M. <agentm@xxxxxxxxxxxxxxxxxxxxx> wrote: >> >> On Mar 4, 2012, at 9:13 PM, Chris Angelico wrote: >> >>> One of our tables has a few columns that may be interpreted as strings >>> or may be numbers (data type is varchar, numbers are stored as >>> decimal). Generally, operations are performed on the string, but >>> sometimes we need to parse out a number - without it failing on error. >>> I wrote the following function to approximate to the semantics of >>> atoi: >> >> I would recommend against such a schema since different data types should warrant their own columns, but if you are left with no choice... > > The values have to be strings for other reasons (eg '' is valid > everywhere, and this is subsequently processed by a script that > expects all strings). So yeah, no choice there. But I agree that > normally you DO want integers stored in integer columns, and we're > paying a performance penalty for this. > >>> >>> create or replace function str2int(val varchar) returns bigint immutable as $$ >>> begin >>> val=substring(val from '[0-9]*'); >>> if length(val) between 1 and 19 then return val::bigint; end if; >>> return 0; >>> end; >>> $$ language plpgsql; >> >> This can be written as: >> select substring('35345345.45645654' from '\d{1,19}')::bigint; >> >> Be aware that this does not account for: >> 3dogs >> 3 dogs >> 3,dogs >> 3.5.6.7 >> >> all of which will return 3::bigint, but I assume that the column is not completely free-form or maybe this is what you want. (?) >> >> Cheers, >> M > > That is in fact the behaviour I want. Trouble is, that simpler version > returns NULL if given 'asdf' as its input - I want it to return 0. COALESCE is your friend > It's also about the same performance (slightly slower in a quick test) > than the original; it's still doing a regular expression parse. I was > hoping very much to avoid the regex altogether. > > ChrisA > Any efficient, non-RegEx, alternative would require more context to evaluate than you provide. Mainly, would it be faster to have a separate field to store the parsed (at input) number and then query that field directly (even if it is a text field as well)? Basically cache the parse. David J.. -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general