Re: Faster str to int conversion (was Table with large number of int columns, very slow COPY FROM)

Hi,

On 2018-07-18 14:34:34 -0400, Robert Haas wrote:
> On Sat, Jul 7, 2018 at 4:01 PM, Andres Freund <andres@xxxxxxxxxxx> wrote:
> > FWIW, here's a rebased version of this patch. Could probably be polished
> > further. One might argue that we should do a bit more wide ranging
> > changes, to convert scanint8 and pg_atoi to be also unified. But it
> > might also just be worthwhile to apply without those, given the
> > performance benefit.
> 
> Wouldn't hurt to do that one too, but might be OK to just do this
> much.  Questions:
> 
> 1. Why the error message changes?  If there's a good reason, it should
> be done as a separate commit, or at least well-documented in the
> commit message.

Because there are a lot of "invalid input syntax for type %s: \"%s\""
error messages, and we shouldn't force translators to maintain a separate
version that inlines the first %s.  But you're right, it'd be worthwhile
to point that out in the commit message.
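
To illustrate the translation point, here is a minimal sketch, not taken
from the patch. The helper name format_invalid_input is hypothetical and
plain snprintf() stands in for the real error-reporting machinery. The
idea is that a single format string is shared by all types, with the type
name passed as an argument:

```c
#include <stdio.h>

/*
 * Hypothetical helper (not a PostgreSQL function): one shared format
 * string serves every type, so translators only see a single message.
 * The type name is supplied at run time instead of being inlined.
 */
static int
format_invalid_input(char *buf, size_t buflen,
                     const char *type_name, const char *input)
{
    return snprintf(buf, buflen,
                    "invalid input syntax for type %s: \"%s\"",
                    type_name, input);
}
```

Calling it with "integer" or "bigint" as type_name reuses the same
translatable string in both cases.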


> 2. Does the likely/unlikely stuff make a noticeable difference?

Yes. It's also largely a copy from existing code (scanint8), so I don't
really want to differ here.
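
For readers unfamiliar with such hints, a rough sketch of how
likely/unlikely macros are commonly defined, assuming GCC or Clang for
__builtin_expect. The all_digits function is a made-up example, not code
from the patch; it only shows the error path being marked as the cold
branch:

```c
#include <stdbool.h>

/* Branch hints; fall back to plain conditions on other compilers. */
#if defined(__GNUC__)
#define likely(x)   __builtin_expect((x) != 0, 1)
#define unlikely(x) __builtin_expect((x) != 0, 0)
#else
#define likely(x)   ((x) != 0)
#define unlikely(x) ((x) != 0)
#endif

/* Hypothetical example: returns true if s is a non-empty digit string. */
static bool
all_digits(const char *s)
{
    /* Empty input is the rare case. */
    if (unlikely(*s == '\0'))
        return false;

    for (; *s != '\0'; s++)
    {
        /* Non-digit characters are expected to be rare as well. */
        if (unlikely(*s < '0' || *s > '9'))
            return false;
    }
    return true;
}
```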


> 3. If this is a drop-in replacement for pg_atoi, why not just recode
> pg_atoi this way -- or have it call this -- and leave the callers
> unchanged?

Because pg_atoi supports a variable 'terminator'. Supporting that would
make the code a bit slower, without being particularly useful.  I think
there's only a single in-core caller left after the patch
(int2vectorin). There's a fair argument that that should just be
open-coded to handle the weird space parsing, but given there's probably
external pg_atoi() callers, I'm not sure it's worth doing so?

I don't think it's a good idea to continue to have pg_atoi as a wrapper
- it takes a size argument, which makes efficient code hard.


> 4. Are we sure this is faster on all platforms, or could it work out
> the other way on, say, BSD?

I'd be *VERY* surprised if any would be faster. It's not easy to write a
faster implementation than what I've proposed, especially not if
you use strtol() as the API (variable bases, a bit of locale support).
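
As a rough illustration of why a specialized routine has less work to do
than strtol(), here is a sketch of a base-10-only, locale-free int32
parser with a fixed NUL terminator. This is not the code from the patch;
the name parse_int32 and its exact whitespace handling are assumptions
made for the example. It accumulates as a negative number so that
INT32_MIN can be parsed without overflow:

```c
#include <stdbool.h>
#include <stdint.h>

/* Whitespace test without locale lookups. */
static bool
is_space(char c)
{
    return c == ' ' || c == '\t' || c == '\n' ||
           c == '\r' || c == '\f' || c == '\v';
}

/*
 * Hypothetical sketch: parse a decimal int32 from a NUL-terminated
 * string. Returns false on syntax error or overflow.
 */
static bool
parse_int32(const char *s, int32_t *result)
{
    const char *p = s;
    int32_t     acc = 0;
    bool        neg = false;

    while (is_space(*p))
        p++;

    if (*p == '-')
    {
        neg = true;
        p++;
    }
    else if (*p == '+')
        p++;

    /* Require at least one digit. */
    if (*p < '0' || *p > '9')
        return false;

    while (*p >= '0' && *p <= '9')
    {
        int digit = *p++ - '0';

        /* Reject if acc * 10 - digit would fall below INT32_MIN. */
        if (acc < (INT32_MIN + digit) / 10)
            return false;
        acc = acc * 10 - digit;
    }

    /* Allow trailing whitespace, but nothing else. */
    while (is_space(*p))
        p++;
    if (*p != '\0')
        return false;

    if (!neg)
    {
        /* -INT32_MIN is not representable. */
        if (acc == INT32_MIN)
            return false;
        acc = -acc;
    }

    *result = acc;
    return true;
}
```

There is no base argument, no locale lookup and no configurable
terminator, so the hot loop is only a range check, an overflow check and
a multiply-subtract per digit.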

Greetings,

Andres Freund



