Search Postgresql Archives

Re: Counting the occurences of a substring within a very large text

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Marc Mamin <M.Mamin@xxxxxxxxxxxx> writes:
> I'd like to count the number  linebreaks within a string, 
> but I get a memory allocation error when using regexp_matches or regexp_split_to_table.

Anything involving a regexp is going to have a maximum input string length
of about 256MB, as a result of conversion to pg_wchar format.
regexp_split_to_table(), for instance, does this:

    /* convert string to pg_wchar form for matching */
    orig_len = VARSIZE_ANY_EXHDR(orig_str);
    wide_str = (pg_wchar *) palloc(sizeof(pg_wchar) * (orig_len + 1));
    wide_len = pg_mb2wchar_with_len(VARDATA_ANY(orig_str), wide_str, orig_len);

palloc() would complain for requests beyond 1GB, and sizeof(pg_wchar) is
4, so 256MB is the longest orig_str this can handle.  (This coding is on
the hairy edge of being a security bug on 32-bit machines, too, but AFAICS
it is okay because there's a factor of 4 daylight between the max possible
input length of 1GB and overflowing uint32.)

In newer branches we could consider using MemoryContextAllocHuge to escape
the 1GB limit on wide_str, but it would take some research to be sure that
everything involved is using suitable datatypes for string indices and so
forth.  I'd be a bit worried about the performance implications of
throwing such large strings around, anyway.

			regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux