Search Postgresql Archives

Re: split string by special characters

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Jan-Erik wrote:
I wonder if you could please help me out to extract a character string
to an array or better yet, a table.

I'd like to split strings of text up into words and delimiters (but
not delete the delimiters). The delimiters are defined as comma,
space, dot, singe/double quotation mark, question mark etc.¹ in a
separate table (delimiters) depending on what rules apply for the
input.

regexp_split_to_array/table seem quite suitable but I have
difficulties to form the right expression with it, apart from that it
remove the delimiters as well.

Example:
This is just a text that contain special characters such as , (comma),
"(", ")" (left and right parenthesis) as well as "?" question mark.
How do I split it up with PostgreSQL?

Expected result:
{This, " ", is, " ", just, " ", a, ..., PostgreSQL, "?" }
__________________
¹)  Also later on tags such as <html> and at other times something
else depending on the circumstances.

//Jan-Erik

Hi,

I was thinking about that and in my opinion the approach to let the database do that is the wrong direction. Sure you can do a lot with regexp_split_to_table or regexp_split_to_array but they are kind of limited compared to a programming language using regular expressions. If I had to try to get your jobdone, I would try regexp_matches() like:

SELECT regexp_matches('This is just a text, that contain special characters such as, (comma),"(", ")" (left and right parenthesis) as well as "?" question, mark.How do I split it up with PostgreSQL?', E'(\\w*.)\\s+','g');

regexp_matches
----------------
 {This}
 {is}
 {just}
 {a}
 {"text,"}
 {that}
 {contain}
 {special}
 {characters}
 {such}
 {"as,"}
 {","}
 {"\""}
 {left}
 {and}
 {right}
 {parenthesis)}
 {as}
 {well}
 {as}
 {"\""}
 {"question,"}
 {How}
 {do}
 {I}
 {split}
 {it}
 {up}
 {with}
(29 rows)

So, you have the ability to catch the seperators like ','. But for now, teh example just catches the comma. But you want to catch a lot of other seperators as well. I suggest you do that within the logic of your coding language because I don't think this will be an easy way to walk ;-). This is no database job in my opinion.

Cheers

Andy




--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux