Hello, (sorry for this long mail) I have started to evaluate ltree and tsearch (first on Windows with PG Version 8.3.7) and I would apprecicate some clarification. The first part deals with ltree where I have some questions, and the second part is a concept to combine ltree with the full text search capabilities. Here I'd like to get your opinion whether I'm on the right way or not. Moreover I will probably need my own parser for the full text search and I would be very gratefull if anybody could provide me with some C code example that is similar to my requirement. and kudos for ltree and the full text search. I'm impressed by their performances :-) I would like to use these two functionalities to store and analyze paths, (the model beyond the data is not a tree, but a dense directed graphs). A) ltree ======== A ltree path would look like this: ...2_456.7_3425.1_23.9_231.... whereas each node is a combination of 2 information: <class_id>_<item_id> If my idea works well, I may also like to put some more information in each node which would hence be a micropath of fix length (3 or 4) With such a model, I could offer global path analysis on the <class>, and drilldown possibilities on the <items>. I came to this idea as the documentation of ltree says that a node is a list of words separated by the '_' character. a short example: --drop table paths ; create table paths (id int, path ltree); insert into paths(id,path)values(1, '1_11.2_13.3_10.4_13'); insert into paths(id,path)values(2, '1_12.4_15.3_11.4_10.15_14.1_11'); insert into paths(id,path)values(3, '1_11.2_13.3_10.4_10'); insert into paths(id,path)values(4, '1_12.4_15.3_11.3_10.13_14.13_14'); insert into paths(id,path)values(5, '1_11.2_13.3_10.2_13'); insert into paths(id,path)values(6, '1_12.4_15.3_11.1_10.12_14.1_11'); insert into paths(id,path)values(7, '127_1235'); --now I can e.g retrieve all items that have a path from <class 2> to <class 4> using the prefix matching: select id from paths where path ~ '*.2_*.*.4_*.*'::lquery -->(1 & 3) --drop table paths ; And now my questions and comments: 1) >From the module description, I first though that the '_' character had a special meaning but it just seems to be an extra allowed character beside [a-zA-Z0-9]. Am I correct or is '_' defined internally as separator for indexing/searching ltree data ? 2) The documentation says that the length of a label path is limited to 65kB. This is the max number of nodes, and not the size of the string. Moreover, when you try to define a larger path, it will be silently cut without notice or error (seems that the first 65kB are just dropped, but I'm not sure about this). Maybe it would be a good thing to add this to the documentation. 3) Gist index create index path_gist_idx on paths using gist (path); => ERROR: index row requires 621840 bytes, maximum size is 8191 So it seems that gist indexes can only be used for paths that do not exceed 8kB in size, which is much less than the 65kB limit in depth. Is this correct or am I missing a point ? (I also have PostGIS 1.3.5 installed. Could this be an issue ?) B) Full text search =================== ltree offers a prefix search on the nodes, but no suffix search which I need to look for given <item_id>. So my idea is to combine ltree with full text search. If a node has a fix format like <class>_<item>, I could use full text search indexing to look for paths containing a given item. My first step was to check the default parser which works really badly in this case: SELECT alias, description, token FROM ts_debug('1_12.4_15.3_11.1_10'); alias description token ----- ----------- ----- uint Unsigned integer 1 blank Space symbols _ float Decimal notation 12.4 blank Space symbols _ float Decimal notation 15.3 blank Space symbols _ float Decimal notation 11.1 blank Space symbols _ uint Unsigned integer 10 So I guess I need my own parser which should return something like: (this would be a specific parser that had to know the exact pattern and meaning of a node) alias description token ----- ----------- ----- class class_id 1 blank Space symbols _ item item_id 12 blank Space symbols . class class_id 4 blank Space symbols _ item item_id 15 blank Space symbols . class class_id 3 As already said, I would be very gratefull if anybody could provide me with some C code example that is similar to my requirement. Having this, I would then define different search configuration: CREATE TEXT SEARCH CONFIGURATION ltclass; CREATE TEXT SEARCH CONFIGURATION ltitem; ALTER TEXT SEARCH CONFIGURATION ltclass DROP MAPPING FOR ltitem; ALTER TEXT SEARCH CONFIGURATION ltitem DROP MAPPING FOR ltclass; and build different search indexes for each kind of data contained in a node: CREATE INDEX path_class_ix ON paths USING gin(to_tsvector('ltclass', ltree2text(path))); CREATE INDEX path_item_ix ON paths USING gin(to_tsvector('ltitem', ltree2text(path))); Finally using both ltree and full text search would provides a very powerfull framework for path analysis where a node is not only a single reference, but a combination of some pieces of information :-) And a last question: the alias below exists only in the parser. I can't use it later on to restrict a search to token of a given alias. Right ? Thank you for your attention. Marc Mamin -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general