Re: Organisational question: surely someone has implemented many Boolean values (tags) and a solution exist

Richard Quadling <rquadling@xxxxxxxxx> · Fri, 21 Jan 2011 10:29:35 +0000

On 20 January 2011 19:20, Dotan Cohen <dotancohen@xxxxxxxxx> wrote:
> On Thu, Jan 20, 2011 at 19:21, Richard Quadling <rquadling@xxxxxxxxx> wrote:
>>> That is terrific, at least the first half. The second half, with the
>>> Venn diagrams, is awkward!
>>
>> When you get heavily nested data, the adjacent list model (where you
>> have a parentid for every uniqueid), you very quickly get into
>> complicated logic trying to traverse n-levels. The nested set model is
>> specifically built to handle this issue. I'd recommend getting to
>> grips with it. It will make finding items belonging to a group (or a
>> super group) a LOT easier.
>>
>> Especially if you have multiple tag hierarchies.
>>
>
> Is that strategy widely deployed, then? It seems so unruly having to
> change on average half the database records for every new leaf.
>
>
> --
> Dotan Cohen
>
> http://gibberish.co.il
> http://what-is-what.com
>

Changing data in a database is the role of the database engine. It is
much more efficient to have the cost on the insert than it is on the
select.

The adjacent list model is very expensive at n-levels for the select,
but trivial cost for the insert. If you are inserting millions of rows
but only occasionally looking at the data, then stick with the
adjacent list model. But if tags and n-levels are regularly accessed
and form a main part to the functionality of the app, then you may
want to reconsider.

Sure, the insert for the nested set model is more expensive in terms
of the number of rows to amend, but indexing will certainly should
certainly help. If you have tools to help optimize the tag table and
the queries you use, then I'd follow the recommendations (I use MS
SQL, so my Query Optimization tools help me here). The nested set
model is extremely efficient on the select.

It is a trade off that you have to decide upon, based upon your data
and needs. If, as I suspect, you are going to be doing a LOT of
selects on the tags and (in the future) to multiple levels, then this
aspect needs to be very efficient.

For me it is well worth the effort of moving from the adjacent list
model to the nested set model.

Both mechanisms work. In my opinion, the adjacent list model is for
truly simply lookups, not for complicated n-levels.

One of the changes I made to the nested set model was for a Bill Of
Materials module. The client made complex machinery (industrial
lathes). The sum quantity for all the parts were in the 20,000 region.
Each nut, bolt, screw, etc. Many sub-assemblies. The parent/child node
logic was massive in dealing with retrieving questions like "How many
machines can we build?", "What stock do we need to buy/make to
complete an order of 20 lathes?". Lot's of recursion into each level
to build the list. Getting the results would take 3 or 4 minutes (this
is in a non SQL environment using a peer-to-peer modified D-ISAM
database - it was already slow because of all that). When I moved to
the nested set model, no recursion and 1 query (more or less) and I
have all the results I needed. It was seconds in comparison.

Regards,

Richard.
-- 
Richard Quadling
Twitter : EE : Zend
@RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php