Re: Organisational question: surely someone has implemented many Boolean values (tags) and a solution exist

Dotan Cohen <dotancohen@xxxxxxxxx> · Thu, 20 Jan 2011 22:21:29 +0200

On Thu, Jan 20, 2011 at 22:05, David Harkness <david.h@xxxxxxxxxxxxxxxxx> wrote:
> Thanks for the link. That article proposes an interesting way to organize
> the categories. Have you implemented this in the wild? Clearly the design
> would work as it's pretty simple, and I like that it removes the need for
> recursive queries.

I am also interested in knowing if this approach is used in any production code.

> Dotan, the Venn diagrams are just used to explain the concept. If you use
> the code to determine the left and right values, you can ignore the diagrams
> entirely. As long as you're not adding/removing categories every minute,
> having to recalculate left and right values isn't that big of a deal.

I understood that. My concern is exactly with adding new nodes. There
is no incrementor (++i) in SQL, so knowingly coding a solution that
will require incrementing two fields in half the database rows seems
irresponsible.

> Also, there's no reason you couldn't keep the parent_id field with the
> nested sets. It would come in handy for certain types of queries, though
> it's not necessary.

That is true. I could store both methods, and experiment to see which
is preferable. But what a mess this would be if the two methods go out
of sync! Isn't there a name for that in SQL, something along the lines
of not storing the same data in two places lest one should change and
not the other? The term escapes me.

>> I disagree. The method I proposed can be extended to any depth, and any
>> leaf or branch can be retrieved with a single query.
>
> The nested set method can be extended to any depth, and it pays off more the
> larger the hierarchy grows. While you can retrieve any branch (all
> ancestors) of a node with a single SQL query, the SQL engine itself actually
> must perform a recursive query meaning multiple hits on the parent_id index.

That pays off more? For the guy writing code or for the database
memory requirement?

>> I suppose for retrievals this structure has advantages, but unless
>> MySQL has a ++ operator (or better yet, one that adds or subtracts 2
>> from an int) then it looks to be a pain to add nodes.
>
> ++ or += wouldn't be any better here than x = x + 2. Once you're modifying
> indexed values, you'll pay a much higher price writing to disk than += could
> ever save you in CPU cycles. The beauty is that inserting a node requires
> only two update statements that will fix *all* categories that need to be
> adjusted.

Only two update statements, but they are affecting on average half the
database's rows!

> Adding categories to the hierarchical model is definitely faster
> so it comes down to your insert-to-select ratio. Moving a subtree is also
> much easier with the hierarchical model.

Which do you call the hierarchical model? That term is not used in the
linked article.

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php