Richard:
Sorry to get back to this so late, but I had some other pressing matters.
> Thanks to Tedd for answering the question I asked, I think, even
> though I was asking the wrong question. :-)
No problem, but you did ask the right question. You touched on
something I think you intuitively knew, but you have been sidetracked
by an easy "solution".
At 9:40 AM -0500 5/3/07, Richard Lynch wrote:
> But as I realized last night, the data is ALREADY in that "curve" and
> by simply breaking down in even increments from MIN to MAX, the
> "curve" works itself out correctly.
Sort of.
If you are content with dividing the top 100 things into strict
groups of 20 for a tag cloud distribution, then fine. However, the
"20 items per group" rule ignores how the popularity values are
actually distributed, and grouping by that distribution would be a
better representation of the data. Keep in mind you are trying to
show which items are the most popular in a representative way.
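
To make the strict rule concrete, here is a minimal sketch in Python (the same logic ports directly to PHP). The function name `strict_groups` and the hit counts are made up for illustration; a group size of 2 stands in for the "20 items per group" rule.

```python
def strict_groups(counts, group_size):
    """Sort counts from most to least popular, then cut into fixed-size groups."""
    ranked = sorted(counts, reverse=True)
    return [ranked[i:i + group_size] for i in range(0, len(ranked), group_size)]

# Hypothetical hit counts, not real data.
hits = [100, 95, 90, 60, 55, 20]
groups = strict_groups(hits, 2)
print(groups)  # [[100, 95], [90, 60], [55, 20]]
```

With these numbers, 90 is split away from 95 and lumped in with 60, purely because of where the cut happens to fall.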
It's difficult to explain, so I'll show you:
http://sperling.com/a/stdev/
Each group (color -- could be tags) falls within a division based
upon the standard deviation (SD) of the population. The cyan group is
within one SD of the most popular item, the yellow group is within
two SDs of it, and so on.
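
The SD-based division described above can be sketched roughly as follows. This is not the code behind the demo page, only one plausible reading of it: the function name `sd_bands`, the use of the population SD, and the sample counts are all assumptions. Band 0 holds items less than one SD below the most popular item, band 1 holds items between one and two SDs below it, and so on.

```python
from statistics import pstdev

def sd_bands(counts):
    """Group counts by how many whole SDs they sit below the top count."""
    sd = pstdev(counts)   # population standard deviation (an assumption)
    top = max(counts)
    bands = {}
    for c in sorted(counts, reverse=True):
        # If every count is identical, sd is 0 and everything is band 0.
        band = 0 if sd == 0 else int((top - c) // sd)
        bands.setdefault(band, []).append(c)
    return bands

# Hypothetical hit counts, not real data.
hits = [100, 95, 90, 60, 55, 20, 10]
bands = sd_bands(hits)
print(bands)  # {0: [100, 95, 90], 1: [60, 55], 2: [20, 10]}
```

The band sizes come out as 3, 2 and 2 here. They follow the data instead of being fixed in advance.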
All members of each color group have more in common with each other
than with those outside their color group. Note that the number of
members in each color group varies with the distribution of the
population. A strict "20 items per group" rule does not reflect
that. So, if you assign members of the population to groups based
solely on a strict division, you are not accurately representing
the data in the tag cloud.
Do you see what I mean?
Cheers,
tedd
--
-------
http://sperling.com http://ancientstones.com http://earthstones.com