Tag Cloud revisited

tedd <tedd@xxxxxxxxxxxx> · Sat, 12 May 2007 12:46:09 -0400

Richard:

Sorry to get back to this so late, but I had some other pressing matters.

> Thanks to Tedd for answering the question I asked, I think, even
> though I was asking the wrong question. :-)

No problem, but you did ask the right question. You touched on
something I think you intuitively knew, but you have been
sidetracked by an easy "solution".

At 9:40 AM -0500 5/3/07, Richard Lynch wrote:
> But as I realized last night, the data is ALREADY in that "curve" and
> by simply breaking down in even increments from MIN to MAX, the
> "curve" works itself out correctly.

Sort of.

If you are content with dividing the top 100 things into strict
groups of 20 for a tag cloud distribution, then fine. However, the
"20 items per group" rule ignores how the data are actually
distributed, and grouping by that distribution would be a better
representation of the data. Keep in mind you are trying to show
which items are the most popular in a representative way.

It's difficult to explain, so I'll show you:

http://sperling.com/a/stdev/

Each group (a color here, but it could be a tag size) falls within a
division based upon the standard deviation (SD) of the population.
The cyan group is within one SD of the most popular item; the yellow
group is within two SDs of the most popular, and so on.
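
A minimal sketch of that kind of banding, written in Python purely
for illustration. The function name sd_bands and the sample counts
are made up, and this is an assumption about how such a grouping
could be done, not the code behind the page above:

```python
from statistics import pstdev

def sd_bands(counts):
    """Return, for each count, how many whole SDs it sits below the maximum.

    Band 0 is within one SD of the most popular item, band 1 is within
    two SDs, and so on. Uses the population standard deviation.
    """
    sd = pstdev(counts)
    top = max(counts)
    if sd == 0:
        # Every item is equally popular, so there is only one band.
        return [0 for _ in counts]
    return [int((top - c) // sd) for c in counts]

# Hypothetical usage counts for ten tags.
counts = [100, 95, 90, 60, 30, 10, 5, 5, 5, 5]
print(sd_bands(counts))  # -> [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
```

With these sample counts the bands hold 3, 2 and 5 items, so the
group sizes follow the data rather than a fixed quota.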

All members of each color group have more in common with each other
than with those outside their color group. Note that the number of
members in each color group varies with the distribution of the
population. Using a strict "20 items per group" rule does not reflect
that. So, if you assign members of the population to a group based
solely on a strict division, then the tag cloud does not accurately
represent the data.
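
To make the problem with a strict division concrete, here is a small
Python sketch of fixed-size grouping by rank. The function name
fixed_size_groups, the group size of 2 and the sample counts are
assumptions chosen only to keep the example short:

```python
def fixed_size_groups(counts, size):
    """Rank items by count (highest first) and cut the ranking into
    consecutive groups of `size` items, ignoring the counts themselves."""
    order = sorted(range(len(counts)), key=lambda i: counts[i], reverse=True)
    groups = [0] * len(counts)
    for rank, i in enumerate(order):
        groups[i] = rank // size
    return groups

# Hypothetical usage counts for ten tags; the last four are tied.
counts = [100, 95, 90, 60, 30, 10, 5, 5, 5, 5]
print(fixed_size_groups(counts, 2))  # -> [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
```

The four tags that were each used 5 times end up split across two
different groups, even though they are equally popular.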

Do you see what I mean?

Cheers,

tedd

--
-------
http://sperling.com  http://ancientstones.com  http://earthstones.com

--
PHP General Mailing List (http://www.php.net/)