Re: Tag Cloud revisited

On Sat, May 12, 2007 11:46 am, tedd wrote:
> Sorry to get back to this so late, but I had some other pressing
> matters.

No worries.

It's not like I'm likely to disappear from here anytime soon :-)

>>Thanks to Tedd for answering the question I asked, I think, even
>>though I was asking the wrong question. :-)
>
> No problem, but you did ask the right question. You touched on
> something I think you intuitively knew, but have been sidetracked by
> an easy "solution".
>
> At 9:40 AM -0500 5/3/07, Richard Lynch wrote:
>>But as I realized last night, the data is ALREADY in that "curve" and
>>by simply breaking down in even increments from MIN to MAX, the
>>"curve" works itself out correctly.
>
> Sort of.
>
> If you are content with dividing the top 100 things into strict
> groups of 20 for a tag cloud distribution, then fine. However, the
> "20 items per group" rule is not defined in terms of the group's
> distribution, which would be a better representation of the data.
> Keep in mind you are trying to show which items are the most popular
> in a representative way.

I'm not dividing them into groups of 20.

I'm taking the min/max of the top 100, and dividing the SCALE into 5
equal chunks.

The scores themselves are weighted already, with only one or two in
the top 1/5th, a handful in the 2nd 1/5th, a goodly number in the 3rd
1/5th, a lot in the 4th 1/5th, and a buttload are down in that last
1/5th.
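
A minimal sketch of the "divide the SCALE into 5 equal chunks" idea, written in Python purely for illustration (the arithmetic ports directly to PHP). The function name, the `buckets` parameter, and the sample scores are all hypothetical; this is not the code behind the actual tag cloud.

```python
def equal_width_bucket(score, lo, hi, buckets=5):
    """Return a bucket index 0..buckets-1, where 0 is the lowest chunk of the scale."""
    if hi == lo:
        return 0  # every score is identical, so there is only one level
    # Position of the score on the scale, from 0.0 (at lo) to 1.0 (at hi).
    position = (score - lo) / (hi - lo)
    # The maximum lands exactly on 1.0, so clamp it into the top bucket.
    return min(int(position * buckets), buckets - 1)


# Made-up long-tailed scores, for illustration only.
scores = [980, 610, 540, 400, 390, 300, 250, 240, 200, 190, 185, 180]
lo, hi = min(scores), max(scores)
levels = [equal_width_bucket(s, lo, hi) for s in scores]
print(levels)
```

Each bucket index could then map to a font size or CSS class. The chunks are equal in width on the score scale, not equal in membership, so with long-tailed data most items end up in the bottom chunk.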

In other words, I took the Top 100, and graphed them on normal
cartesian graph paper -- What I was originally trying to do was graph
them on logarithmic paper.
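
For contrast, here is roughly what the "logarithmic paper" variant might look like, again in Python for illustration only. The function name is hypothetical, and the sketch assumes every score is strictly positive, since the logarithm is undefined otherwise.

```python
import math


def log_bucket(score, lo, hi, buckets=5):
    """Bucket a score on a logarithmic scale. Assumes 0 < lo <= score <= hi."""
    if hi == lo:
        return 0  # a single distinct score means a single level
    # Same scale-and-offset step as the linear version, applied to log(score).
    position = (math.log(score) - math.log(lo)) / (math.log(hi) - math.log(lo))
    # Clamp the maximum (position 1.0) into the top bucket.
    return min(int(position * buckets), buckets - 1)


# A score of 50 on a 10..1000 scale sits in the bottom chunk on linear paper,
# but the log scale spreads the low end out and moves it up a level.
print(log_bucket(50, 10, 1000))
```

The log scale stretches the crowded low end and compresses the top, which is a different picture from the linear one.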

> It's difficult to explain, so I'll show you:
>
> http://sperling.com/a/stdev/
>
> Each group (color -- could be tags) falls within a division based
> upon the standard deviation (SD) of the population. The cyan group is
> within one SD of the most popular -- the yellow group is within two
> SD of the most popular and so on.
>
> All members of each color group have more in common with each other
> than with those outside their color group. If you will note, the
> numbers of each color group change due to distribution of the
> population. Using a strict "20 items per group" rule does not reflect
> that. So, if you arbitrarily assign members of the population to a
> group based solely on a strict division, then you are not accurately
> representing the tag cloud.
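
For reference, the grouping Tedd describes could be sketched roughly as follows. This is Python for illustration only, and it is a guess based on his description ("within one SD of the most popular", "within two SD", and so on), not the code behind his demo page. The function name, the cap of five groups, and the sample scores are assumptions.

```python
import statistics


def sd_bucket(score, top, sd, buckets=5):
    """Return 0 for scores less than one SD below the top score, 1 for the next SD, etc."""
    if sd == 0:
        return 0  # no spread at all, so everything is in the same group
    # Count how many whole standard deviations the score sits below the top.
    distance = int((top - score) / sd)
    # Anything further out than the last group is lumped into it.
    return min(distance, buckets - 1)


# Made-up scores, for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
sd = statistics.pstdev(scores)  # population SD, as in "SD of the population"
top = max(scores)
groups = [sd_bucket(s, top, sd) for s in scores]
print(groups)
```

Here the group boundaries depend on the spread of the data, so the number of items per group changes with the distribution.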

If I took the first 20, second 20, etc, yeah, that would be way wrong
as well.

I didn't do that.

I just scaled my "grid" upon which to graph them in cartesian space to
run from MIN(top100) to MAX(top100), and then let the chips fall
where they may on graphing.

> Do you see what I mean?

Yes -- I think we ended up with pretty much the same result...

Well, not the *same*, but very similar shaped curves anyway.

But my way was "easier" as I just let the natural distribution of the
data on standard graph paper take care of distributing the points
where they belonged.

I suppose there is some merit to forcing the Standard Distribution
instead of living with whatever the "real" data is.

But I'm more happy living with the Reality of the data than applying a
Standard Distribution to data which, according to some experts, isn't
even a Standard Distribution at all, but a "long tail" or some other
terms they bandy about that mean the same thing as "long tail" as I
understand it...

I don't claim my way is "right" -- just that it "works" and is dead
easy and is data-driven rather than conforming to some statistical
model which may or may not be the correct model in the first place.

If somebody NEEDS a Standard Distribution, for sure use Tedd's stuff,
cuz that is what that is.

If you're just trying to "graph" the data that you have, whatever it
may be, just graph it, scaled and offset, and see what kind of curve
you have.

PS
I'll post the actual tag cloud page link once it's out of QA and not
hidden from search engines behind HTTP Basic Auth.  RSN, but
definitely not until after php|tek:
http://phparch.com/tek

-- 
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some indie artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

