On Thu, Feb 4, 2010 at 7:16 AM, Matt Domsch <matt@xxxxxxxxxx> wrote:
> One thing that's painfully obvious is that the "Unique IP addresses"
> method of counting the number of installations [3] is woefully
> under-counting the actual number of installs.

How is it obvious? How do you know whether a significant chunk of the IP addresses showing up are roaming systems? The computer I'm on right now has checked for updates from no fewer than 10 different networks this month.

I'm going to counter all of this with a question. For the purpose of global or regional map making... does getting more accurate numbers matter? Or do you expect the undercounting factor to have a regional bias that skews the relative client densities for one region compared to another on the global map?

Exact numbers are nice... but do you need them? We aren't ever going to get an exact number for the userbase. I'd be more interested in standing up a correction factor, with an error bar, that can be used in a statistically significant way to get from the numbers we do have to an estimate of the active userbase.

My first cut at doing that involved comparing the rate of growth of smolt UUIDs to the rate of growth of unique IPs over a 16-month period. I wouldn't call what I saw a huge undercount in unique IPs. My method pegs the correction factor at about 1.15 with a stdev of 0.03... or, to say that in English, we are under-counting by about 15% globally. I never found the time to go back and check whether that factor varied significantly region by region.

So you've done the frequency analysis. Have you gone further and worked out, assuming an update request cadence for a given client, what the weighted adjustment of the long tail looks like in aggregate? Assuming every client is the same and checks in X times per day... what is the average number of such clients per IP address?
You should be able to determine that number by integrating over the histogram of the number of IP addresses binned by connections per day (that is, summing connections per day times the number of IP addresses in each bin), dividing by the number of IP addresses seen that day, and dividing by whatever X you choose. You'd have to convince me with some math and some plots that the long tail is really dragging things off by a large factor.

-jef

_______________________________________________
advisory-board mailing list
advisory-board@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/advisory-board
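
For what it's worth, the clients-per-IP estimate described above can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the function name `clients_per_ip`, the histogram layout (connections per day mapped to the number of IP addresses seen with that count), and the sample numbers are all made up for illustration and do not come from the actual mirror logs.

```python
from typing import Mapping


def clients_per_ip(hist: Mapping[int, int], checkins_per_client: float) -> float:
    """Estimate the average number of clients behind each IP address for one day.

    hist maps "connections per day" -> "number of IP addresses seen with that
    many connections" (hypothetical layout, one histogram per day).
    checkins_per_client is X, the assumed number of check-ins per client per day.
    """
    if checkins_per_client <= 0:
        raise ValueError("checkins_per_client must be positive")

    total_ips = sum(hist.values())
    if total_ips == 0:
        raise ValueError("histogram contains no IP addresses")

    # "Integrating over the histogram": total connections seen that day.
    total_connections = sum(conns * n_ips for conns, n_ips in hist.items())

    # Connections per IP, then divided by X to get clients per IP.
    return total_connections / total_ips / checkins_per_client


if __name__ == "__main__":
    # Purely illustrative data: most IPs connect once or twice, and a few
    # (the long tail, e.g. NAT gateways or proxies) connect many times.
    sample_hist = {1: 700, 2: 200, 4: 80, 50: 20}
    print(f"{clients_per_ip(sample_hist, checkins_per_client=2):.2f}")
```

The result depends directly on the choice of X, so it is probably worth running with a few plausible values and seeing whether the long tail moves the estimate far from 1 under any of them.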