On Wed, Oct 26, 2011 at 3:23 AM, Tommy Pham <tommyhp2@xxxxxxxxx> wrote:
> On Wed, Oct 26, 2011 at 1:40 AM, Lester Caine <lester@xxxxxxxxxxx> wrote:
>
>> Tommy Pham wrote:
>>
>>> I wonder ... The real question is what's the purpose of the DB? Is it
>>> for OLAP or OLTP? ;)
>>> As for dealing with a DB having millions of rows, you're crossing over
>>> into DBA territory.
>>
>> Many of my customers have coming up on 20 years of data available. There
>> has been a debate on transferring historic data to a separate database,
>> but having it available is not causing a problem, except for some counts
>> and larger search actions, and being able to see how long a client has
>> been visiting is often useful. Statistical analysis is always done on a
>> separate machine, with a replicated copy of the data, so as not to
>> affect the active users ...
>
> What kind of counts/filters? What kind of RAID subsystem is the storage?
> What's the total size of the DB? Up to 20 years of data should be in the
> peta range. In that peta range, if you're not having performance issues
> and not using RAID 0, 0+1, 10, 50, or 60, I'd love to hear about the
> application and database design in detail. :)
>
> Brain fart... that should be tera, not peta ... lol..
>
>> That said, we are now using the much more detailed LLPG address data
>> rather than simple postcode, and that has added another order of
>> magnitude to the data being searched live ...
>>
>> --
>> Lester Caine - G8HFL
>> -----------------------------
>> Contact - http://lsces.co.uk/wiki/?page=contact
>> L.S.Caine Electronic Services - http://lsces.co.uk
>> EnquirySolve - http://enquirysolve.com/
>> Model Engineers Digital Workshop - http://medw.co.uk/
>> Firebird - http://www.firebirdsql.org/index.php

Since Jason didn't disclose sufficient information, I suggested that RAID
storage based on the worst-case scenario being this: as a business analyst,
I'd want to do drill-downs, filters, and counts of the following for an
upcoming marketing campaign:

* county(ies)
* city(ies)
* zip code(s)
* address type (business and/or residential)
* type of business (commercial, industrial/manufacturing, etc.)
* type of residential (homes - single/townhouses - or apartments/condos)

The filters and counts will be any combination of the above, e.g.: 5 random
zip codes within the given state, listing all the other criteria and their
breakdowns. As Jason mentioned having 89 million rows for a given state, how
long would it take to run the drill-downs if the DB isn't sitting on a fast
storage medium? That 89 million is most likely the average count per state
in the USA; for California and New York, the number can easily be double
that. And that's only the basic filtering. What of a specific business
industry such as landscaping? What of filtering by yearly income and/or
real estate value? BTW, as a business analyst, I don't want to wait hours
for the info to update every time I change a criterion/filter to get the
counts before I look into a few random individual records from the results.

As I've mentioned, something of this magnitude is better left to a DBA, and
you should work together with that DBA. Either hire/contract one or become
one :)


Regards,
Tommy
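To make the drill-down pattern above concrete, here is a minimal sketch of
the kind of grouped-count query a business analyst would run. The schema,
column names, and sample rows are my own illustrative assumptions, not
anything from the thread; SQLite stands in for whatever DB engine is
actually in use. At the 89-million-row scale being discussed you would also
want indexes on the filter columns (and, as Tommy argues, fast storage
underneath), since an unindexed query like this one is a full table scan.

```python
import sqlite3

# Hypothetical schema for illustration only; real column names and
# address typing would come from the actual application.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE addresses (
        county    TEXT,
        city      TEXT,
        zip       TEXT,
        addr_type TEXT,   -- 'business' or 'residential'
        subtype   TEXT    -- e.g. 'commercial', 'single', 'apartment'
    )
""")
rows = [
    ("Orange",      "Irvine",  "92618", "business",    "commercial"),
    ("Orange",      "Irvine",  "92618", "residential", "single"),
    ("Orange",      "Anaheim", "92805", "residential", "apartment"),
    ("Los Angeles", "Burbank", "91502", "business",    "industrial"),
]
conn.executemany("INSERT INTO addresses VALUES (?, ?, ?, ?, ?)", rows)

# One drill-down: counts broken down by zip and address type, filtered
# to a chosen set of zip codes (the "5 random zip codes" case).
cur = conn.execute("""
    SELECT zip, addr_type, COUNT(*) AS n
    FROM addresses
    WHERE zip IN ('92618', '92805')
    GROUP BY zip, addr_type
    ORDER BY zip, addr_type
""")
for zip_code, addr_type, n in cur:
    print(zip_code, addr_type, n)
```

Adding further criteria (business industry, income band, etc.) just extends
the WHERE and GROUP BY clauses; the cost question Tommy raises is how fast
the storage and indexing can satisfy arbitrary combinations of them.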