Re: blobs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 31 Jan 2007 18:45:15 -0500
"Chad Wagner" <chad.wagner@xxxxxxxxx> wrote:

> On 1/31/07, Steve Holdoway <steve.holdoway@xxxxxxxxxxxxx> wrote:
> >
> > The table I'm concerned with at the moment have (currently) 5 million
> > rows, with a churn of about 300,000 rows a week. The table has about a
> > million hits a day, which makes it the main potential bottleneck in this
> > database.
> 
> 
> Why would it be a "bottleneck"?  If you are updating or deleting 300K a
> week, definitely make sure you take a look at autovacuum and turn it ON.
It is a potential bottleneck as I said. I am replacing between 5 and 10 percent of the data in the table every week. There are, in addition, a daily total of about a million hits ( read/update/delete ), as I said. 

This is the busiest table in the database. Therefore, it's the most likely candidate to cause performance problems. That's what I call a bottleneck.
> 
> 
> We need to store some large ( 0 -> 100kB ) data with each row. Would you
> > recommend adding it as columns in this table, given that blobs will be
> > stored in the pg_largeobject table anyway, or would you recommend a daughter
> > table for this?
> 
> 
> Depends on how you are querying the table.  This is really a database
> modeling question, and leads into many many more questions.  I would say if
> your frequently range scanning the table (selecting several rows) and in
> those cases you rarely need the "blob", then I would fork it off into a
> child table.  If the "blob" is rarely accessed, and only accessed directly,
> then definitely a child table in my book.
> 
> The reason is if your frequently fetching rows from this table and rarely
> using the blob all you are doing is consuming memory that could be better
> used for other things, and spinning I/O when it is not necessary.
Sorry, you're completley wrong. If you consider that the only way of getting info is select *, then this is true. Personally, I think that anyone who does this in code is plain lazy and should find a job more suited to them (:

I am asking for input from those who have been in this situation before, and have experience in the tradeoff of running a separate table for the big stuff as against the extra ( 8 byte? ) column that would be added to the master table. Why am I asking this? Because Postgres has an unique way of handling this kind of data, unique from even the last time I used postgres in anger - 7.4. It's different from every other rdbms ( and MySQL ), and I have no practical experience of it in the wild.
> 
> 
> Any other suggestions on how to avoid performance problems with this table (
> > hardware is dual Xeon, 4GB mem, 2 hardware raid channels for storage + 1 for
> > logs, all running debian 32 bit ).
> >
> 
> It really has to do with how you design your schema.
No. It really has to do with the gathering of relevant information so that I can design my schema properly. Academically, 3NF may be the solution to every design, but in the real world, there are many other things to take in to account. I've monitored the data flows, sized the hardware to handle the IO, and can either spend a week or two benchmarking different solutions from cold, or I can take the advice of those with relevant experience ( who I expect to find on this list ) to point me in the right direction first.
> 
> 
> -- 
> Chad
> http://www.postgresqlforums.com/
> 
Steve.

PS. Please fix your headers so replies to your mails appear on the list directly.


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux