Search Postgresql Archives

Re: For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sep 19, 2007, at 9:29 AM, Richard Broersma Jr wrote:

--- Phoenix Kiula <phoenix.kiula@xxxxxxxxx> wrote:

Then I am confused again about how the fill factor stuff works. Let's
say I have a table with four BTREE indexes. Should all of them have a
fill factor of about 60 (lower than the default 90, that is) to be
effective? Or will it help if I lower the fill factor on only a couple
of the most often used ones? The primary key index is very, very
rarely updated so I don't need it to have a fill factor.

I could try and see these one by one, but that's no better than
touching/feeling the database blind-folded. I would love to see some
writeup about this whole thing, but it seems hard to come by!

I will try to explain everything that I understand about indexes and tables. I am sure that if some part of my understanding is incorrect, someone will chime in to correct me.

In PostgreSQL, tables are physically distinct from indexes. This means that any give table is written to disk as a file(s), and indexes are also written to disk as a separate file(s). A table and index are both saved to disk in segmented block referred to a pages (I believe the default
size is 8K).

Yes, that can be changed at compile time, although I don't think I've ever heard of any advantages to doing that.


The advantage of the index file is that it is significantly smaller in size, so it takes less time
to sequentially scan and less time to read from disk.

Now when you want to find a record, PostgreSQL will/(may choose to) sequentially scan the index until it find the record entry that corresponds with your criteria. This entry has a table cross-reference to the actual page that contains the record that is "pointed" at by the index. Lastly, the entire table page containing your record is read from disk to memory for further query
processing.

<ASSUMPTION>
When you insert a record into a table that generates an entry into the b-tree index file, PostgreSQL will scan the pages of the index file to find the correct place and index page to add this entry. If the page is already full, PostgreSQL "probably" replaces the old full pages with two new pages with a distribution of that chunk of the B-tree index, and then adds the new entry to one of those pages. This operation can become very expensive if many new pages need to be
created from single INSERT/UPDATE statement.
</ASSUMPTION>

Yes, the point of B-trees is that they have fast lookup times, but updates can be expensive when you have to re-balance your leaf nodes.

By using fill factor, you are telling PostgreSQL to automatically leave a portion of any newly created index page partially blank for future use. When a newly created index entry needs to be
saved, it can be stored in one of the "holes" left in the index page.

That future use is only for updates.


A large fill factor create both advantages and dis-advantages. For writing operations, it is a big advantage because, a large fill factor will leave alot of holes for INSERT and UPDATE operations to use. This can help increase the number of UPDATE/ INSERT per seconds that you server can handle since, they index pages do not have to be rebuilt very often.

Not quite. Once a page has reached it's fill factor percentage full, no more inserts will happen on that page, only updates. Also, I think you have large/small backwards wrt fill factor. If you have a fill factor of, say, 40% then once a page has reached 40% full no more inserts will happen (unless some space is reclaimed by vacuum). So, smaller fill factors == bigger holes. The bigger the fill factor, the smaller the whole: if you have a fill factor of 90%, only 10% is reserved for updates of rows on that page.


However, the disadvantage is that, a newly created index with a large fill factor has "index bloat" designed into it. This mean that the index pages have a large portion of holes. So PostgreSQL will create more index pages than it would normally in order to hold both your index and the pre-defined holes that you specified by your fill-factor. Larger indexes require more time to be read from disk to memory and will require more time to sequentially scan to find to find the cross-reference table page location of records of interest. So the net effect is that
larger indexes will make SELECT statement slower.

This is my understanding for tables indexes and fill factor. I hope it helps.

Again, with the large v. small fill factor point. Using fill factor seems to be a trade-off between space and update efficiency. Let's say that after so many (potential) updates you know that each row will become static, i.e. no more updates will happen. Let's use some numbers and make them easy to work with. Say each row will be updated exactly once and you use a fill factor of 50%. Now, say 5K of fresh index data is written. The first 4K will go into one page at which point it has hit it's 50% fill factor threshold and the other 1K has to go into another page. Now, let's say each tuple in the index is updated, the first page is now at 100% full as all of the updates for the first page filled up the other 50%. The page is now full until you vacuum. However, once you vacuum and the first 4K that was originally inserted is freed, that 4K will never get used again as the page is back down to 50% so no more inserts can happen on that page, and we know that each tuple would only be updated that once and you're left with 50% "bloat". So, you can see that for tables/indexes that aren't going to see a lot of updates to the same tuples something as low as 50% is probably a pretty aggressive fill factor. In fact, I would say that you shouldn't really even bother playing around with the fill factor unless you know that the table/ index sees a lot of updates. For my example where each tuple sees only one update, the index default fill factor of 90% is probably fine.

Erik Jones

Software Developer | Emma®
erik@xxxxxxxxxx
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux