Moiz Kothari wrote:
Hi All,
What are the reasons of data getting fragmented in postgres? Do we
have any page which explains different scenarios of data getting
fragmented?
Regards,
Moiz Kothari
I guess there are two types of fragmentation in play with postgres,
internal database fragmentation and external filesystem fragmentation.
Internal fragmentation is caused by holes in disk blocks when records
are deleted, or a record is *just* a little too big that it can't fit in
a remaining empty space in a particular disk block. This can be fixed
(or at least reduced quite a bit) by using VACUUM/VACUUM ANALYZE
(especially using the FULL option) in psql or the command vacuumdb (and
its myriad options), as is a very helpful, oft-suggested strategy by
people on these lists, for reasons other than just defragmentation.
External fragmentation occurs in postgres for the same reason it occurs
in other types of files. Lots of appends and generally continual file
growth mean that the filesystem is less likely to find adjacent disk
blocks (which is the source of fragmentation). Even though Unix file
systems generally do a good job of preventing fragmentation, some
workloads are just not good. For instance, large mail spools (where the
policy is one file per spool instead of one file per message) are prone
to fragmentation.
Case in point, I use xfs as the filesystem running under postgres, and
after a few days the "major" database clusters showed ~90% fragmentation
on their respective partitions (which is about a 10 to 1 ratio of file
fragments to files). After running a defragmenter, the fragmentation
went under 5% (which is under 1.2 fragments per file), and performance
generally doubled. Other, more application specific databases which
still get hit and appended quite often, are often at no more than 3%-5%
fragmentation.
I poked around and sadly couldn't find any references to fragmentation
in the official docs, but I found a couple references to the psql
command CLUSTER in other archived lists. This may help fix
fragmentation, though I haven't tried it out myself to know if it works
(or what implications there are for performance in general).
I've also read accounts of people with *HUGE* databases with almost no
fragmentation, so yet another possibility in the fight against
fragmentation may be to tweak your filesystem or postgres configs. If
anyone has any specific parameters to try or experience doing this, I
would consider it more than welcome, because I do not have that knowledge.
Sorry for the relative verbosity.
Peter