Re: Question on Fragmentations

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Moiz Kothari wrote:
Hi All,

What are the reasons of data getting fragmented in postgres? Do we have any page which explains different scenarios of data getting fragmented?

Regards,
Moiz Kothari
I guess there are two types of fragmentation in play with postgres, internal database fragmentation and external filesystem fragmentation.

Internal fragmentation is caused by holes in disk blocks when records are deleted, or a record is *just* a little too big that it can't fit in a remaining empty space in a particular disk block. This can be fixed (or at least reduced quite a bit) by using VACUUM/VACUUM ANALYZE (especially using the FULL option) in psql or the command vacuumdb (and its myriad options), as is a very helpful, oft-suggested strategy by people on these lists, for reasons other than just defragmentation.

External fragmentation occurs in postgres for the same reason it occurs in other types of files. Lots of appends and generally continual file growth mean that the filesystem is less likely to find adjacent disk blocks (which is the source of fragmentation). Even though Unix file systems generally do a good job of preventing fragmentation, some workloads are just not good. For instance, large mail spools (where the policy is one file per spool instead of one file per message) are prone to fragmentation.

Case in point, I use xfs as the filesystem running under postgres, and after a few days the "major" database clusters showed ~90% fragmentation on their respective partitions (which is about a 10 to 1 ratio of file fragments to files). After running a defragmenter, the fragmentation went under 5% (which is under 1.2 fragments per file), and performance generally doubled. Other, more application specific databases which still get hit and appended quite often, are often at no more than 3%-5% fragmentation.

I poked around and sadly couldn't find any references to fragmentation in the official docs, but I found a couple references to the psql command CLUSTER in other archived lists. This may help fix fragmentation, though I haven't tried it out myself to know if it works (or what implications there are for performance in general).

I've also read accounts of people with *HUGE* databases with almost no fragmentation, so yet another possibility in the fight against fragmentation may be to tweak your filesystem or postgres configs. If anyone has any specific parameters to try or experience doing this, I would consider it more than welcome, because I do not have that knowledge.

Sorry for the relative verbosity.

Peter


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux