Postgresql 8.4 and Data warehousing

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Data warehousing using postgresql is it possible ?

Answer is yes, it is possible.

Postgresql has always been considered as not suitable for data
warehousing because of advanced database features present in it, like
transaction support , locking, etc.

But the fact is: We cannot decide performance of a database just by
reading about it, best way is to practically implement the scenario,
tune it and compare it with other databases.

I did just that and following are the points one should consider to
achieve data warehouse in postgresql.

    Handling transactions and locks
        -Optimum transaction size
        -Parallelism to avoid locks
    PostgreSQL parameter configuration
        -Tweaking postgresql configuration parameters to best utilize
available hardware.
    Compartmentalize data storage
        -Making logically physical division of data on disk
    Views and fact tables
        -Frequency of create and update view and maintaining fact tables
    Column Index planning
        -Deciding use of index.
    Query planning
        -Planning a query based on requested criteria
    Hardware/Software properties that effect performance
        -RAM, HDD properties and softwares that can increase /
decrease system performance

Case study: Open source SIEM Cyberoam-iview
-> http://sourceforge.net/projects/cyberoam-iview/

Cyberoam-iView is a practically successful example of A Real time data
warehouse using unchanged postgresql 8.4 and files.

-->Features:
-Supports up to 60,000 Events/Sec. ( collect, parse, process, insert
into postgresql tables)
-1040 predefined reports (OLAP Aggregated cubes).
-Storage capacity: Aggregated cubes with granularity of 5 minutes for
1 week, cubes with granularity of 4 hours for 1 year, cubes with
granularity of 24 hours for 7 years. Fact table in compressed form.
-Response time < 30 seconds for most of the reports.
-Real time view update, Aggregation cubes updated every 5 min.
-Reports for any time range selection (up to 7 years) supported.

-->Limitations:
-Does not support ad hoc aggregated report generation from fact table,
All reports needs to be predefined at the time of deployment.
-Fact table need to be extracted and loaded before searching.
-Each cube is updated with top n significant aggregated records where
0 < n < 4001.

-->Minimum Hardware requirements :
Pentium P4 processor 3.0 G Hz
2 GB RAM
1 TB HDD , standard company with read write speed > 50 MBps

Regards
Dhaval Rami

-- 
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux