Hi Colin,

On 01 Feb 08, at 15:22, Colin Wetherbee wrote:
I'm not sure about the internals of PostgreSQL (e.g. the Datum object(?) you mention), but if you're just scaling vectors, consecutive memory addresses shouldn't be absolutely necessary. Add and multiply operations within a linked list (which is how I'm naively assuming Datum storage for arrays in memory is implemented) will be "roughly" just as fast.
I'm not an expert, but the SSE instruction family should make a difference for this kind of workload, and those instructions operate on consecutive memory cells.
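To illustrate what I mean, here is a minimal sketch of scaling a contiguous buffer with SSE intrinsics. The function name scale_in_place is hypothetical, and the code assumes single-precision floats and an x86 compiler that defines __SSE__; on other targets it falls back to a plain scalar loop. It is only meant to show why contiguity matters: each load picks up four adjacent floats at once.

```cpp
#include <cstddef>
#include <vector>
#if defined(__SSE__)
#include <xmmintrin.h>  // SSE intrinsics (_mm_loadu_ps, _mm_mul_ps, ...)
#endif

// Hypothetical helper: multiply every element of a contiguous float
// buffer by a scalar, in place.
void scale_in_place(std::vector<float>& values, float factor) {
    float* data = values.data();
    const std::size_t n = values.size();
    std::size_t i = 0;
#if defined(__SSE__)
    // Broadcast the scalar into all four lanes of an SSE register.
    const __m128 f = _mm_set1_ps(factor);
    // Process four adjacent floats per iteration; unaligned loads and
    // stores are used so the buffer needs no special alignment.
    for (; i + 4 <= n; i += 4) {
        const __m128 v = _mm_loadu_ps(data + i);
        _mm_storeu_ps(data + i, _mm_mul_ps(v, f));
    }
#endif
    // Scalar loop for the remaining elements (or for all of them when
    // SSE is not available).
    for (; i < n; ++i) {
        data[i] *= factor;
    }
}
```

With a linked list the four values would not sit next to each other, so a packed load like this would not be possible without first copying them into a contiguous buffer.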
How many scaling operations are you planning to execute per second, and how many elements do you scale per operation?
Typically, arrays contain 1000 elements, and an operation is either multiplying an array by a scalar or multiplying it element-by-element with another array. The time to rescale 1000 arrays, multiply each by another array, and finally sum the 1000 resulting arrays should be short enough for use in an interactive application (let's say 0.5 s). This is for the case where no disk access is required. Disk access will obviously degrade performance a bit at the beginning, but the workload is mostly read-only, so after a while the whole table will be cached anyway.

The table containing the arrays would be truncated and repopulated every day, and the number of arrays is expected to be around 150000 (at least this is what we have now). Currently, we use a C++ middleware for the calculations, with aggressive caching of the table contents (and we don't use arrays, just one row per element), but the application could be refactored (and simplified a lot) if we had a smart way to store the data in the DB.
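To make the workload concrete, here is a rough sketch of one such calculation in C++. The function name scale_multiply_sum and the data layout are assumptions for illustration only: the arrays are stored back to back in one flat buffer, each array has its own scale factor, and all of them are multiplied by the same second array. With 1000 arrays of 1000 elements, one call touches 10^6 elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper. `rows` holds factors.size() arrays of
// weights.size() elements each, stored one after another in a single
// contiguous buffer. Each array is scaled by its own factor, multiplied
// element-by-element with `weights`, and added to the running total.
std::vector<double> scale_multiply_sum(const std::vector<double>& rows,
                                       const std::vector<double>& factors,
                                       const std::vector<double>& weights) {
    const std::size_t width = weights.size();
    assert(width > 0);
    assert(rows.size() == factors.size() * width);

    std::vector<double> total(width, 0.0);
    for (std::size_t r = 0; r < factors.size(); ++r) {
        const double* row = rows.data() + r * width;  // start of array r
        const double f = factors[r];
        // The inner loop walks consecutive memory in all three buffers.
        for (std::size_t i = 0; i < width; ++i) {
            total[i] += f * row[i] * weights[i];
        }
    }
    return total;
}
```

Whether this fits in the 0.5 s budget depends on the hardware and on how the data reaches the function, so it would have to be measured rather than assumed.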
Bye, e.