Possible - I’d have to dig into that with the script author. I was just handed an R script (I don’t work with R…) and told here’s the data it needs, here’s the output we need stored in the DB. I then spent just enough time with the script to figure out how to hook up the I/O. The schema is pretty much just a raw dump of the output - I haven’t really spent any resources figuring out what, exactly, the data is. Maybe I should :-)
Not sure. Here’s the output from pg_stat_all_indexes: volcano_seismology=# select * from pg_stat_all_indexes where relname='data'; relid | indexrelid | schemaname | relname | indexrelname | idx_scan | idx_tup_read | idx_tup_fetch -------+------------+------------+---------+---------------------------+----------+--------------+--------------- 19847 | 19869 | public | data | data_pkey | 0 | 0 | 0 19847 | 19873 | public | data | date_station_channel_idx | 811884 | 12031143199 | 1192412952 19847 | 19875 | public | data | station_channel_epoch_idx | 8 | 318506 | 318044 19847 | 19876 | public | data | station_data_idx | 9072 | 9734 | 1235 19847 | 19877 | public | data | station_date_idx | 721616 | 10927533403 | 10908912092 19847 | 20479 | public | data | data_station_channel_idx | 47293 | 194422257262 | 6338753379 (6 rows) so they *have* been used (although not the station_data_idx so much), but this doesn’t tell me when it was last used, so some of those may be queries I was experimenting with to see what was fastest, but are no longer in use. Maybe I should keep an eye on this for a while, see which values are increasing.
While datetime partitioning seems to be the most common, I’m not clear on how that would help here, as the most intensive queries need *all* the datetimes for a given station, and even the smaller queries would be getting an arbitrary time range potentially spanning several, if not all, partitions. Now portioning on station seems to make sense - there are over 100 of those, and pretty much any query will only deal with a single station at a time. Perhaps if more partitioning would be better, portion by both station and channel? The queries that need to be fastest will only be looking at a single channel of a single station. I’ll look into this a bit more, maybe try some experimenting while I still have *relatively* little data. My main hesitation here is that in the brief look I’ve given partitioning so far, it looks to be a royal pain to get set up. Any tips for making that easier? Thanks for the suggestion! --- Israel Brewster Software Engineer Alaska Volcano Observatory Geophysical Institute - UAF 2156 Koyukuk Drive Fairbanks AK 99775-7320 Work: 907-474-5172 cell: 907-328-9145 |