>-----Original Message----- >From: pgsql-general-owner@xxxxxxxxxxxxxx >[mailto:pgsql-general-owner@xxxxxxxxxxxxxx] On Behalf Of garry >saddington >Sent: zondag 28 januari 2007 14:06 >To: pgsql-general@xxxxxxxxxxxxxx >Subject: [GENERAL] counting query > >I have a table definition such as: > >CREATE TABLE attendance >( > attendanceid serial primary key, Why you have this??? You already have (entered,timeperiod,studentid) that you can use, since that must be unique too. Try to avoid surrogate keys as much as possible (it really increases performance and ease-of-writing for complex queries! > entered date DEFAULT current_date NOT NULL, > absent boolean, > authorization text default 'N', > timeperiod char(2) check(timeperiod in('AM','PM')), > days varchar(10), > studentid int, > unique(entered,timeperiod,studentid) >) Guessing the meaning a bit, not too self-explaining. >Which is used to record school attendance data. I am now >trying to write a query to identify trends in absences by >counting the days column and returning any student that has >repeated absences on certain days. I am struggling to return >anything that does not need further manipulation in Python >before being useful. >Does anyone have any ideas? Yes, before starting you must have a well-defined idea on what you want to know. What should the result look like? In most situations start simple things, and eventually combine these to something more complex, but always know what you are querying. You should have some idea of what results you will be getting. Of course, this is a proces of discovery rather than following set rules. Some background on probabilities and statistics really helps. Personally I like visualization quite a lot to help me with this. Personally I've found nothing that will beat Excel for doing data analysis. Learn to use the pivot table and pivot charts. They are extremely powerful. However, it can be a bit tricky to transform the input into something the tool can use. A good starting point is to split the dates into seperate year, month, day, week values. Some idea's that might work to get it started: * A graph with days vs occurrences (count). * Graph of total sick days per student vs occurrences. * Graph of Check the occurrences per month/day/week Next try filtering of the data to form sequential periods and more funny things. I believe all this can be done with plain SQL and you don't need any python or plsql or other languages. - Joris Dobbelsteen