Why not store session as integer?
And timestamp as timesamp(z?) ?
If you know the types of events, also store them as integer , and save a map of them in the app or on another table ?
And save the parameters as a json column, so you have more data-types? Hstore only has strings.
Be carefull with the mongodb hipster on the stackoverflow post. Elasticsearch is often used for log collection.
So, what really is the problem ?
On Tue, May 13, 2014 at 4:11 AM, Peeyush Agarwal <peeyushagarwal1994@xxxxxxxxx> wrote:
Hi,I have log data of the following format:Session Timestamp Event Parameters 1 1 Started Session 1 2 Logged In Username:"user1" 2 3 Started Session 1 3 Started Challenge title:"Challenge 1", level:"2" 2 4 Logged In Username:"user2"
Now, a person wants to carry out analytics on this log data (And would like to receive it as a JSON blob after appropriate transformations). For example, he may want to receive a JSON blob where the Log Data is grouped bySession
andTimeFromSessionStart
andCountOfEvents
are added before the data is sent so that he can carry out meaningful analysis. Here I should return:
[ { "session":1,"CountOfEvents":3,"Actions":[{"TimeFromSessionStart":0,"Event":"Session Started"}, {"TimeFromSessionStart":1, "Event":"Logged In", "Username":"user1"}, {"TimeFromSessionStart":2, "Event":"Startd Challenge", "title":"Challenge 1", "level":"2" }] }, { "session":2, "CountOfEvents":2,"Actions":[{"TimeFromSessionStart":0,"Event":"Session Started"}, {"TimeFromSessionStart":2, "Event":"Logged In", "Username":"user2"}] } ]
Here,TimeFromSessionStart
,CountOfEvents
etc. [Let's call it synthetic additional data] will not be hard coded and I will make a web interface to allow the person to decide what kind of synthetic data he requires in the JSON blob. I would like to provide a good amount of flexibility to the person to decide what kind of synthetic data he wants in the JSON blob.
If I use PostgreSQL, I can store the data in the following manner:Session
andEvent
can bestring
,Timestamp
can bedate
andParameters
can behstore
(key value pairs available in PostgreSQL). After that, I can use SQL queries to compute the synthetic (or additional) data, store it temporarily in variables in a Rails Application (which will interact with PostgreSQL database and act as interface for the person who wants the JSON blob) and create JSON blob from it.
However I am not sure if PostgreSQL is the best choice for this use case. I have put the detailed question on SO at http://stackoverflow.com/questions/23544604/log-data-analyticsLooking for some help from the community.Peeyush Agarwal