Greetings,

* Fabio Pardi (f.pardi@xxxxxxxxxxxx) wrote:
> thanks for your feedback.

We prefer on these mailing lists to not top-post but instead to reply
inline, as I'm doing here.  This keeps the conversation focused by
eliminating unnecessary dialogue and making it possible to comment
clearly on specific points.

> I agree with you the compression is playing a role in the comparison.
> Probably there is a toll to pay when the load is high and the CPU is
> stressed from de/compressing data. If we are able to take our studies
> further, this is definitely something we would like to measure.

I was actually thinking of the compression as having more of an impact
in the 'cold' cases, because you're pulling fewer blocks when the data
is compressed.  The decompression cost on the CPU is typically much,
much less than the cost of pulling the data off the storage medium.
When things are 'hot' and in cache, it might be interesting to question
whether the compression/decompression is worth the cost.

> I also agree with you that at the moment Postgres really shines on
> relational data. To be honest, after seeing the outcome of our
> research, we are actually considering decoupling some (or all) fields
> from their JSON structure. There will be a toll to be paid there too,
> since we are receiving data in JSON format.

PostgreSQL has tools to help with this; you might look into
json_to_record() and friends.
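Purely as an illustration (the table and field names below are
invented, since I don't know your actual structure), a jsonb document
can be exploded into typed columns either at query time or as a
one-off migration step:

    -- hypothetical table holding the raw documents
    CREATE TABLE raw_events (id bigserial PRIMARY KEY, doc jsonb NOT NULL);

    -- pull individual fields out as typed columns at query time
    SELECT r.id, x.patient_id, x.recorded_at, x.heart_rate
      FROM raw_events r,
           jsonb_to_record(r.doc)
             AS x(patient_id text, recorded_at timestamptz, heart_rate int);

    -- or materialize the decoupled fields into a relational table once
    CREATE TABLE events AS
      SELECT x.*
        FROM raw_events r,
             jsonb_to_record(r.doc)
               AS x(patient_id text, recorded_at timestamptz, heart_rate int);

Decoupling fields into real columns like that also gives the planner
proper per-column statistics, on top of the smaller rows.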
> And the toll will be in time spent to deliver such a solution, and
> indeed time spent by the engine in doing the conversion. It might not
> be that convenient after all.

Oh, the kind of reduction you'd see in both on-disk and in-memory
footprint would almost certainly be worth the tiny amount of CPU
overhead from the conversion.

> Anyway, bringing data from JSON to a relational model is off-topic
> for the current discussion, since we are actually questioning if
> Postgres is a good replacement for Mongo when handling JSON data.

This narrow viewpoint isn't really sensible, though: what you should be
thinking about is what's appropriate for your *data*.  JSON is just a
data format, and while it's alright as a system interchange format,
it's rather terrible as a storage format.

> As per sharing the dataset, as mentioned in the post we are handling
> medical data. Even if the content is anonymized, we are not keen to
> share the data structure either, for security reasons.

If you really want people to take your analysis seriously, others must
be able to reproduce your results.  I certainly appreciate that there
are very good reasons you can't share the actual data, but your testing
could be done with completely generated data which happens to be
similar in structure to your data and to have a similar frequency of
values.  The way to approach generating such a data set would be to
aggregate the actual data up to a point where the appropriate
committee/board agrees it can be shared publicly, then build a randomly
generated data set which aggregates to the same result, and use that
for testing.

> That's a pity, I know, but I cannot do anything about it.
> The queries we ran and the commands we used are mentioned in the blog
> post but if you see gaps, feel free to ask.

There were a lot of gaps that I saw when I looked through the article,
starting with things like the actual CREATE TABLE command you used and
the complete size/structure of the JSON object.  Really, though, a
paper like this should include a full script which creates all the
tables, loads all the data, runs the analysis, calculates the results,
and so on.
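To be concrete about what I mean, and purely as a sketch (the table
name, field names, row count, and value distributions here are all made
up), the whole thing can be a single self-contained script that anyone
can re-run:

    -- table definition
    CREATE TABLE mongo_docs (id bigserial PRIMARY KEY, doc jsonb NOT NULL);

    -- load generated documents whose structure and value frequencies
    -- approximate the real (unshareable) data
    INSERT INTO mongo_docs (doc)
    SELECT jsonb_build_object(
             'patient_id',  md5(random()::text),
             'recorded_at', now() - random() * interval '365 days',
             'heart_rate',  60 + (random() * 80)::int
           )
      FROM generate_series(1, 1000000);

    -- index and an example query under test
    CREATE INDEX ON mongo_docs USING gin (doc);
    EXPLAIN (ANALYZE, BUFFERS)
      SELECT doc FROM mongo_docs WHERE doc @> '{"heart_rate": 90}';

With something along those lines published alongside the article,
anyone could re-run the comparison and check the numbers for
themselves.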
Thanks!

Stephen