Re: Ideas for building a system that parses medical research publications/articles

Adrian Klaver <adrian.klaver@xxxxxxxxxxx> · Sat, 5 Jun 2021 10:03:33 -0700

On 6/5/21 9:56 AM, Achilleas Mantzios wrote:

On 5/6/21 6:34 PM, Adrian Klaver wrote:
On 6/5/21 2:49 AM, Achilleas Mantzios wrote:
Hello

I am imagining a system that can parse papers from various sources 
(web/files/etc) and in various formats (text, pdf, etc) and can store 
metadata for each paper: some kind of global ID if applicable, 
authors, areas of research, whether the paper is "new", 
"highlighted", "historical", type (e.g. Case reports, Clinical 
trials), symptoms (e.g. tics, GI pain, psychological changes, 
anxiety), and other key attributes (I guess dynamic). It must be 
full text searchable, etc.
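
As a rough illustration of the record described above, here is a minimal Python sketch. It is not part of any proposed design: every class, field, and function name is hypothetical, and the search is a naive in-memory substring match standing in for real full text search. In PostgreSQL the dynamic attributes might instead map to a jsonb column and the search to a tsvector index.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Paper:
    """One parsed publication; every field name here is hypothetical."""
    title: str
    full_text: str
    global_id: Optional[str] = None   # some kind of global ID, if applicable
    authors: List[str] = field(default_factory=list)
    research_areas: List[str] = field(default_factory=list)
    status: Optional[str] = None      # "new", "highlighted", "historical"
    paper_type: Optional[str] = None  # e.g. "Case report", "Clinical trial"
    symptoms: List[str] = field(default_factory=list)
    extra: Dict[str, str] = field(default_factory=dict)  # dynamic attributes


def search(papers: List[Paper], term: str) -> List[Paper]:
    """Naive case-insensitive substring match over title and full text."""
    needle = term.lower()
    return [
        p for p in papers
        if needle in p.title.lower() or needle in p.full_text.lower()
    ]
```

A call such as search(papers, "tics") returns the matching Paper objects; a real system would replace this with indexed search in the database.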

I am at the very beginning of this and it is being done on a fully 
volunteer basis.

Lots of questions: is there any scientific/scholarly analysis software 
already available? If so, and it is really good and open source, then 
this will influence the rest of the decisions. Otherwise, I'll have to 
form a team that can write one, and in that case I'll have to decide on 
DB, language, etc. I have worked with pgsql for 20 years, so it is the 
natural choice for any kind of data; I just ask this for the sake of 
completeness.

All ideas welcome.

A quick search found this:

https://solutionsreview.com/data-management/the-best-open-source-data-catalog-tools-to-consider/ 

Might be a good starting point on what is already out there.

This is interesting, so the keywords are "Data Catalog"?

What I searched on was 'open source article catalog'.

There is also this:

The Directory of Open Access Journals
https://doaj.org/

This seems very, very poor. Just try a search there and then repeat it in 
PMC (PubMed Central).

This is down to copyright issues, I'm sure. For PubMed Central see:

https://www.ncbi.nlm.nih.gov/pmc/about/copyright/

for the ifs/ands/buts that restrict what you can do with the information 
and stay legal.

It seems to be a service, not downloadable software.

--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx