Re: Ideas for building a system that parses medical research publications/articles

On 5/6/21 1:52 p.m., Laura Smith wrote:

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Saturday, 5 June 2021 10:49, Achilleas Mantzios <achill@xxxxxxxxxxxxxxxxxxxxx> wrote:

Hello

I am imagining a system that can parse papers from various sources
(web/files/etc.) and in various formats (text, PDF, etc.) and can store
metadata for each paper: some kind of global ID if applicable, authors,
areas of research, whether the paper is "new", "highlighted", or
"historical", its type (e.g. case reports, clinical trials), symptoms
(e.g. tics, GI pain, psychological changes, anxiety), and other key
attributes (dynamic ones, I guess). It must also be full-text searchable.
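To make the data-model question concrete, here is a minimal sketch of how these requirements could map onto Postgres, with JSONB for the dynamic attributes and tsvector for full-text search. All table and column names are my own invention for illustration, not a recommendation:

```sql
-- Hypothetical schema sketch; names and types are illustrative only.
CREATE TABLE paper (
    id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    global_id   text UNIQUE,        -- e.g. a DOI or PMID, if one exists
    title       text NOT NULL,
    authors     text[],
    paper_type  text,               -- e.g. 'case report', 'clinical trial'
    status      text,               -- e.g. 'new', 'highlighted', 'historical'
    symptoms    text[],             -- e.g. '{tics,"GI pain",anxiety}'
    attrs       jsonb DEFAULT '{}', -- open-ended, dynamic attributes
    body_tsv    tsvector            -- precomputed full-text search column
);

CREATE INDEX paper_attrs_idx ON paper USING gin (attrs);
CREATE INDEX paper_fts_idx   ON paper USING gin (body_tsv);

-- Example full-text query:
-- SELECT id, title FROM paper
--  WHERE body_tsv @@ websearch_to_tsquery('english', 'tics anxiety');
```

The GIN index on the JSONB column lets the "dynamic" attributes stay queryable without schema changes, and the tsvector column covers the full-text requirement with built-in Postgres machinery.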

I am at the very beginning of this, and it is being done on a fully
volunteer basis.

Lots of questions: is there any scientific/scholarly analysis software
already available? If there is, and it is really good and open source,
that will influence the rest of the decisions. Otherwise, I'll have to
form a team that can write one, in which case I'll have to decide on the
DB, language, etc. I have worked with pgsql for 20 years, so it is the
natural choice for any kind of data; I just ask this for the sake of
completeness.

All ideas welcome.
Hello Achilleas

Not wishing to be discouraging, but you have very ambitious goals for what sounds like a one-person project?

You are effectively looking at competing with platforms such as Elsevier Scopus/SciVal, which are market leaders in the area for good reason (i.e. it takes a lot of manpower to write algorithms, manage metadata, etc., and the only way to consistently maintain that manpower is to employ people, lots of them). There are also things like Google Scholar around the place.

I think before starting on the technical side of Postgres etc., the honest truth is that you need to do more planning, both in terms of implementation and long-term sustainability.

For example, before we even get to metadata, you talk of various sources and formats. Have you considered licensing issues? Have you considered how to keep the dataset clean? (If you are thinking you can just scrape the web, then you'll be in for a surprise.)

All I have is some very vague descriptions coming from people on either the advocacy side or the medical side.

I have no idea about the legal status of those documents; as you know, some are covered by the artistic license (a few in PubMed), some not.

I am not a lawyer. The data are not to be stored locally, AFAIK, so only metadata will be kept locally, and it can be reset, refreshed, amended, etc.

Parsing will be equivalent to a one-off human reading the article on the web. There is a lawyer handling all of that. Out of the whole network of people interested in this endeavor, I am the only one with DB/software knowledge, which is why I volunteered.

I know it's a huge amount of work, but you are missing a point. Nobody wishes to compete with anyone. This is about a project by a parent-advocacy non-profit that *ONLY* aims to save the sick children (or maybe also very young adults) of a certain spectrum. So the goal is to make the right tools for researchers, clinicians and parents. This market is too small to even consider making any money out of it, but the research is still very expensive and the progress slower than optimum.

Laura




