On 5/6/21 4:45 PM, Vijaykumar Jain wrote:
I checked: it behaves better with downloaded PDFs than with PDFs fetched by URL; in the second case the metadata is poor.
It does not work with NIH articles (but this is a general problem, not Tika's).
To get started with collecting doc metadata, it looks like this tool can help you.
PostgreSQL does support fuzzy text search, so I do think dumping the metadata/abstract into PostgreSQL and then using extensions such as trigram, tsearch, etc. should work well for a POC.
This being a pg mailing list :) what would be your expectations for the type of data, the growth of the data, and your queries?
If you store data to support papers in multiple languages, will PostgreSQL be able to handle it?
Ideally the docs would be stored somewhere on object storage, and the link to them would be stored in the db for when someone requests to read the whole paper.
A long time ago I read this.
So if this could work, your POC should too :) with PostgreSQL.
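For anyone unfamiliar with what trigram matching does, here is a rough Python sketch of the idea behind the pg_trgm similarity score. It is an approximation for illustration only, not the extension's code: it assumes ASCII input, and the function names trigrams and similarity are just names chosen here. Each word is lowercased, padded with two leading spaces and one trailing space, and cut into 3-character pieces; the score is the number of shared trigrams divided by the number of distinct trigrams in either string.

```python
import re


def trigrams(text):
    """Return the set of trigrams for text, roughly as pg_trgm builds them.

    Assumption: ASCII input only. Non-alphanumeric characters act as
    word separators and are dropped.
    """
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # Two spaces in front and one behind, so word boundaries count.
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    # A misspelled search term still scores well above an unrelated word.
    print(similarity("postgresql", "postgresq1"))
    print(similarity("postgresql", "metadata"))
```

In the database itself the extension does this work and can back it with an index, so the sketch is only meant to show why misspelled or partial titles can still match.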
--
On Sat, 5 Jun 2021 at 5:14 PM Laura Smith <n5d9xq3ti233xiyif2vp@xxxxxxxxxxxxx> wrote:
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Saturday, 5 June 2021 12:14, Achilleas Mantzios <achill@xxxxxxxxxxxxxxxxxxxxx> wrote:
>
> I know it's a huge amount of work, but you are missing a point. Nobody
> wishes to compete with anyone. This is about a project, a parent-advocacy
> non-profit that ONLY aims to save the sick children (or maybe also
> very young adults) of a certain spectrum. So the goal is to make the
> right tools for researchers, clinicians and parents. This market is too
> small to even consider making any money out of it, but the research is
> still very expensive and the progress slower than optimum.
Unfortunately I'm not "missing a point"; your final paragraph summarises your position.
You have been taken in by the very charitable goal of saving sick children.
Unfortunately your head has been disconnected from your heart.
If we put the charitable purpose to one side and take a purely objective view of what you want to do, my original statement still stands, i.e. I am certain that you are grossly underestimating the technical and practical complexities of what you want to achieve.
Thanks,
Vijay
Mumbai, India