On 5/22/19 10:53 AM, Rich Shepard wrote:
On Wed, 22 May 2019, Francisco Olarte wrote:
Also, when I speak of "unique identifier" I'm not speaking of the one in
your FINAL tables, I assume you would have at least the *_id field as
PKEY, so nothing else needed, but the one in your SOURCE data set (it can
be anything, like the row number in the original Excel).
Francisco/Jeremy,
I'm grateful for your patient help. The 'unique identifier' in the source
file has been provided (just now) using nl <https://ss64.com/bash/nl.html>.
The syntax I used is:
nl -b a -n ln -s , -v 339 source.txt > out.txt
because the organizations table has 338 as the maximum org_id number.
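As a sketch of what that nl invocation produces (the input lines here are made-up placeholders, not the real source data):

```shell
# Made-up stand-in for source.txt; the real file has different columns.
printf 'Acme Corp,Jane Doe\nBeta LLC,John Roe\nGamma Inc,Ann Poe\n' > source.txt

# -b a : number all lines       -n ln : left-justified, leading zeros suppressed
# -s , : comma after the number -v 339 : start one past the current max org_id
nl -b a -n ln -s , -v 339 source.txt > out.txt
```

Note that with -n ln the number is still padded to nl's default six-character field width, so trailing spaces sit between the number and the comma; adding something like -w 3 trims them if they matter downstream.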
I believe this fulfills the need for a known unique ID in the source file.
When I parse each row using gawk to create the two files for table input,
I can use it in both the organizations table (as the PK) and the people
table (as the FK referring to the organizations table). I can let postgres
assign the unique ID for the new rows in the people table.
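A minimal sketch of that gawk split, assuming (purely for illustration) that each numbered row is "org_id,org_name,person_name" — the real file has more columns, so the field numbers would differ:

```shell
# Hypothetical two-row input standing in for the nl output; the real
# rows carry more fields than the three shown here.
printf '339,Acme Corp,Jane Doe\n340,Beta LLC,John Roe\n' > out.txt

# Plain awk shown; gawk behaves identically for this.
awk -F, '{
    print $1 "," $2 > "orgs.csv"     # organizations: the id becomes the PK
    print $1 "," $3 > "people.csv"   # people: the same id repeated as the FK
}' out.txt
```

Writing the same $1 into both output files is what preserves the organization-to-person link before anything reaches the database; each file can then be loaded into its table with COPY.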
Am I still missing something critical?
A sample of the data you are cleaning up would help.
I think what people are trying to wrap their heads around is how the 800
lines in the file are being split into two subsets: the organization data
and the people data. In particular, how is that being done so as to preserve
the relationship between organizations and people? This is before it
ever gets to the database.
MMM, apart from angel dust I do not know what PCP could stand for.
Primary Care Physician.
Best regards,
Rich
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx