On Wed, 22 May 2019, Adrian Klaver wrote:
A sample of the data you are cleaning up.
Adrian, et al.: I have it working properly now. Both org_id and person_id numbers are prepended to each row in the appropriate table and they are unique because each series begins one greater than the max(*_id) in each table.
I think what people are trying to wrap there head around is how 800 lines in the file is being split into two subsets: the organization data and the people data. In particular how that is being done to preserve the relationship between organizations and people? This is before it ever gets to the database.
After cleaning there are 655 lines rather than 800, but the short answer is that spliting the data preserves the relationship between organization and its people: #!/usr/bin/gawk # Read input file, write fields to both organizations and people # input files. BEGIN { FS=OFS="," } # for organizations table input: { print $1, $2, $4, "," $8, $9, $10, "'US'," $11, ",,," "'Opportunity','');" > "z-orgs.sql" } # for people table input: { print $6, $5, "," $1, $3, $4, $5, $7, $8, $9, $10, "'US'," $11, "," $12, "," $13, "'true','');" > "z-people.sql" } You can see that the org_id field ($1) is used in both files, but in different columns in the two tables. Regards, Rich