Help - I don't know why I am getting this message, and I don't know how to fix it. Any advice will be greatly appreciated.
Note that the file does exist:! (How do I know if it is looking under the correct directory? Other times I have done similar temporary table creations with no problems!):
ls -l /zfs/postgres/postgres13/base/16482/681830
-rw------- 1 postgres postgres 122880 Sep 7 18:37 /zfs/postgres/postgres13/base/16482/681830
-rw------- 1 postgres postgres 122880 Sep 7 18:37 /zfs/postgres/postgres13/base/16482/681830
I am using postgresql 13 on ubuntu 20.04, and postgresql is on a zfs filesysem which reports no errors. I can access every table in my database, and can run every stored procedure (plpgsql and plpython3u) on it except the one that generates the above error..
Here is the output from my run:
select cmm_tsv_to_tables('/zfs/EXPERIMENT_A_C_NORMAL_0_1_2/10');
NOTICE: ['found_patterns.tsv', 'classification.tsv', 'mined_patterns.tsv']
NOTICE: ['tmp_2021_09_08_19_49_42_354417_found_patterns', 'tmp_2021_09_08_19_49_42_354417_classification', 'tmp_2021_09_08_19_49_42_354417_mined_patterns']
NOTICE: CREATE TABLE tmp_2021_09_08_19_49_42_354417_found_patterns(Class TEXT,filename TEXT,pattern TEXT)
ERROR: spiexceptions.UndefinedFile: could not open file "base/16482/681830": No such file or directory
LINE 1: CREATE TABLE tmp_2021_09_08_19_49_42_354417_found_patterns(...
^
QUERY: CREATE TABLE tmp_2021_09_08_19_49_42_354417_found_patterns(Class TEXT,filename TEXT,pattern TEXT)
CONTEXT: Traceback (most recent call last):
PL/Python function "cmm_tsv_to_tables", line 39, in <module>
plpy.execute(sql1)
PL/Python function "cmm_tsv_to_tables"
NOTICE: ['found_patterns.tsv', 'classification.tsv', 'mined_patterns.tsv']
NOTICE: ['tmp_2021_09_08_19_49_42_354417_found_patterns', 'tmp_2021_09_08_19_49_42_354417_classification', 'tmp_2021_09_08_19_49_42_354417_mined_patterns']
NOTICE: CREATE TABLE tmp_2021_09_08_19_49_42_354417_found_patterns(Class TEXT,filename TEXT,pattern TEXT)
ERROR: spiexceptions.UndefinedFile: could not open file "base/16482/681830": No such file or directory
LINE 1: CREATE TABLE tmp_2021_09_08_19_49_42_354417_found_patterns(...
^
QUERY: CREATE TABLE tmp_2021_09_08_19_49_42_354417_found_patterns(Class TEXT,filename TEXT,pattern TEXT)
CONTEXT: Traceback (most recent call last):
PL/Python function "cmm_tsv_to_tables", line 39, in <module>
plpy.execute(sql1)
PL/Python function "cmm_tsv_to_tables"
and here (and attached) is the function up to the point where the error happens:
---------------------------------------------------------------------------------
CREATE OR REPLACE FUNCTION cmm_tsv_to_tables( tsv_dir TEXT) RETURNS TEXT AS
$$
import csv
import datetime
import glob
import os
import regex
import time
ts = time.time()
prefix = "tmp_" + regex.sub(r"\D+","_",str(datetime.datetime.fromtimestamp(time.time()).isoformat())) + "_"
if not os.path.exists(tsv_dir):
plpy.notice("Scout output directory " + tsv_dir + " does not exist. Rerun this software with a directory which contains the tsv files you wish to database.")
return ""
try:
os.chdir(tsv_dir)
except:
plpy.notice("postgresql does not have access to the directory " + tsv_dir + " of tsv files. Exiting!")
return ""
tsv_file = glob.glob('*.tsv')
if len(tsv_file) == 0:
plpy.notice("There are no tsv files in " + tsv_dir + ". Exiting!")
return ""
plpy.notice(tsv_file)
table = [ prefix+os.path.splitext(x)[0] for x in tsv_file]
plpy.notice(table)
# create tables with text columns, some of which will later be casted to numbers, arrays, etc.
for j in range(len(tsv_file)):
with open(tsv_file[j]) as f:
cols = f.readline().split()
colsdef = ','.join([x + ' TEXT' for x in cols])
sql1 = "CREATE TABLE {}({})".format(table[j], colsdef)
plpy.notice(sql1)
#return 'exit'
plpy.execute(sql1)
sql1 = "COPY {} FROM '{}' WITH (FORMAT 'csv', HEADER, DELIMITER E'\t', NULL 'NULL')".format(table[j], tsv_file[j]);
plpy.notice(sql1)
plpy.execute(sql1)
return "Tables were written"
$$ LANGUAGE plpython3u;
$$
import csv
import datetime
import glob
import os
import regex
import time
ts = time.time()
prefix = "tmp_" + regex.sub(r"\D+","_",str(datetime.datetime.fromtimestamp(time.time()).isoformat())) + "_"
if not os.path.exists(tsv_dir):
plpy.notice("Scout output directory " + tsv_dir + " does not exist. Rerun this software with a directory which contains the tsv files you wish to database.")
return ""
try:
os.chdir(tsv_dir)
except:
plpy.notice("postgresql does not have access to the directory " + tsv_dir + " of tsv files. Exiting!")
return ""
tsv_file = glob.glob('*.tsv')
if len(tsv_file) == 0:
plpy.notice("There are no tsv files in " + tsv_dir + ". Exiting!")
return ""
plpy.notice(tsv_file)
table = [ prefix+os.path.splitext(x)[0] for x in tsv_file]
plpy.notice(table)
# create tables with text columns, some of which will later be casted to numbers, arrays, etc.
for j in range(len(tsv_file)):
with open(tsv_file[j]) as f:
cols = f.readline().split()
colsdef = ','.join([x + ' TEXT' for x in cols])
sql1 = "CREATE TABLE {}({})".format(table[j], colsdef)
plpy.notice(sql1)
#return 'exit'
plpy.execute(sql1)
sql1 = "COPY {} FROM '{}' WITH (FORMAT 'csv', HEADER, DELIMITER E'\t', NULL 'NULL')".format(table[j], tsv_file[j]);
plpy.notice(sql1)
plpy.execute(sql1)
return "Tables were written"
$$ LANGUAGE plpython3u;
Attachment:
cmm_tsv_to_tables.plpython3u
Description: Binary data