In fact the plain vanilla installation, with all the defaults, runs this fast; please read further.

On 11/12/24 11:37, Achilleas Mantzios - cloud wrote:
... We ran perf on both systems for 90 seconds, to make sure it covered at least two runs of the query on the slow system, and we attach both outputs. We ran perf as: sudo perf record -g -p <PID> -- sleep 90. It strikes me that we don't find similarities between the two, but we are new to perf.

Not sure, but it seems the slow profile has a lot of writes. Not sure why. Do both instances have the same work_mem value / available memory, storage?
Maybe EXPLAIN VERBOSE would show if the remote query is the same ... Maybe try setting log_temp_files=0 before running the query. Writing the foreign scan result set into a temp file could be happening on one of the machines only.
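Something along these lines should show it (the count(*) on the foreign table is only an example, and log_temp_files is a superuser-only setting, so adjust as needed):

  -- log every temp file created, whatever its size, for this session
  SET log_temp_files = 0;
  -- show the generated remote query plus buffer/temp usage, to compare both machines
  EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
  SELECT count(*) FROM mssql_bdynacom."ACDOC";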
No temp files. The remote query, as shown by EXPLAIN (... VERBOSE) and by MS SQL itself, is:
Remote query: SELECT NULL FROM [db_ro_non_delosnav].[ACDOC]
Aggregate  (cost=168359742.33..168359742.34 rows=1 width=8) (actual time=15100.248..15100.250 rows=1 loops=1)
  Output: count(*)
  ->  Foreign Scan on mssql_bdynacom."ACDOC"  (cost=200.00..168355533.86 rows=1683386 width=100) (actual time=3.783..14967.029 rows=1683386 loops=1)
        Output: "ID", "DOC_TYPE", "PARAST", "TRNS_DATE", "VIA_MNG", "VIA_MNG_ID", "FL_UPD", "NOTES", "REM_DATE", "REM_CLS", "OPN_CLS", "ACSITE_ID", "USERS_ID", "LAST_UPD_DATE", "LAST_UPD_USERS_ID", "OTH_TYPE", "OTH_TYPE_DESC", "OTH_APPL", "FL_PRN", "CREATE_DATE", "LOG_NOTE", "PER_ID", "ACDEPT_ID", "FL_DLT", "CMP_ID", "INTPER_ID", "FL_INT_TRNCLS", "FL_ACTV", "OLD_ACDEPT_ID", "OLD_USERS_ID", "OLD_TRNS_DATE", "OLD_FL_UPD_DATE", "FL_FNLZ", "ERROR_NOTES", "FL_ATTCH", "OPDOC_ID", "FL_ONHOLD", "ONHOLD_NOTES", "FL_RVS", "RVS_DOC_ID", "FL_REV", "REV_NO", "REV_DOC_ID", "REV_NOTES", "REV_DATE", "REV_USER_ID", "FL_PROT", "INTGR_DOC_ID", "INTGR_DOC_TP", "P_DOC_ID", "FL_OPN", "FL_MDL", "USERUPDACC_ID", "DATE_UPDACC", "USERIN_ID", "USERUPD_ID", "DATE_IN", "DATE_UPD", "DOC_SUBTYPE", "DOC_REF", "FRTRN_TYPE", "FL_ACC", "ATCH_DIR", "ATCH_REM", "KEYID", "OTH_APPL1", "RGSTR_ID", "ACDOC_LIST_NO", "EXEC_DATE", "EXEC_USER_ID", "EXEC_NOTES", "CNL_EXEC_DATE", "CNL_EXEC_USER_ID", "FL_EXEC", "CNL_EXEC_NOTES", "LOG_NOTE1", "USERREUPDACC_ID", "DATE_REUPDACC", "P_ACDOC_ID", "FL_AUTHORITY_APRV", "FL_APPROVE_TYPE", "FL_NOTINCLUDE_INAUTH", "FL_NO_OTHERCURRENCY", "FL_WFSTATUS", "FL_APRV", "USRACTV_LOG_ID", "USERIN_DNAME", "USERUPD_DNAME", "UROLEIN_ID", "UROLEIN_NAME", "UDEPTIN_ID", "UDEPTIN_NAME", "FL_LOAD_5805", "UROLEUPD_ID", "UROLEUPD_NAME", "CNTRY_ID", fl_togetback, "FULLPATH", "FL_UPD_OLD", "FL_REOINTG", "FL_OINTGR_RESEND"
        Remote query: SELECT NULL FROM [db_ro_non_delosnav].[ACDOC]
Query Identifier: 6812542821581303630
Planning Time: 14029.724 ms
JIT:
  Functions: 2
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 0.168 ms, Inlining 2.087 ms, Optimization 3.301 ms, Emission 3.564 ms, Total 9.121 ms
Execution Time: 15102.803 ms
(12 rows)
I gather this is how it computes the plan.
I think we have narrowed down the problem, and it is extremely strange:
It is not a matter of pgsql version 10 vs 16; it is a matter of the postgres user, i.e. the user that owns the data dir(s) and runs the postgres process. We reproduced both the problem and the solution with all combinations of versions.
To sum it up:
Our sysadmin installed the system, Debian GNU/Linux 12 (bookworm), with postgres as user 1000. At some point we realized that whenever we run a pgsql cluster as another user (I found that after spending two good days testing), the above query runs in about 1 second; with the postgres user (uid 1000) it takes 30 seconds. As you saw, the perf outputs are completely different.
On the test VM, we removed the postgres user, recreated it with uid=1003, did a chown -R on all the mount points + tablespaces, started postgres as the new postgres user, and bingo. The chown -R on the 5TB is instant... just saying...
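Roughly the steps, in case anyone wants to reproduce this; the uid is just the one we picked, and the data dir / tablespace paths below are placeholders for our actual mount points:

  # recreate the postgres account under a different uid
  sudo userdel postgres
  sudo useradd -u 1003 -m -d /var/lib/postgresql -s /bin/bash postgres
  # hand the data dir and the tablespace mount points to the new uid
  sudo chown -R postgres:postgres /var/lib/postgresql /path/to/tablespace/mounts
  # start the cluster again as the new user (Debian packaging; adjust version/cluster name)
  sudo pg_ctlcluster 16 main start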
We are puzzled as to what could be causing this. Tomorrow we will dig in with GDB, *trace and the like.
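Probably something like the following to start with, just to compare the syscall profile of a slow backend against a fast one (the PID is a placeholder):

  # count and time syscalls of the backend while the query runs
  sudo strace -c -f -p <PID>
  # and the library-call side (FreeTDS etc.) of the same backend
  sudo ltrace -c -p <PID>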
If this rings any bells we would be more than grateful to know.
Also, worse is the suspicion that maybe our whole infrastructure's performance is affected. We hope it is only FreeTDS and tds_fdw.
regards