[big nestloop with a huge number of rows]
You're in an edge case, and I doubt you'll get things to run much faster: you want the last 1k rows out of an 18M row result set... It will be slow no matter what you do.
What the plan is currently doing, is it's going through these 18M rows using a for each loop, until it returns the 1k requested rows. Without the offset, the plan is absolutely correct (and quite fast, I take it). With the enormous offset, it's a different story as you've noted.
An alternative plan could have been to hash join the tables together, to sort the result set, and to apply the limit/offset on the resulting set. You can probably force the planner to
do so by rewriting your statement using a with statement, too:
EXPLAIN ANALYZE
WITH rows AS (
SELECT c.clause, s.subject ,s.object , s.verb, s.subject_type, s.object_type ,s.doc_id ,s.svo_id
FROM clause2 c INNER JOIN svo2 s ON (c.clause_id=s.clause_id AND c.source_id=s.doc_id AND c.sentence_id=s.sentence_id)
INNER JOIN page_content p ON (s.doc_id=p.crawled_page_id)
SELECT c.clause, s.subject ,s.object , s.verb, s.subject_type, s.object_type ,s.doc_id ,s.svo_id
FROM clause2 c INNER JOIN svo2 s ON (c.clause_id=s.clause_id AND c.source_id=s.doc_id AND c.sentence_id=s.sentence_id)
INNER JOIN page_content p ON (s.doc_id=p.crawled_page_id)
)
SELECT *
FROM
rows
ORDER BY svo_id limit 1000 offset 17929000
ORDER BY svo_id limit 1000 offset 17929000
I've my doubts that it'll make much of a different, though: you'll still be extracting the last 1k rows out of 18M.
D