Andy,
Thanks very much for your response.
No worries about raining on my parade. Your feedback is exactly what I'm
looking for -- praise is nice, but I really do prefer to have the
experts throwing rocks at my naive ideas :)
Please see my comments embedded below.
Steve
----- Original Message -----
From: "Andy Colson" <andy@xxxxxxxxxxxxxxx>
To: "Steve Petrie, P.Eng." <apetrie@xxxxxxxxxxxx>;
<pgsql-general@xxxxxxxxxxxxxx>
Sent: Thursday, January 07, 2016 10:17 PM
Subject: Re: Question -- Session Operations - Feasibility Of
Proposed Synchronization Method?
On 01/07/2016 06:30 PM, Steve Petrie, P.Eng. wrote:
Thanks to forum members for the four helpful replies, to my earlier
message that initiated this thread.
The replies expressed concerns, with the feasibility of my proposal
to use postgres tables to store short-lived context data, for dialog
continuity during website app transient sessions, with visitor
browsers over modeless HTTP connections.
Hope the four emails I sent in response (5 January 2016), went some
way to satisfying the concerns expressed.
Here is a list of the issues discussed, in the dialog mentioned
above:
1. "Session" defined;
2. Avoid row DELETEs;
3. Periodically TRUNCATE each table in a pool of session context
tables;
4. Embed a session ID key parameter in an HTML "hidden" field
(optional);
5. Use sequence generators as rapid global iterators controlling
access to session context tables;
<SNIP>
Thanks to forum members for taking the time to read my email.
This feels hugely overcomplicated.
I agree. It is complicated.
But I believe it's the minimum functionality required to both: 1. avoid
using the <row DELETE ... / table AUTOVACUUM / table VACUUM> approach,
to recycling "dead" session context row image storage space back to the
filesystem, and 2. enable use of the much faster TRUNCATE command on an
entire "dead" session context table..
I also didn't read most of the last thread, so forgive me if you've
answered this already: How many website requests a second (that
actually need to touch session data) are you expecting? How much
space is the session data going to take? (like, 5 Gig a day?)
Every incoming request to the website for non-static content, needs to
touch (INSERT or SELECT + UPDATE) the corresponding session context row.
That row is where the transient continuity context for the app session
dialog, gets stored, between request <i> and request <i+1> coming
in from the browser driving that app session.
So session data will be touched by every request that launches an app
php function, to service the next step in the session dialog with that
visitor.
But an individual session isn't going to live all that long, from the
time that it's context row gets INSERTed until the time that the session
"dies" and it's context row gets UPDATEd as "dead" in its "status"
column (the row is never explicitely DELETEd, the entire table in which
it resides gets TRUNCATEd).
If the website manages to register e.g. 100,000 subscribers in its first
year, it will be a runaway success. I'm not expecting more than a few
percent of subscribers to visit on any given day. So if the website
proves to be a massive winner, there will be maybe 1000 to 5000
subscriber sessions / day, each session being initiated, conducted and
then terminated over the time span of a few minutes (rarely more than 30
minutes).
But I do fear "success disaster" if suddenly, the website (which will
promote a politically controversial technology concept for preventing
freeway traffic congestion) gets media coverage in its initial market
area (the greater Toronto area in the province of Ontario, Canada), and
among a million+ people watching the 6-o'clock Toronto news, a few
thousand jump on their smart-phone browsers to hit the website, looking
to subscribe or send a contact message via web page form.
So I'm looking to build in capacity to handle brief intense bursts of
session traffic workload. Not anywhere near Google-scale, obviously. But
maybe to handle a 10-minute burst driving a maximum rate of e.g. 1000
requests / second to the database server (being mostly a combination of
an INSERT for each new session row, followed by a few <SELECT + UPDATE>s
to that row, as the session proceeds through its brief "life", towards
its inevitable "death".
Actual access activity to longer-lived data tables: 1. subscriber
membership table, 2. contact message table; will be orders-of-magnitude
lower, than activity in the session context tables.
Each visitor session is allowed a "quota" of requests (e.g. 25) so the
visitor gets 25 chances to e.g. click a "submit" button. There is also a
session timeout "quota" (e.g. 10 minutes) that will kill the session if
the visitor waits too long between requests.
So the session context tables in aggregate, do not keep growing and
growing. And session context data is short-term expendable data. No need
to log it for recovery. No need for checkpoints, or any other backup
provisions.
If all active session context data gets lost in a crash, no big deal.
Maybe a reputational hit for the website, but the visitors who had their
sessions brutally murdered by the crash, will not wait around for a
recovery of their sessions anyway. They will wander off and (hopefully)
come back to retry later. Most likely after the website administrator
sends out a post-crash apologetic mass email message to the entire
subscriber base ("Hey subscribers, guess what? We're so successful we
crashed!!" :)
Consider a worst case severe workload. There might be say, an intense
burst of activity for 10 seconds, that initiates 10000 active sessions
(10% of all website subscribers) spread over e.g. 10 tables. And during
the entire lifetime of each session, table storage consumed by that
session of say 10, session context row images (1 INSERT image + 9
UPDATE images). So in total there would be space for 100,000 row images
(10 images / session X 10000 sessions) allocated over the 10 tables, at
a peak of 10000 sessions online concurrently. At e.g. 1000 bytes / row
image, that would mean a peak total of 100 MB (100,000 row images X 1000
bytes / image) of storage allocated to the 10 session context tables
altogether.
Those 10000 suddenly-initiated concurrent sessions are then 10000
people, each poking at an HTML page in their browser, maybe once every
30 to 60 seconds, to trigger an HTTP request. So the database server
will receive from 167 to 333 <SELECT + UPDATE> requests per second
(10000 sessions / 60 seconds; 10000 sessions / 30 seconds).
But when the 10000 sessions created by that 10-second burst of session
initiations, gradually "die" over the next half hour or so, the entire
100 MB of storage in the 10 tables, will get recycled back to the
filesystem, by only 10 very swift TRUNCATE commands. And there will be
absolutely no <row DELETE ... / table AUTOVACUUM / table VACUUM)
workload overhead, to hamper performance during all that session
activity.
If its a huge number, you should put effort into growing horizontally,
not all of this stuff.
If its a small number, you'll spend more time fixing all the broken
things than its worth.
Have you benchmarked this? In my mind, complicated == slow.
All valid points.
Wouldn't using multiple session context tables, enable the possibility
of horizontal growth? Could separate pools of session context tables go
in separate tablespaces, each tablespace on a different SSHD for one
pool ?
I'm not underestimating the work required to get the idea to a stable
functional state.
Yes -- complicated does consume more cycles. That's why I'm proposing to
use sequence generators (as rapid-access global "iterators"), instead of
using some bottleneck of a tiny table, to keep order among the app
sessions as they look for permission to access the session context
tables.
No I haven't benchmarked the idea. It's still at the feasibility
consultation / investigation stage. But certainly benchmarking is going
to be mandatory. If I decide to proceed beyond just brainstorming the
idea, I will build and benchmark a prototype to stress-test the key
design ideas.
Sorry if I'm raining on your parade, it looks like you have really put
a lot of work into this.
I'm grateful to you for the downpour of advice :)
Yes I have invested quite a few hours. But all pleasant ones. And I have
the hours to invest, and the luxury of no boss' butt to kiss. So it
won't be a problem if I decide to ditch the idea of using postgres as a
session store, and go with e.g. your web proxy suggestion instead.
But I really would like if possible, to develop some innovation using
postgres, of significant but not overwhelming challenge. Something that
might help postgres invade a new market.
Have you considered saving session data to disk is faster than saving
to db? A good reverse web proxy can stick a session to the same
backend. 1 web proxy up front, 5 web servers behind it. I'd bet its
way faster.
Yes I have considered using a plain disk file, but then that's another
complication, no?
I'm already climbing a steep learning curve with postgres. So if I'm
going to complicate my life with some fancy session operations scheme, I
would prefer to lodge that complexity firmly in the world of postgres.
For sure, before I do more work on the idea I'm proposing, I will
investigate your idea of using a web proxy instead.
But then, adding a web proxy to the mix, would be a different kind of
complication in itself ...
-Andy
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general