
Re: Question -- Session Operations - Feasibility Of Proposed Synchronization Method?

Andy,

Thanks very much for your response.

No worries about raining on my parade. Your feedback is exactly what I'm looking for -- praise is nice, but I really do prefer to have the experts throwing rocks at my naive ideas :)

Please see my comments embedded below.

Steve

----- Original Message -----
From: "Andy Colson" <andy@xxxxxxxxxxxxxxx>
To: "Steve Petrie, P.Eng." <apetrie@xxxxxxxxxxxx>; <pgsql-general@xxxxxxxxxxxxxx>
Sent: Thursday, January 07, 2016 10:17 PM
Subject: Re: Question -- Session Operations - Feasibility Of Proposed Synchronization Method?


On 01/07/2016 06:30 PM, Steve Petrie, P.Eng. wrote:
Thanks to forum members for the four helpful replies to my earlier message that initiated this thread.

The replies expressed concerns about the feasibility of my proposal to use postgres tables to store short-lived context data for dialog continuity, during transient website app sessions with visitor browsers over stateless HTTP connections.

I hope the four emails I sent in response (5 January 2016) went some way toward satisfying the concerns expressed.

Here is a list of the issues discussed in the dialog mentioned above:

1. "Session" defined;
2. Avoid row DELETEs;
3. Periodically TRUNCATE each table in a pool of session context tables;
4. Embed a session ID key parameter in an HTML "hidden" field (optional);
5. Use sequence generators as rapid global iterators controlling access to session context tables.

<SNIP>

Thanks to forum members for taking the time to read my email.


This feels hugely overcomplicated.

I agree. It is complicated.

But I believe it's the minimum functionality required to both: 1. avoid the <row DELETE ... / table AUTOVACUUM / table VACUUM> approach to recycling "dead" session context row image storage space back to the filesystem, and 2. enable use of the much faster TRUNCATE command on an entire "dead" session context table.
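To make that concrete, here is a minimal SQL sketch of the pool-plus-TRUNCATE idea; the table and column names (session_ctx_0 and friends) are illustrative, not from any finished design:

-- One table of a hypothetical pool session_ctx_0 .. session_ctx_9;
-- all ten would be created identically.
CREATE TABLE session_ctx_0 (
    session_id   bigint      NOT NULL,
    status       char(1)     NOT NULL DEFAULT 'A',  -- 'A'ctive or 'D'ead
    request_seq  integer     NOT NULL DEFAULT 0,
    last_touched timestamptz NOT NULL DEFAULT now(),
    context      text
);

-- Once every session in a table has died, reclaim all of its storage
-- in one cheap, catalog-level operation -- no per-row DELETEs, no
-- VACUUM pass over dead tuples:
TRUNCATE TABLE session_ctx_0;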

I also didn't read most of the last thread, so forgive me if you've answered this already: How many website requests a second (that actually need to touch session data) are you expecting? How much space is the session data going to take? (like, 5 Gig a day?)


Every incoming request to the website for non-static content needs to touch (INSERT or SELECT + UPDATE) the corresponding session context row. That row is where the transient continuity context for the app session dialog gets stored between request <i> and request <i+1> coming in from the browser driving that app session.

So session data will be touched by every request that launches an app php function to service the next step in the session dialog with that visitor.

But an individual session isn't going to live all that long, from the time its context row gets INSERTed until the time the session "dies" and its context row gets UPDATEd as "dead" in its "status" column (the row is never explicitly DELETEd; the entire table in which it resides gets TRUNCATEd).
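Expressed against the hypothetical table sketched above, one session row's lifecycle would look something like this:

-- Request 1: the session is born.
INSERT INTO session_ctx_0 (session_id, status, request_seq, context)
VALUES (42, 'A', 1, 'step=welcome');

-- Requests 2..n: read the stored context, then write the new state back.
SELECT context, request_seq
FROM   session_ctx_0
WHERE  session_id = 42 AND status = 'A';

UPDATE session_ctx_0
SET    context      = 'step=confirm',
       request_seq  = request_seq + 1,
       last_touched = now()
WHERE  session_id = 42;

-- The session "dies": its row is merely flagged, never DELETEd; the
-- whole table gets TRUNCATEd later, once all its sessions are dead.
UPDATE session_ctx_0
SET    status = 'D'
WHERE  session_id = 42;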

If the website manages to register e.g. 100,000 subscribers in its first year, it will be a runaway success. I'm not expecting more than a few percent of subscribers to visit on any given day. So if the website proves to be a massive winner, there will be maybe 1000 to 5000 subscriber sessions / day, each session being initiated, conducted and then terminated over the time span of a few minutes (rarely more than 30 minutes).

But I do fear "success disaster": if the website (which will promote a politically controversial technology concept for preventing freeway traffic congestion) suddenly gets media coverage in its initial market area (the greater Toronto area in the province of Ontario, Canada), a few thousand of the million+ people watching the 6 o'clock Toronto news might jump on their smart-phone browsers to hit the website, looking to subscribe or to send a contact message via a web page form.

So I'm looking to build in capacity to handle brief, intense bursts of session traffic workload. Not anywhere near Google scale, obviously. But maybe to handle a 10-minute burst driving a maximum rate of e.g. 1000 requests / second to the database server (mostly a combination of an INSERT for each new session row, followed by a few <SELECT + UPDATE>s to that row, as the session proceeds through its brief "life" toward its inevitable "death").

Actual access activity to the longer-lived data tables (1. subscriber membership table, 2. contact message table) will be orders of magnitude lower than activity in the session context tables.

Each visitor session is allowed a "quota" of requests (e.g. 25), so the visitor gets 25 chances to e.g. click a "submit" button. There is also a session timeout "quota" (e.g. 10 minutes) that will kill the session if the visitor waits too long between requests.
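Both quotas could plausibly be enforced inside the per-request UPDATE itself, using the hypothetical columns from the earlier sketch; zero rows updated would mean the session has died:

UPDATE session_ctx_0
SET    request_seq  = request_seq + 1,
       last_touched = now()
WHERE  session_id   = 42
  AND  status       = 'A'
  AND  request_seq  < 25                               -- request quota
  AND  last_touched > now() - interval '10 minutes';   -- timeout quota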

So the session context tables, in aggregate, do not keep growing and growing. And session context data is short-term, expendable data. No need to log it for recovery. No need for checkpoints or any other backup provisions.

If all active session context data gets lost in a crash, no big deal. Maybe a reputational hit for the website, but the visitors who had their sessions brutally murdered by the crash, will not wait around for a recovery of their sessions anyway. They will wander off and (hopefully) come back to retry later. Most likely after the website administrator sends out a post-crash apologetic mass email message to the entire subscriber base ("Hey subscribers, guess what? We're so successful we crashed!!" :)
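Incidentally, this "expendable data" stance maps directly onto a feature postgres already has: UNLOGGED tables bypass WAL entirely and are simply emptied during crash recovery, which is exactly the behaviour described above. The pooled tables could be declared that way:

-- Same definition as the earlier sketch, but UNLOGGED: no WAL writes,
-- and after a crash postgres empties the table automatically -- an
-- acceptable loss for short-lived session context data.
CREATE UNLOGGED TABLE session_ctx_0 (
    session_id   bigint      NOT NULL,
    status       char(1)     NOT NULL DEFAULT 'A',
    request_seq  integer     NOT NULL DEFAULT 0,
    last_touched timestamptz NOT NULL DEFAULT now(),
    context      text
);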

Consider a worst-case severe workload. There might be, say, an intense burst of activity for 10 seconds that initiates 10,000 active sessions (10% of all website subscribers), spread over e.g. 10 tables. And during its entire lifetime, each session consumes table storage for, say, 10 session context row images (1 INSERT image + 9 UPDATE images). So in total there would be space for 100,000 row images (10 images / session X 10,000 sessions) allocated over the 10 tables, at a peak of 10,000 sessions online concurrently. At e.g. 1000 bytes / row image, that would mean a peak total of 100 MB (100,000 row images X 1000 bytes / image) of storage allocated to the 10 session context tables altogether.

Those 10,000 suddenly-initiated concurrent sessions are then 10,000 people, each poking at an HTML page in their browser, maybe once every 30 to 60 seconds, to trigger an HTTP request. So the database server will receive from 167 to 333 <SELECT + UPDATE> requests per second (10,000 sessions / 60 seconds; 10,000 sessions / 30 seconds).

But when the 10,000 sessions created by that 10-second burst of session initiations gradually "die" over the next half hour or so, the entire 100 MB of storage in the 10 tables will get recycled back to the filesystem by only 10 very swift TRUNCATE commands. And there will be absolutely no <row DELETE ... / table AUTOVACUUM / table VACUUM> workload overhead to hamper performance during all that session activity.
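Once a prototype exists, that 100 MB estimate is easy to check directly; for example (again assuming the illustrative session_ctx_* naming):

-- Report each pooled table's actual on-disk footprint, before and
-- after the TRUNCATEs, to validate the back-of-envelope arithmetic.
SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM   pg_class
WHERE  relname LIKE 'session\_ctx\_%'
  AND  relkind = 'r'
ORDER  BY relname;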

If it's a huge number, you should put effort into growing horizontally, not all of this stuff. If it's a small number, you'll spend more time fixing all the broken things than it's worth.
Have you benchmarked this?  In my mind, complicated == slow.


All valid points.

Wouldn't using multiple session context tables enable the possibility of horizontal growth? Could separate pools of session context tables go in separate tablespaces, each tablespace on a different SSHD for one pool?
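For what it's worth, the mechanics of spreading the pool across drives are simple; a sketch with purely illustrative tablespace names and mount paths:

-- Two tablespaces, each on its own drive (paths are hypothetical).
CREATE TABLESPACE sess_disk_a LOCATION '/mnt/disk_a/pg_sess';
CREATE TABLESPACE sess_disk_b LOCATION '/mnt/disk_b/pg_sess';

-- Split the pool between them, e.g. five tables per drive:
ALTER TABLE session_ctx_0 SET TABLESPACE sess_disk_a;
ALTER TABLE session_ctx_5 SET TABLESPACE sess_disk_b;
-- ... and so on for the remaining pool members ...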

I'm not underestimating the work required to get the idea to a stable functional state.

Yes -- complicated does consume more cycles. That's why I'm proposing to use sequence generators (as rapid-access global "iterators") instead of some bottleneck of a tiny table, to keep order among the app sessions as they look for permission to access the session context tables.
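A minimal sketch of that "sequence as iterator" idea, assuming the 10-table pool and the illustrative names used earlier:

-- A sequence acts as a lock-free dispatcher: each new session takes
-- the next value, and (value % 10) picks which pooled table holds its
-- context row. nextval() never blocks other callers and never rolls
-- back, so there is no single hot row to contend on.
CREATE SEQUENCE session_dispatch_seq;

SELECT nextval('session_dispatch_seq') % 10 AS target_table_index;

The trade-off is that sequence values can leave gaps after a crash or rollback, which this scheme, by design, does not care about.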

No, I haven't benchmarked the idea. It's still at the feasibility consultation / investigation stage. But certainly benchmarking is going to be mandatory. If I decide to proceed beyond just brainstorming the idea, I will build and benchmark a prototype to stress-test the key design ideas.

Sorry if I'm raining on your parade, it looks like you have really put a lot of work into this.


I'm grateful to you for the downpour of advice :)

Yes I have invested quite a few hours. But all pleasant ones. And I have the hours to invest, and the luxury of no boss' butt to kiss. So it won't be a problem if I decide to ditch the idea of using postgres as a session store, and go with e.g. your web proxy suggestion instead.

But I really would like if possible, to develop some innovation using postgres, of significant but not overwhelming challenge. Something that might help postgres invade a new market.

Have you considered that saving session data to disk is faster than saving to the db? A good reverse web proxy can stick a session to the same backend: 1 web proxy up front, 5 web servers behind it. I'd bet it's way faster.


Yes, I have considered using a plain disk file, but then that's another complication, no?

I'm already climbing a steep learning curve with postgres. So if I'm going to complicate my life with some fancy session operations scheme, I would prefer to lodge that complexity firmly in the world of postgres.

For sure, before I do more work on the idea I'm proposing, I will investigate your idea of using a web proxy instead.

But then, adding a web proxy to the mix, would be a different kind of complication in itself ...

-Andy






