Re: Design / Implementation problem

Naz

First, there is nothing you can do about the computational load except make your code as efficient as possible. Get it right, then make it fast. But there is only so much you can do: if a "calculation" requires an integer sum and an integer difference, you inevitably have two integer operations per calculation. For a given complex calculation, you generally have to resort to a little algebra to put it into a form that requires the fewest atomic operations (think of a mean and variance, a simple example where a brute-force approach is correct but much more expensive than one based on a little algebraic manipulation).

That leaves making your application appear more responsive by distributing the computational load differently. This time, think of some basic data structures and the work required to obtain sorted output. Considering the standard C++ STL, one could use a vector, with contents in random order, and sort it when output is requested, or one could use a map and have sorted output at any time without an explicit sort operation. But think about what has happened. Element access, including especially inserts and reads, will be very fast with the vector, faster than with a map; but all the computational load for obtaining sorted output is carried at the moment the request for that output is made. Getting sorted output from the map, on the other hand, will appear blindingly fast relative to getting the same output from the vector. Why? Because sorted output requires a certain number of comparisons, and this cannot be avoided. With the map, though, those comparisons are done on insert, so your data is actually stored sorted. The price of the sort is paid in both cases, but the user may not notice that cost with a map where he would with the vector (if the dataset is large).
I suspect you are aware of some of this already, but I say it just to make certain of it.

I say all the above so that you don't get too distracted by your calculations of the potential costs of supporting your program. You need an idea of how expensive it will be, and of how you might distribute the costs, but if you have millions of clients and each operation requires half a dozen basic operations, there is no avoiding that. You reduce as much as possible, but you will always reach a point where the cost cannot be reduced further. At that point, all you can do is throw more iron at the application.

Now, an option to consider. Create two tables, one for points accrued and one for points used. In each case, you'll have fields to identify the customer, his transactions, etc., but the important fields will be one for the number of points (acquired or used) on a given day, and the date. It may also be prudent to have a field for the expiry date, since how long points can be held before they must be used is a policy decision, one that could change at any time, and that ought therefore to be stored as data rather than hardcoded. The points-accrued table will also need a field for the points acquired on that date that remain unused, so that transactions spending points can consume the customer's oldest points before newer ones. You may well want this for auditing purposes in addition to the core functionality you're after.

Now add a third table that keeps track of the number of points available at a given time. Again, you'll have fields to identify the customer, the number of available points, and maybe the date (it depends on how you want to do it and what else your database needs to support). Update the points-accrued table and the summary table in each transaction in which the customer can gain points, and update the points-used table and the summary table in each transaction where the customer can spend points; update all three in transactions where a customer pays for part of the transaction with points and the rest with other resources. At least until your unit testing is done, I'd suggest an extra table or two to record which points a customer used in a given transaction, just to make absolutely certain that he is in fact spending the oldest points he has before the ones he acquired today.
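As a rough sketch of what I mean (all table and column names here are illustrative, not prescriptive):

```sql
-- Points earned: one row per accrual event. points_remaining supports
-- oldest-first redemption and doubles as an audit trail.
CREATE TABLE points_accrued (
    accrual_id       bigserial PRIMARY KEY,
    customer_id      bigint  NOT NULL,
    transaction_id   bigint  NOT NULL,
    points           integer NOT NULL,
    points_remaining integer NOT NULL,
    accrued_on       date    NOT NULL,
    expires_on       date    NOT NULL   -- policy-driven, so stored as data
);

-- Points spent: one row per redemption event.
CREATE TABLE points_used (
    usage_id       bigserial PRIMARY KEY,
    customer_id    bigint  NOT NULL,
    transaction_id bigint  NOT NULL,
    points         integer NOT NULL,
    used_on        date    NOT NULL
);

-- Running balance, maintained in the same database transaction
-- as the two tables above.
CREATE TABLE points_balance (
    customer_id      bigint PRIMARY KEY,
    points_available integer NOT NULL DEFAULT 0
);
```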
Finally, you may want to create archive tables and a scheduled task that moves records for expired points from the tables you're always working with to the archive tables (in enough detail that you can reconstruct precisely how and when points were acquired and used by each customer). You might also need extra code to bring those records back, should a policy maker decide that points can have a three-year lifespan instead of one. I'd put all the code required to support this into a suite of stored procedures.
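One of those stored procedures might be the oldest-points-first redemption itself. A sketch, assuming a points_accrued table with customer_id, points_remaining, accrued_on, and expires_on columns and a points_balance summary table (again, my names, not yours); a production version would also insert a row into the points-used table:

```sql
-- Spend p_points for a customer, consuming the oldest unexpired
-- accruals first. FOR UPDATE locks the rows so concurrent
-- redemptions for the same customer serialize correctly.
CREATE OR REPLACE FUNCTION use_points(p_customer bigint, p_points integer)
RETURNS void LANGUAGE plpgsql AS $$
DECLARE
    r      record;
    needed integer := p_points;
    take   integer;
BEGIN
    FOR r IN
        SELECT accrual_id, points_remaining
          FROM points_accrued
         WHERE customer_id = p_customer
           AND points_remaining > 0
           AND expires_on >= current_date
         ORDER BY accrued_on, accrual_id   -- oldest points first
         FOR UPDATE
    LOOP
        EXIT WHEN needed = 0;
        take := LEAST(needed, r.points_remaining);
        UPDATE points_accrued
           SET points_remaining = points_remaining - take
         WHERE accrual_id = r.accrual_id;
        needed := needed - take;
    END LOOP;

    IF needed > 0 THEN
        RAISE EXCEPTION 'insufficient points for customer %', p_customer;
    END IF;

    UPDATE points_balance
       SET points_available = points_available - p_points
     WHERE customer_id = p_customer;
END;
$$;
```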

Note, this should allow your customers to get their information almost instantly, since the amount of data processed for them would be very small. If they want to know how many points they have, or even how many points will expire in the next day, they're looking at dirt-simple queries with at most one subtraction.
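For example, against the kind of schema sketched earlier (names again illustrative), those customer-facing questions reduce to:

```sql
-- Current balance: a single-row lookup on the summary table.
SELECT points_available
  FROM points_balance
 WHERE customer_id = 42;

-- Points expiring within the next day: a small filtered sum.
SELECT COALESCE(SUM(points_remaining), 0) AS expiring_soon
  FROM points_accrued
 WHERE customer_id = 42
   AND points_remaining > 0
   AND expires_on <= current_date + 1;
```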

Note, I did not spend the time to refine this to minimize the total computational load or the data storage requirements. That could take days, or even weeks, depending on your attention to detail and how concerned you are about efficiency. I am sure some computational efficiencies can be gained with additional analysis, perhaps with some trade-offs in the detail of the data stored for audit purposes, but I'll leave that as an exercise for you. :-)

Note, I do not see, from what you'd written, how your proposed solution would ensure that customers use their oldest points first. Maybe it does and you simply didn't describe that aspect well, but it is something you'll have to be careful about if you want to avoid upset customers. Unit testing, followed by integration tests, are your friends!

HTH

Ted


