We moved our stuff out of AWS a little over a year
ago because the performance was crazy inconsistent and
unpredictable. I think they do a lot of oversubscribing so you
get strange sawtooth performance patterns depending on who else
is sharing your infrastructure and what they are doing at the
time.
The same unit of work would take 20 minutes each for
several hours, and then take 2 1/2 hours each for a day, and
then back to 20 minutes, and sometimes anywhere in between for
hours or days at a stretch. I could never tell the business
when the processing would be done, which made it hard for them
to set expectations with customers, promise deliverables, or
manage the business. Smaller nodes seemed to be worse than
larger nodes, though I only have theories as to why. I never got
good support from AWS to help me figure out what was
happening.
My first thought is to run the same test on different
days of the week and different times of day to see if the
numbers change radically. Maybe spin up a node in another
region or availability zone and try the test there
too.
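
Here is a rough Python sketch of how I would record that test so the runs can be compared afterwards. The function names (time_once, summarize, spread) are made up for illustration, and workload stands in for whatever your unit of work is. Scheduling the runs across days (cron or similar) is left out.

```python
import statistics
import time
from collections import defaultdict
from datetime import datetime


def time_once(workload):
    """Run workload() once and return (start time, elapsed seconds)."""
    started = datetime.now()
    t0 = time.perf_counter()
    workload()
    return started, time.perf_counter() - t0


def summarize(samples):
    """Group (start, seconds) samples by (weekday, hour) and summarize each group.

    weekday is an int as returned by datetime.weekday(): 0 = Monday.
    """
    buckets = defaultdict(list)
    for started, seconds in samples:
        buckets[(started.weekday(), started.hour)].append(seconds)
    return {
        key: {
            "runs": len(vals),
            "min": min(vals),
            "median": statistics.median(vals),
            "max": max(vals),
        }
        for key, vals in buckets.items()
    }


def spread(samples):
    """Ratio of the slowest run to the fastest; close to 1.0 means consistent."""
    durations = [seconds for _, seconds in samples]
    return max(durations) / min(durations)
```

Collect one (start, seconds) pair per run with time_once, keep them in a list or a file, and pass the list to summarize and spread. For scale, a 20 minute run and a 2 1/2 hour run of the same work give a spread of 7.5. If the spread on your nodes is anywhere near that, the problem is probably noisy neighbors rather than your code.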
My real suggestion is to move to Google Cloud or Rackspace
or DigitalOcean or somewhere other than AWS. (We moved to
Google Cloud and have been very happy there. The performance
is much more consistent, the management UI is more intuitive,
AND the cost for equivalent infrastructure is lower too.)