Re: Serving partial data of in-memory common data set


 



Hi André,

> Concurring with Jonathan about the free advice and the
> tenuous relevance to the main list topic, I'd nevertheless
> want to try to contribute.

Thanks for trying to contribute to the discussion. I have
looked through the Apache mailing lists and could not find
a better forum than this one. But if you have a suggestion,
let me know and I will be happy to continue this thread
elsewhere.

> My summary of the issue :
> - there are N clients accessing the site
> - each client is authenticated, with a client-id of some
> kind
> - they all request originally the same URL
> - the server however returns a page to each client that can
> be different, based on a server-side client profile,
> selected as per the client-id
> - the returned page is different, because it includes for
> each client, a different mixture of "items" in the page,
> based on the client profile
> - each client gets a different selection of i items, but
> these i items are picked among a grand total of I items,
> which are themselves always the same
> - you would like to cache at least part of these I items in
> memory, to speed up the responses to the clients
> 

You have hit the nail on the head.

> You haven't given us any hard numbers, 
> like how many clients there are, how concurrently they
> access the server,

We started with 30 concurrent users and had no trouble,
but when the next batch of 70 users hit concurrently, we
could not serve them all.
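For what it is worth, that pattern (roughly 30 users fine, roughly 100 not)
is what one sees when the prefork MPM runs out of worker processes, or when
long KeepAlive timeouts pin workers to idle clients. Below is a sketch of
the kind of prefork sizing we might try on a 1 GB box; the numbers are
guesses for illustration (assuming very roughly 10 MB resident per child,
with headroom left for MySQL), not a recommendation:

```apache
# Hypothetical prefork sizing for a 1 GB machine; every value here
# is an assumption to be measured against, not a known-good setting.
<IfModule prefork.c>
    StartServers          10
    MinSpareServers       10
    MaxSpareServers       25
    MaxClients            80
    MaxRequestsPerChild 5000
</IfModule>

# Short KeepAlive so idle browsers release workers quickly.
KeepAlive          On
KeepAliveTimeout   3
```

If MaxClients is left at a value the RAM cannot support, the box swaps;
if it is too low, later users queue behind the first ~60, which matches
the symptom described above.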

> how many I items there really are, how large each I item is,

We have about 10 sets of 50 pages, and each page contains a
varying number of images (on average, about 45 images of
roughly 2KB each and 50 or so of about 0.5KB each), plus an
occasional multimedia file. For this discussion, we can
ignore the multimedia files.

The page content and the sequence of page presentation are
user dependent, but for a given group of users this 50-page
set is constant. With each request we also update 3 MySQL tables.

If you add it all up, it is not a lot of content, but we
are choking.
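A quick back-of-the-envelope calculation from the figures above (10 sets
of 50 pages; roughly 45 images of ~2KB and 50 of ~0.5KB per page, ignoring
multimedia) shows how small the whole working set really is:

```python
# Back-of-the-envelope working-set estimate from the figures above:
# 10 sets x 50 pages; per page, ~45 images of ~2 KB and ~50 of ~0.5 KB.
# Multimedia files are ignored, as in the discussion.

KB = 1024

images_per_page_kb = 45 * 2 + 50 * 0.5   # ~115 KB of images per page
per_set_kb = 50 * images_per_page_kb     # ~5,750 KB per 50-page set
total_kb = 10 * per_set_kb               # ~57,500 KB across all 10 sets

print(f"per page:  {images_per_page_kb:.0f} KB")   # per page:  115 KB
print(f"per set:   {per_set_kb / KB:.1f} MB")      # per set:   5.6 MB
print(f"all sets:  {total_kb / KB:.1f} MB")        # all sets:  56.2 MB
```

So the entire static content is on the order of 56 MB, which fits
comfortably in the OS page cache even on a 1 GB machine; this supports
André's point that an extra caching layer may not be where the problem is.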

Our eventual objective is to serve about 200,000 users (of
course not with our existing hardware) and we are looking
at various options.

> how fast the server is, how much memory it has, or anything
> of the kind.

The current server is a bit old: a single-core 2.00GHz
Pentium with 1GB of RAM. We are looking to upgrade as soon
as our situation improves.

> You have mentioned that some of the items I were "media",
> which I personally tend to associate with "large",
> byte-wise.

Our images are not big, but there are a moderate number of
them on each page. We can ignore the multimedia for now.

> My very first reaction would be to ask myself if it is all
> really worth it.  Caching in memory, no matter how it's
> done, has a cost.  A cost in design, complexity, and in
> pure cache management.

This exploration is to hear about other users' experience.
We are clearly choking once we go beyond roughly 60
seemingly concurrent users. Users who get pages initially
continue to be served well, but the others either get no
images at all or wait forever for their pages.
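To pin down where that cliff is, it may help to reproduce it on demand
rather than wait for real users. A minimal probe, sketched in Python
(the URL and concurrency level are placeholders, not our real setup):

```python
# Minimal concurrency probe: fire N simultaneous requests at one URL
# and summarize the latencies, to find where responses start to stall.
# The URL below is hypothetical; point it at a test page on the server.
import math
import threading
import time
import urllib.request

def percentile(samples, p):
    """Nearest-rank percentile (p in 0..100) of a list of numbers."""
    ordered = sorted(samples)
    k = max(0, min(len(ordered) - 1,
                   math.ceil(p / 100.0 * len(ordered)) - 1))
    return ordered[k]

def probe(url, concurrency=70, timeout=30):
    """Fetch `url` from `concurrency` threads at once; return
    (list of per-request latencies in seconds, list of errors)."""
    latencies, errors = [], []
    def fetch():
        start = time.time()
        try:
            urllib.request.urlopen(url, timeout=timeout).read()
            latencies.append(time.time() - start)
        except Exception as exc:
            errors.append(exc)
    threads = [threading.Thread(target=fetch) for _ in range(concurrency)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return latencies, errors

# Example usage against a test box (commented out; hypothetical URL):
# lat, err = probe("http://test-box/somepage.html", concurrency=70)
# print(len(err), "failures; p90 latency:", percentile(lat, 90))
```

Running this at increasing concurrency (30, 60, 70, 100) should show
whether failures appear at the same ~60-user threshold we see in production.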

> Modern operating systems already cache disk data.  So
> if a same "object" is accessed frequently in a short period
> of time, it will in practice already be cached in memory
> buffers by the OS.  Below the OS level, good disk
> controllers also cache frequently accessed data.  Below
> the controllers, disks themselves cache data in cache
> memory.
> Caching it yet again, with  a different piece of
> software, may just add overhead.
> 
> An additional aspect is that, if some of the objects are
> large, and your server has limited memory, caching many such
> objects may fill up the physical memory, and cause the
> system to start swapping, which would really have the
> opposite effect to what you're looking for.
> 
> On the other hand, for Apache to access an object on disk,
> requires on the part of Apache quite a bit of work; all the
> more work the deeper the object resides in the "document
> space", because Apache needs to "walk" the directory
> hierarchy, all the while checking access and other rules at
> each level.  So by organising your objects smartly on
> disk, so as to minimise the work Apache has to do to find it
> and return it, you may gain a whole lot of processing time.
> 
> And servers nowadays are cheap. For the time and money
> you'd spend studying the best caching scheme, you could
> easily buy an extra server with terabytes of disk space and
> gigabytes of ram to use as I/O cache.

Hardware is cheap; we are looking to upgrade, and maybe an
upgrade will solve some of our problems.
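On André's directory-walking point, one cheap change worth trying first
is to stop Apache from checking for .htaccess files and symlink ownership
at every path component. A sketch, assuming a typical Apache 2.x setup
(the document root path here is hypothetical):

```apache
# Hypothetical fragment: keep the image tree shallow, and avoid
# per-directory .htaccess lookups and per-component symlink checks,
# so each request needs far fewer stat() calls.
<Directory "/var/www/site">
    Options FollowSymLinks
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>
```

With AllowOverride None, Apache no longer probes for .htaccess in every
directory on the path for each of the ~95 images on a page, which can add
up at 70 concurrent users.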

> So basically what I am saying, is : try it, without any
> clever caching scheme, but with a clever organisation of
> your data and an efficient Apache configuration.  That
> /may/ show a problem and a bottleneck, which you can then
> tackle on its own merits.  On the other hand, it may
> show no problem at all.

We definitely have a problem and are looking at various
options to resolve it. If caching works, we would like to
look into it. If hardware resolves the problem, that is good too.

> A lot of work has gone into Apache, to make it as efficient
> as possible to serve content of all kinds.  There are
> thousands of Apache sites handling thousands of clients, and
> a lot of content.

Agreed.

> Do not spend a lot of time ahead of time, to solve what is
> maybe a non-existent problem.  As someone said a long
> time ago : premature optimisation is the source of much
> evil.

We clearly have a problem, and I am not closing off any
options. If optimization helps, I will take that route; if
hardware helps, we don't want to ignore that either. Our
available resources will definitely chart the course of action.

Thank you for taking the time to pose some pertinent questions.




      

---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@xxxxxxxxxxxxxxxx
   "   from the digest: users-digest-unsubscribe@xxxxxxxxxxxxxxxx
For additional commands, e-mail: users-help@xxxxxxxxxxxxxxxx



