Re: Could apc_fetch return a pointer to data in shared memory ?


 



>
> On 1 April 2012 13:52, Simon <slgard@xxxxxxxxx> wrote:
>
>>
>>
>> On 31 March 2012 20:44, Stuart Dallas <stuart@xxxxxxxx> wrote:
>>
>>> On 31 Mar 2012, at 13:14, Simon wrote:
>>>
>>> > Thanks again Stuart.
>>> >
>>> > On 31 March 2012 12:50, Stuart Dallas <stuart@xxxxxxxx> wrote:
>>> >> On 31 March 2012 11:19, Simon <slgard@xxxxxxxxx> wrote:
>>> >> Thanks for your answer.
>>> >>
>>> >> On 31 March 2012 09:50, Stuart Dallas <stuart@xxxxxxxx> wrote:
>>> >> On 31 Mar 2012, at 02:33, Simon wrote:
>>> >>
>>> >> > Or: Why doesn't PHP have Applications variables like ASP.NET  (and
>>> node.js)
>>> >> > ?
>>> >> >
>>> >> > Hi,
>>> >> >
>>> >> > I'm working on optimising a php application (Drupal).
>>> >> >
>>> >> > The best optimisation I've found so far is to use APC to store
>>> various bits
>>> >> > of Drupal data in RAM.
>>> >> >
>>> >> > The problem is that, with Drupal requiring say 50Mb of
>>> data* per
>>> >> > request, lots of cpu cycles are wasted de-serialising data
>>> out of
>>> >> > apc_fetch. Also 50Mb of data per http process !! is wasted by each
>>> one
>>> >> > re-creating its own copy of the shared data.
>>> >>
>>> >> 50MB? WTF is it storing?? I've never used Drupal, but based purely on
>>> that it sounds like an extremely inefficient piece of software that's best
>>> avoided!
>>> >>
>>> >> All sorts of stuff (taxonomies, lists of data, menu structures,
>>> configuration settings, content etc). Drupal is a sophisticated
>>> application. Besides, 50Mb of data seems like relatively tiny "application
>>> state" to want to access in fastest possible way. It's not hard to imagine
>>> wanting to use *much* more than this in future
>>> >>
>>> >>
>>> >> > If it were possible for apc_fetch (or similar function) to return a
>>> pointer
>>> >> > to the data rather than a copy of the data this would enable
>>> incredible
>>> >> > reduction in cpu and memory usage.
>>> >>
>>> >> Vanilla PHP adheres to a principle known as "shared nothing
>>> architecture" in which, shockingly, nothing is shared between processes or
>>> requests. This is primarily for scalability reasons; if you stick to the
>>> shared nothing approach your application should be easily scalable.
>>> >>
>>> >> Yes, I know. I think the effect of this is that php will scale better
>>> (on average) in situations where requests don't need to share much data
>>> such as "shared hosting". In an enterprise environment where the whole
>>> server might be dedicated to single application, "shared nothing" seems to
>>> be a synonym for "re-load everything" ?
>>> >>
>>> >> Yes, on one level that is what it means, but alternatively it could
>>> mean being a lot more conservative about what you load for each request.
>>> >
>>> > Um, I want to be *less* conservative. Possibly *much* less. (like
>>> Gigabytes or even eventually Petabytes of shared data !)
>>>
>>> We appear to have drifted off the point. There's a big difference
>>> between data that an application needs to access and "application
>>> variables".
>>
>>
>>> What you're describing is a database. If you want something more
>>> performant there are ways to optimise access to that amount of data, but if
>>> not I've completely lost what the problem is that you're trying to solve.
>>>
>>
>> Right now I have a need to store maybe 50Mb - 200Mb of data in RAM
>> between requests. I suggested PetaBytes as an example of how much might be
>> beneficial at some considerable point in the future to highlight how
>> relatively un-scalable "passing by copy" (ie memcached / APC) is compared
>> to application variables.
>>
>>
>>> >> > This is essentially how ASP.NET Application variables and node.js
>>> work.
>>> >>
>>> >> Not a valid comparison. Node.js applications can only share variables
>>> within a single process, and they can do so because it's single-threaded.
>>> Once you scale your app beyond a single process you'd need to add a custom
>>> layer on to share data between them.
>>> >>
>>> >> I'm not sure about the architecture behind IIS and ASP.net but I
>>> imagine there are similar paradigms at work.
>>> >>
>>> >> I totally agree although,  I *think* IIS uses multiple threads
>>> running in a single process (or "Application Pool").
>>> >> I realise that ASP.NET / node.js have their own architectural issues
>>> but I'm confident that for enterprise applications
>>> >> (ie Drupal) the option for "shared something" is capable of many
>>> orders of magnitude higher performance and scalability than "shared
>>> nothing".
>>> >>
>>> >> And that's why there are so many options around that enable such
>>> functionality. The need for something doesn't in any way imply that it
>>> should be part of the core system. Consider the impact such a requirement
>>> would have on the environment in which you run PHP. By delegating that
>>> "feature" to third-party modules, the PHP core doesn't need to concern
>>> itself with the details of how to share data between processes on every
>>> target platform.
>>> >
>>> > Agreed. If you were able to point me in the direction of such a 3rd
>>> party module I'd be a very happy man.
>>>
>>> APC and memcached are two of the most common examples, other than the
>>> vast array of DBMSes out there.
>>>
>>
>> Thanks, but APC and memcached are not even remotely comparable to
>> application variables in terms of performance or memory efficiency.
>>
>>
>>> >> > I'm surprised PHP doesn't already have Application variables, given
>>> that
>>> >> > they are so similar to Session Variables and that it's been around
>>> for a
>>> >> > long time in ASP / ASP.NET.
>>> >>
>>> >> Just because x does it, doesn't mean y should. I've used lots of
>>> languages over the years, including classic ASP, ASP.net, Perl, Python,
>>> Ruby, PHP (obv), and more, and I'm yet to see a compelling reason to want
>>> application variables.
>>> >>
>>> >> The reason that I'm suggesting this is because taking the example of
>>> Drupal, the ability to share information between requests "by reference"
>>> rather than by copy has the potential to be *millions* of times faster.
>>> Assuming I had say a 5Mb dataset that I wanted to re-use between requests
>>> and let's say (optimistically) that "de-serialising" an object from
>>> apc_fetch takes 10 cpu cycles per "character" it would be ~50 million*
>>> times faster to pass this data as a pointer ?  *Assuming simplistically
>>> that the pointer can be passed in 1 cpu cycle.
>>> >>
>>> >> You say "by reference" but I'm not convinced that the implementation
>>> of  application variables means they're not copied into each process. In
>>> addition, the cost of de-serialising data is minuscule in the grand scheme
>>> of any non-trivial application.
>>> >
>>> > No, I am 100% certain they're not copied into each process.
>>>
>>> One process cannot access data in another process without it being
>>> copied. A thread can access data from another thread without copying it,
>>> but if it's not read-only it needs to be access-controlled which would be a
>>> massive performance hit. I don't know because I've never cared, but I'd bet
>>> good money that when you read an application variable in asp.net, you
>>> get a copy of that data.
>>>
>>
>> I've been doing some research elsewhere and I believe it is possible to
>> share memory between unix processes in a way which would make application
>> variables possible in PHP.
>>
>>
>> http://stackoverflow.com/questions/6447195/linux-sharing-already-mapped-memory-between-processes
>>
>>
>>>
>>> >> >Let go of the possibility of application variables and your thinking
>>> will shift to other ways of solving the problem.
>>> >> I've spent a long time thinking about this and whilst I can think of
>>> many other ways to "solve" this problem (APC, memcached, SHM) they all
>>> suffer from the problem that "passing by copy" is potentially millions or
>>> billions of times slower than passing by reference and is potentially
>>> *hundreds* of times less memory efficient.
>>> >>
>>> >> If you had any further suggestions I'd be very interested to hear them.
>>> >>
>>> >> See below.
>>> >>
>>> >> > I just wondered if there was a reason for not having this
>>> functionality or
>>> >> > if it's on a road map somewhere or I've missed something :) ?
>>> >>
>>> >>
>>> >> As far as I am aware, ASP and ASP.net are the only web technologies
>>> to support application variables out of the box. You think that's simply
>>> because the others just haven't gotten around to it yet?
>>> >
>>> > Honestly, I don't know. I realise there are benefits in certain
>>> circumstances to shared nothing. However if I have an application where I
>>> want to maintain state between requests (ie any non trivial application?)
>>> it seems that Application variables (or an event loop) are many orders of
>>> magnitude more performant and
>>> > there doesn't seem to be a way to achieve the same in PHP.
>>>
>>> What do you want to store between requests? If it's per-user then you
>>> want sessions (I have some views on the "traditional" implementation and
>>> usage of sessions, but that's for another email). If you want to store data
>>> that needs to be made available to every user, that's why databases exist.
>>> If a database is too slow then you can use memcached. If you're only ever
>>> going to be on one server you can use APC. There's no need for PHP to
>>> natively support this feature.
>>>
>>
>> I want to store datasets that are used on every request in RAM to save
>> loading them from a database or other cache on every request.
>> I don't want to use memcached because it's *much* faster and *hundreds of
>> times* more memory efficient to use application variables.
>>
>>
>>>
>>> >> It would be great if someone could tell me specifically why I'm wrong
>>> OR if I can persuade the php community that "shared nothing" is wrong in
>>> certain circumstances (basically enterprise applications!) and application
>>> variables could be added to PHP
>>> >>
>>> >> You're not wrong in saying that it can be incredibly useful to be
>>> able to share common data between processes, but I think you're approaching
>>> it from the wrong angle. Let's take the list of things that Drupal wants to
>>> store...
>>> >>
>>> >> * taxonomies
>>> >> * menu structures
>>> >> * configuration settings
>>> >>
>>> >> I'm guessing these things don't change while the application is
>>> running, and could easily be dumped out to PHP files that can then be
>>> included as needed, at a far lower processing cost than accessing a shared
>>> data store.
>>> >>
>>> > I think this suffers from at least the same overhead as apc_fetch
>>> >
>>> > And an advantage of Applications variables is that they can change
>>> (very) frequently.
>>>
>>> Reading PHP files, especially when you use a bytecode cache, is one of
>>> the fastest ways to read data. If the data is changing frequently then you
>>> want a database / memcached / APC (see my previous answer).
>>>
>>
>> One of the fastest maybe. Application variables is *the* fastest by
>> orders of magnitude.
>>
>>
>>> >> * content
>>> >>
>>> >> If you're talking about caching static content please refer to my
>>> answer above - no reason these can't also be stored in files. If you're
>>> talking about caching generated output then memcached is the best solution
>>> I've found.
>>> >
>>> > I've actually found caching to a filesystem to be 5x faster than
>>> memcached (remembering that *nix automatically caches frequently used files
>>> in RAM)
>>>
>>> Above you said that using files would have at least the same overhead as
>>> APC.
>>>
>>> >> * lists of data
>>> >>
>>> >> Not sure what you mean by this, but one of the above two answers
>>> probably applies.
>>> >
>>> > Actually, I mean Drupal "Views" you are correct.
>>>
>>> For caching output I've used files (fast when subsequent requests bypass
>>> PHP), memcached (incredibly fast), and a caching proxy.
>>>
>>
>> I need to be able to write applications which generate personalised
>> content for each user. This makes using a caching proxy essentially
>> impossible.
>> .NET applications (such as Umbraco) are generally (easily) fast enough to
>> work without external caching *because* they have application variables.
>>
>> I'm very much getting the impression that PHP developers just don't
>> realise how important Application variables are for performance. (Proof:
>> your idea of fast, is memcached - no offence)
>>
>>
>>>
>>> >> My basic point is that the shared nothing approach to scalability has
>>> been proven as a big benefit, and I would hate to see that feature of PHP
>>> compromised just because use cases exist where it's not ideal. Better to
>>> have add-ons to provide what you need.
>>> >
>>> > As above, agreed. If you were able to point me in the direction of an
>>> add-on I'll be very happy.
>>>
>>> I have, several times. APC is one option but is limited to a single
>>> server. Memcached is, IMO, the best multi-server option. If you're talking
>>> about more than ~1MB of data I'd go with a database.
>>>
>>> Getting back to the gigabytes or even petabytes of data you want to
>>> share across the application, what do you have against databases?
>>>
>>
>> Nothing. I've been programming for 30 odd years. I totally get when and
>> why you'd use a database - AND when and why you'd want to store a dataset
>> in an application variable.
>>
>> What I need to do is persuade PHP developers that we need application
>> variables (in a module if necessary) to enable PHP based applications to
>> compete with .NET in terms of performance. (One reason PHP is able to
>> compete successfully against .NET right now is because people are unaware
>> of the performance differences).
>>
>> As someone who has spent ~4 years as a PHP programmer and ~10 years as a
>> .NET developer, I can pretty confidently say that .NET applications can be
>> architected to *utterly wipe the floor* with PHP applications in terms of
>> performance. The same is also true of node.js.
>>
>> The difference is "application variables".
>>
>
>>
>>
>>>
>>> -Stuart
>>>
>>> --
>>> Stuart Dallas
>>> 3ft9 Ltd
>>> http://3ft9.com/
>>>
>>
>>
>
Another thing that's possible in .NET is the Singleton design pattern.
(Application variables are effectively an implementation of this pattern.)

This makes it possible for a class to hold one static instance of itself,
so that a single object is available to all threads (ie requests) across
your application.

So for example the code below creates a single instance of an object for
the entire "server" (strictly, one per application process). Any code
reading "App.Instance" gets a reference to the shared object; the
constructor is private, so "new App()" cannot be called from outside the
class.

If PHP could do this, it would be *awesome* and I wouldn't need application
variables since this is a superior solution.

Can / could PHP do anything like this ?

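using System;

public sealed class App
{
   // Lazy<T> makes first-use initialisation thread-safe, which matters
   // because every request thread in the process shares this one instance.
   private static readonly Lazy<App> instance =
      new Lazy<App>(() => new App());

   // Private constructor: outside code cannot call "new App()".
   private App() {}

   // Every caller gets a reference to the same object, never a copy.
   public static App Instance
   {
      get { return instance.Value; }
   }
}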
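
For comparison with the shared-memory idea linked earlier in this thread, here is a minimal sketch of what "passing a pointer instead of a copy" can look like between Unix processes. It is written in Python purely as a neutral illustration and assumes a Unix system where fork() is available; the function name child_write_parent_read is hypothetical, and nothing here is PHP or APC API. The anonymous mapping is created before the fork, so parent and child address the same memory rather than each holding their own copy.

```python
import mmap
import os
import struct


def child_write_parent_read(value: int) -> int:
    """Fork a child that writes `value` into shared memory; return what the parent reads."""
    # Anonymous mapping (fileno -1). On Unix this defaults to MAP_SHARED,
    # so the pages stay shared across fork() instead of being copied.
    buf = mmap.mmap(-1, 8)

    pid = os.fork()
    if pid == 0:
        # Child process: write one signed 64-bit integer directly into the mapping.
        struct.pack_into("q", buf, 0, value)
        os._exit(0)

    # Parent process: wait for the child, then read the same bytes in place.
    os.waitpid(pid, 0)
    (seen,) = struct.unpack_from("q", buf, 0)
    buf.close()
    return seen


if __name__ == "__main__":
    print(child_write_parent_read(42))
```

Note that only raw bytes are shared this way. Structured data (such as a large PHP array) would still need a layout that every process can read in place, plus some form of locking if it is not read-only.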


Creates an inste
