>
> On 1 April 2012 13:52, Simon <slgard@xxxxxxxxx> wrote:
>
>> On 31 March 2012 20:44, Stuart Dallas <stuart@xxxxxxxx> wrote:
>>
>>> On 31 Mar 2012, at 13:14, Simon wrote:
>>>
>>> > Thanks again Stuart.
>>> >
>>> > On 31 March 2012 12:50, Stuart Dallas <stuart@xxxxxxxx> wrote:
>>> >> On 31 March 2012 11:19, Simon <slgard@xxxxxxxxx> wrote:
>>> >> Thanks for your answer.
>>> >>
>>> >> On 31 March 2012 09:50, Stuart Dallas <stuart@xxxxxxxx> wrote:
>>> >> On 31 Mar 2012, at 02:33, Simon wrote:
>>> >>
>>> >> > Or: Why doesn't PHP have Application variables like ASP.NET (and node.js)?
>>> >> >
>>> >> > Hi,
>>> >> >
>>> >> > I'm working on optimising a PHP application (Drupal).
>>> >> >
>>> >> > The best optimisation I've found so far is to use APC to store various bits of Drupal data in RAM.
>>> >> >
>>> >> > The problem with this is that, with Drupal requiring say 50MB of data* per request, lots of CPU cycles are wasted de-serialising data out of apc_fetch. Also, 50MB of data per HTTP process (!!) is wasted by each one re-creating its own copy of the shared data.
>>> >>
>>> >> 50MB? WTF is it storing?? I've never used Drupal, but based purely on that it sounds like an extremely inefficient piece of software that's best avoided!
>>> >>
>>> >> All sorts of stuff (taxonomies, lists of data, menu structures, configuration settings, content etc). Drupal is a sophisticated application. Besides, 50MB of data seems like relatively tiny "application state" to want to access in the fastest possible way. It's not hard to imagine wanting to use *much* more than this in future.
>>> >>
>>> >>
>>> >> > If it were possible for apc_fetch (or a similar function) to return a pointer to the data rather than a copy of the data, this would enable an incredible reduction in CPU and memory usage.
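The copy-versus-pointer distinction raised above can be sketched outside PHP. The following is a minimal illustration in Python, used here only because the mechanism is language-agnostic. It assumes `pickle` as a stand-in for the serialisation a cache such as APC performs on fetch; `build_dataset`, `fetch_by_copy` and `fetch_by_reference` are hypothetical names, and the dataset shape is invented for the example. It is not a model of APC's internals.

```python
import pickle
import time


def build_dataset(n_items=50_000):
    """Build a stand-in for cached application data (hypothetical shape)."""
    return {f"key_{i}": {"id": i, "label": f"item {i}"} for i in range(n_items)}


def fetch_by_copy(blob):
    """Mimic a cache fetch that de-serialises a fresh copy on every call."""
    return pickle.loads(blob)


def fetch_by_reference(dataset):
    """Mimic in-process shared state: hand back the same object, no copy."""
    return dataset


def time_call(fn, arg, repeats=5):
    """Return the best wall-clock time in seconds over a few repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        best = min(best, time.perf_counter() - start)
    return best


dataset = build_dataset()
blob = pickle.dumps(dataset)

# Each fetch by copy yields an equal but separate object graph.
copy_a = fetch_by_copy(blob)
copy_b = fetch_by_copy(blob)
# A fetch by reference yields the original object itself.
shared = fetch_by_reference(dataset)

if __name__ == "__main__":
    print("serialised size (bytes):", len(blob))
    print("by copy      (s):", time_call(fetch_by_copy, blob))
    print("by reference (s):", time_call(fetch_by_reference, dataset))
```

The by-reference path only works because caller and data live in the same process, which is the point of contention in the replies that follow. The timings vary by machine and say nothing about how large the copy cost is relative to the rest of a request.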
>>> >>
>>> >> Vanilla PHP adheres to a principle known as "shared nothing architecture" in which, shockingly, nothing is shared between processes or requests. This is primarily for scalability reasons; if you stick to the shared nothing approach your application should be easily scalable.
>>> >>
>>> >> Yes, I know. I think the effect of this is that PHP will scale better (on average) in situations where requests don't need to share much data, such as "shared hosting". In an enterprise environment where the whole server might be dedicated to a single application, "shared nothing" seems to be a synonym for "re-load everything"?
>>> >>
>>> >> Yes, on one level that is what it means, but alternatively it could mean being a lot more conservative about what you load for each request.
>>> >
>>> > Um, I want to be *less* conservative. Possibly *much* less (like gigabytes or even eventually petabytes of shared data!).
>>>
>>> We appear to have drifted off the point. There's a big difference between data that an application needs to access and "application variables".
>>
>>> What you're describing is a database. If you want something more performant there are ways to optimise access to that amount of data, but if not I've completely lost what the problem is that you're trying to solve.
>>>
>>
>> Right now I have a need to store maybe 50MB - 200MB of data in RAM between requests. I suggested petabytes as an example of how much might be beneficial at some considerable point in the future, to highlight how relatively un-scalable "passing by copy" (i.e. memcached / APC) is compared to application variables.
>>
>>> >> > This is essentially how ASP.NET Application variables and node.js work.
>>> >>
>>> >> Not a valid comparison. Node.js applications can only share variables within a single process, and they can do so because it's single-threaded.
>>> >> Once you scale your app beyond a single process you'd need to add a custom layer on to share data between them.
>>> >>
>>> >> I'm not sure about the architecture behind IIS and ASP.net but I imagine there are similar paradigms at work.
>>> >>
>>> >> I totally agree, although I *think* IIS uses multiple threads running in a single process (or "Application Pool").
>>> >> I realise that ASP.NET / node.js have their own architectural issues, but I'm confident that for enterprise applications (i.e. Drupal) the option for "shared something" is capable of many orders of magnitude higher performance and scalability than "shared nothing".
>>> >>
>>> >> And that's why there are so many options around that enable such functionality. The need for something doesn't in any way imply that it should be part of the core system. Consider the impact such a requirement would have on the environment in which you run PHP. By delegating that "feature" to third-party modules, the PHP core doesn't need to concern itself with the details of how to share data between processes on every target platform.
>>> >
>>> > Agreed. If you were able to point me in the direction of such a 3rd party module I'd be a very happy man.
>>>
>>> APC and memcached are two of the most common examples, other than the vast array of DBMSes out there.
>>>
>>
>> Thanks, but APC and memcached are not even remotely comparable to Application variables in terms of performance or memory efficiency.
>>
>>> >> > I'm surprised PHP doesn't already have Application variables, given that they are so similar to Session variables and that they've been around for a long time in ASP / ASP.NET.
>>> >>
>>> >> Just because x does it, doesn't mean y should.
>>> >> I've used lots of languages over the years, including classic ASP, ASP.net, Perl, Python, Ruby, PHP (obv), and more, and I'm yet to see a compelling reason to want application variables.
>>> >>
>>> >> The reason I'm suggesting this is that, taking the example of Drupal, the ability to share information between requests "by reference" rather than by copy has the potential to be *millions* of times faster. Assuming I had, say, a 5MB dataset that I wanted to re-use between requests, and let's say (optimistically) that "de-serialising" an object from apc_fetch takes 10 CPU cycles per "character", it would be ~50 million* times faster to pass this data as a pointer? *Assuming simplistically that the pointer can be passed in 1 CPU cycle.
>>> >>
>>> >> You say "by reference" but I'm not convinced that the implementation of application variables means they're not copied into each process. In addition, the cost of de-serialising data is minuscule in the grand scheme of any non-trivial application.
>>> >
>>> > No, I am 100% certain they're not copied into each process.
>>>
>>> One process cannot access data in another process without it being copied. A thread can access data from another thread without copying it, but if it's not read-only it needs to be access-controlled, which would be a massive performance hit. I don't know because I've never cared, but I'd bet good money that when you read an application variable in asp.net, you get a copy of that data.
>>>
>>
>> I've been doing some research elsewhere and I believe it is possible to share memory between Unix processes in a way which would make application variables possible in PHP.
>>
>> http://stackoverflow.com/questions/6447195/linux-sharing-already-mapped-memory-between-processes
>>
>>>
>>> >> > Let go of the possibility of application variables and your thinking will shift to other ways of solving the problem.
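The claim that Unix processes can share memory without copying can be checked with a small experiment. The sketch below is in Python rather than PHP, purely to illustrate the operating-system mechanism. It assumes a Unix system with `fork()`, and `share_counter_across_fork` is a hypothetical helper name. It uses an anonymous shared mapping created before the fork, which is one common approach and not necessarily the one discussed at the linked question.

```python
import mmap
import os
import struct


def share_counter_across_fork(initial=41):
    """Let a parent and a forked child touch the same shared mapping (Unix only)."""
    # On Unix, mmap.mmap(-1, n) creates an anonymous mapping that defaults to
    # MAP_SHARED, so its pages stay shared across fork() instead of being
    # copied per process.
    region = mmap.mmap(-1, mmap.PAGESIZE)
    struct.pack_into("q", region, 0, initial)

    pid = os.fork()
    if pid == 0:
        # Child: read the parent's value in place and write a new one back.
        (value,) = struct.unpack_from("q", region, 0)
        struct.pack_into("q", region, 0, value + 1)
        os._exit(0)

    os.waitpid(pid, 0)
    # Parent: sees the child's write with no serialisation step in between.
    (result,) = struct.unpack_from("q", region, 0)
    region.close()
    return result


if __name__ == "__main__":
    print(share_counter_across_fork())
```

What is shared here is raw bytes, not language-level values. A runtime would still need an agreed layout for its data structures inside that region, plus locking for concurrent writers, which is the access-control cost mentioned earlier in the thread.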
>>> >> I've spent a long time thinking about this, and whilst I can think of many other ways to "solve" this problem (APC, memcached, SHM), they all suffer from the problem that "passing by copy" is potentially millions or billions of times slower than passing by reference and is potentially *hundreds* of times less memory efficient.
>>> >>
>>> >> If you had any further suggestions I'd be very interested to hear them.
>>> >>
>>> >> See below.
>>> >>
>>> >> > I just wondered if there was a reason for not having this functionality, or if it's on a road map somewhere, or I've missed something :) ?
>>> >>
>>> >> As far as I am aware, ASP and ASP.net are the only web technologies to support application variables out of the box. You think that's simply because the others just haven't gotten around to it yet?
>>> >
>>> > Honestly, I don't know. I realise there are benefits in certain circumstances to shared nothing. However, if I have an application where I want to maintain state between requests (i.e. any non-trivial application?) it seems that Application variables (or an event loop) are many orders of magnitude more performant, and there doesn't seem to be a way to achieve the same in PHP.
>>>
>>> What do you want to store between requests? If it's per-user then you want sessions (I have some views on the "traditional" implementation and usage of sessions, but that's for another email). If you want to store data that needs to be made available to every user, that's why databases exist. If a database is too slow then you can use memcached. If you're only ever going to be on one server you can use APC. There's no need for PHP to natively support this feature.
>>>
>>
>> I want to store datasets that are used on every request in RAM to save loading them from a database or other cache on every request.
>> I don't want to use memcached because it's *much* faster and *hundreds of times* more memory efficient to use application variables.
>>
>>>
>>> >> It would be great if someone could tell me specifically why I'm wrong OR if I can persuade the PHP community that "shared nothing" is wrong in certain circumstances (basically enterprise applications!) and application variables could be added to PHP.
>>> >>
>>> >> You're not wrong in saying that it can be incredibly useful to be able to share common data between processes, but I think you're approaching it from the wrong angle. Let's take the list of things that Drupal wants to store...
>>> >>
>>> >> * taxonomies
>>> >> * menu structures
>>> >> * configuration settings
>>> >>
>>> >> I'm guessing these things don't change while the application is running, and could easily be dumped out to PHP files that can then be included as needed, at a far lower processing cost than accessing a shared data store.
>>> >>
>>> > I think this suffers from at least the same overhead as apc_fetch.
>>> >
>>> > And an advantage of Application variables is that they can change (very) frequently.
>>>
>>> Reading PHP files, especially when you use a bytecode cache, is one of the fastest ways to read data. If the data is changing frequently then you want a database / memcached / APC (see my previous answer).
>>>
>>
>> One of the fastest, maybe. Application variables are *the* fastest by orders of magnitude.
>>
>>> >> * content
>>> >>
>>> >> If you're talking about caching static content please refer to my answer above - no reason these can't also be stored in files. If you're talking about caching generated output then memcached is the best solution I've found.
>>> >
>>> > I've actually found caching to a filesystem to be 5x faster than memcached (remembering that *nix automatically caches frequently used files in RAM).
>>>
>>> Above you said that using files would have at least the same overhead as APC.
>>>
>>> >> * lists of data
>>> >>
>>> >> Not sure what you mean by this, but one of the above two answers probably applies.
>>> >
>>> > Actually, I mean Drupal "Views"; you are correct.
>>>
>>> For caching output I've used files (fast when subsequent requests bypass PHP), memcached (incredibly fast), and a caching proxy.
>>>
>>
>> I need to be able to write applications which generate personalised content for each user. This makes using a caching proxy essentially impossible.
>> .NET applications (such as Umbraco) are generally (easily) fast enough to work without external caching *because* they have application variables.
>>
>> I'm very much getting the impression that PHP developers just don't realise how important Application variables are for performance. (Proof: your idea of fast is memcached - no offence.)
>>
>>>
>>> >> My basic point is that the shared nothing approach to scalability has been proven as a big benefit, and I would hate to see that feature of PHP compromised just because use cases exist where it's not ideal. Better to have add-ons to provide what you need.
>>> >
>>> > As above, agreed. If you were able to point me in the direction of an add-on I'd be very happy.
>>>
>>> I have, several times. APC is one option but is limited to a single server. Memcached is, IMO, the best multi-server option. If you're talking about more than ~1MB of data I'd go with a database.
>>>
>>> Getting back to the gigabytes or even petabytes of data you want to share across the application, what do you have against databases?
>>>
>>
>> Nothing. I've been programming for 30 odd years.
>> I totally get when and why you'd use a database - AND when and why you'd want to store a dataset in an application variable.
>>
>> What I need to do is persuade PHP developers that we need application variables (in a module if necessary) to enable PHP-based applications to compete with .NET in terms of performance. (One reason PHP is able to compete successfully against .NET right now is that people are unaware of the performance differences.)
>>
>> As someone who has spent ~4 years as a PHP programmer and ~10 years as a .NET developer, I can pretty confidently say that .NET applications can be architected to *utterly wipe the floor* with PHP applications in terms of performance. The same is also true of node.js.
>>
>> The difference is "application variables".
>>
>
>>
>>>
>>> -Stuart
>>>
>>> --
>>> Stuart Dallas
>>> 3ft9 Ltd
>>> http://3ft9.com/
>>>
>>
>

Another thing that's possible in .NET is the Singleton design pattern. (Application variables are an implementation of this pattern.) This makes it possible to expose a class through a static member so that a single instance of the object is available to all threads (i.e. requests) across your application.

So for example the code below creates a single instance of an object for the entire "server". Any code reading "App.Instance" gets a reference to the shared object (the constructor is private, so "new App();" cannot be called from outside the class).

If PHP could do this, it would be *awesome* and I wouldn't need application variables, since this is a superior solution.

Can / could PHP do anything like this?

    public class App
    {
        // The one shared instance, created on first access.
        private static App instance;
        // Guards creation so two threads cannot both construct an App.
        private static readonly object padlock = new object();

        // Private constructor: callers must go through Instance.
        private App() {}

        public static App Instance
        {
            get
            {
                lock (padlock)
                {
                    if (instance == null)
                    {
                        instance = new App();
                    }
                    return instance;
                }
            }
        }
    }

Creates an inste