On Sat, 7 Jun 2014, Lorenzo Pieralisi wrote:

> On Sat, Jun 07, 2014 at 05:10:27PM +0100, Nicolas Pitre wrote:
> > On Sat, 7 Jun 2014, Abhilash Kesavan wrote:
> >
> > > Hi Nicolas,
> > >
> > > The first man of the incoming cluster enables its snoops via the
> > > power_up_setup function. During secondary boot-up, this does not occur
> > > for the boot cluster. Hence, I enable the snoops for the boot cluster
> > > as a one-time setup from the u-boot prompt. After secondary boot-up
> > > there is no modification that I do.
> >
> > OK that's good.
> >
> > > Where should this be ideally done ?
> >
> > If I remember correctly, the CCI can be safely activated only when the
> > cache is disabled. So that means the CCI should ideally be turned on
> > for the boot cluster (and *only* for the boot CPU) by the bootloader.
>
> CCI ports are enabled per-cluster, so the boot loader must turn on
> CCI for all clusters before the respective CPUs have a chance to
> turn on their caches. It is a secure operation, this can be overriden
> and probably that's what has been done, wrongly.

Careful.  By saying "for all clusters" you might be interpreted as
saying that the CCI must be turned on even for CPUs that are not
powered up.

> True, TC2 on warm-boot reenables CCI, and that's because it runs the
> kernel in secure world, and again that's _wrong_.

Let me respectfully disagree.

> It must be done in firmware, and I am totally against any attempt at
> papering over what looks like a messed up firmware implementation with
> a bunch of hacks in the kernel, because that's what the patch below is
> (soft restarting a CPU to enable CCI ? and adding generic code for that ?
> what's next ?)

Are you arguing for the removal of drivers/bus/arm-cci.c ?

You do realize that the fundamental raison d'être for MCPM is actually
to manage the race-free enabling of the cache and CCI ?

> I understand there is an issue and lots at stake here, but I do not want the
> patch below to be merged in the kernel, I am sorry, it is a tad too much.

Lorenzo: Don't get me wrong.  The Chromebook developers, and possibly to
some extent some people at Samsung, were simply too confident in their
ability to create absolutely bug-free firmware code, to the point of not
making its update easy in the field.  This is completely outrageous from
my point of view.  Yet one of the reactions was to call upstream kernel
people purists, because the kernel is so much easier to modify in order
to cover their mess and yet such a change might not be accepted.  Like I
said, I won't stop shaming them publicly for their own "incompetence"
just yet (no pun intended), but being excessively purist does not
benefit anyone either, and on that they have a point.

*HOWEVER* I have no choice but to say that many people at ARM, including
a couple of individuals for whom I nevertheless have a lot of
admiration, also have an incredible faith in their ability to convince
themselves, and then turn around to preach to the world, that *more
firmware* is going to be so much purer and solve so many more problems
than it creates and become such a magical success that we should
immediately dedicate our soul to the cause with no hint of a doubt.  I'm
sorry to rain on your parade, but I don't believe in this one iota.

Let me repeat the MCPM story again: it took 3 people, including 2 from
ARM, over *six* months to get everything right and stable on TC2.  I
think you contributed to that effort as well.  Subsequent MCPM backend
contributions (yes, just the backend and not the core code) took at
least *five* rounds of reviews in one case, and after *seven* rounds in
another case it is still not right, despite the publicly available TC2
implementation to serve as a reference.  I'm sure each time a new patch
set was posted, their authors honestly believed their code was correct.
Otherwise why would they post buggy code?

Now you are telling me that they should have put that code into firmware
instead?  Do you realize what a catastrophe this would have been?  Do
you _seriously_ believe that they would be up to their 5th firmware
revision by now?  And that updating their firmware six months after
product launch would be as easy as updating the kernel?

Software ALWAYS has bugs, whether it is user apps, the kernel, firmware
or boot ROM.  The bigger one of those parts is, the more bugs it will
have.  And the cost to vendors for fixing those bugs grows exponentially
down each level.  As proof, we're now considering possible workarounds
in the kernel to sidestep the difficulty with updating the firmware on a
Chromebook.

Yet you're saying that code with the same complexity as the MCPM core,
plus a machine-specific backend that experience has repeatedly proven
hard to get right, should go into firmware because running Linux in
secure mode is wrong?  If so, we indeed don't live in the same world.

The day I see a firmware architecture that allows for 1) the same level
of peer review as what we enjoy with the Linux kernel code and 2) the
same ability to perform updates in the field as the kernel, then maybe I
could be sold on the many advantages generic firmware might have.

In the meantime I consider complex firmware a very suboptimal
architecture with no bearing on the reality of actual short-cycled
products, and if it prevails we'd better be ready to pile more of those
ugly hacks in the kernel.


Nicolas