On Sat, May 28, 2011 at 5:57 PM, Nishanth Menon <nm@xxxxxx> wrote: > On 13:53-20110527, MyungJoo Ham wrote: > [..] >> diff --git a/drivers/base/power/opp.c b/drivers/base/power/opp.c >> index 56a6899..819c1b3 100644 >> --- a/drivers/base/power/opp.c >> +++ b/drivers/base/power/opp.c >> @@ -21,6 +21,7 @@ >> Â#include <linux/rculist.h> >> Â#include <linux/rcupdate.h> >> Â#include <linux/opp.h> >> +#include <linux/devfreq.h> >> >> Â/* >> Â * Internal data structure organization with the OPP layer library is as >> @@ -428,6 +429,11 @@ int opp_add(struct device *dev, unsigned long freq, unsigned long u_volt) >> Â Â Â list_add_rcu(&new_opp->node, head); >> Â Â Â mutex_unlock(&dev_opp_list_lock); >> >> + Â Â /* >> + Â Â Â* Notify generic dvfs for the change and ignore error >> + Â Â Â* because the device may not have a devfreq entry >> + Â Â Â*/ >> + Â Â devfreq_update(dev); > I think I understand your thought about notifiers here - we had some > initial thought that we may need notifiers, for e.g. clk change > notifiers which could implement enable/disable on a dependent device, > thermal management etc.. so far, we have'nt had a need to modify the > lowest data store layer to handle this. In this case, I think you'd > like a notifier mechanism in general for opp_add,enable or disable in > the opp library. Yes, I wanted a notifier machnaism called by OPP for any potential changes in available frequencies. For that purpose, I've added devfreq_update() because, for now, DEVFREQ is the only one that needs such a notifier and a notifier has some overhead. If there are multiple entities that require notifications from OPP for such changes, we'd need notifiers maintained by OPP. > However, with this change, an opp_add now means: > a) opp_add now depends on devfreq_update() to be part of it's integral > operation - I think the devfreq should maintain it's own notifier > mechanisms and not make OPP library do the notification trigger job of > devfreq. To let the entities depending on the list of available frequencies be notified, I think it would be appropriate for OPP to have a notifier chain and call the notifier chain whenever there is a change in the list. In that way, we can always add and remove the depending entities (including DEVFREQ) to and from OPP's. Otherwise, we'd need OPP users that call OPP functions to call the notifier chain when it appears that the call will change the list. As long as the updates in available frequencies affect other entities, OPP library should be able to notify the others whenever such changes occur. In this patch, I've used devfreq_update() directly because we do not have other entities, yet. However, as you've mentioned, there will be others, so I'm willing to implement notifier in OPP if other people also agree. > b) By ignoring the error, how will the caller of opp_add now know if the > failures were real - e.g. some device failed to act on the notification > and the overall opp_add operation failed? And why should OPP library do > recovery for on behalf of devfreq? Um... OPP does not need to do any recovery for errors from DEVFREQ. Other than errors from find_device_devfreq() in devfreq_update(), the errors seem to be better returned somehow to the caller of opp_add(). However, would it be fine to let opp_add (and other frequency-availability changing functions) return errors returned from a notifier? (either devfreq_update() or srcu_notifier_call_chain()) > c) How will you ensure the lock accuracy - given that you need to keep > to the constraints of opp_add/opp_set_availability here? For example: > devfreq_do is called from devfreq_monitor(workqueue) and devfreq_update > in addition to having to call a bunch of function pointers whose > implementation you do not know, having a function that ensures security > of it's data, handles all lock conditions that is imposed differently by > add,enable and disable is not visible in this implementation. devfreq_update() waits for devfreq_monitor() with devfreq_list_lock. Both has devfreq_list_lock locked during their operations. > c) How about considering the usage of OPP library with an SoC cpufreq > implementation say for MPU AND using the same SoC using devfreq for > controlling some devices like MALI in your example simultaneously? Lets > say the thermal policy manager implementation for some reason did an > opp_disable of a higher OPP of MPU - devfreq_update gets called with the > device(mpu_dev).. devfreq_update needs to now invoke it's validation > path and fails correctly - Vs devfreq_update being called by some module > which actually uses devfreq and the update failed - How does one > differentiate between the two? When OPP is being used by CPUFREQ w/ CPU and by MALI, the OPP list should be attached to the two devices independently. In other words, the frequency of CPU should be controlled independently from the frequency of OPP. If both CPU and MALI should use the same frequency, there is no point to implement devfreq for MALI. However, if the condition is not fully-dependent; e.g., Freq(MALI) <= Freq(CPU), we can implement two instances of OPP for each of CPU and MALI and let the "target" function supplied to CPU to run opp_enable/disable and devfreq_update() to MALI before adjusting the frequency of itself. When we are implementing another entity that controls OPP/DEVFREQ such as the thermal policy manager, it should tackle both CPU and MALI w/ OPPs at the same time if they are fully independent (i.e., the frequency of MALI is not affected by the frequency of CPU). If not, the same things (discussed in the previous paragraph) are applied. In the two examples you've mentioned, the first one is about the implementation of the current patch (let thermal manager handle OPP and OPP handle devfreq_update) and the another is when DEVFREQ calls are implemented through thermal manager (let thermal manager handle both OPP and devfreq_update), right? For the two cases, I think that the first approach is more appropriate. We will probably have more than just a thermal manager to handle OPP (and devfreq that depends on OPP). The first approach allows users to declare which frequencies are available and to let the underlying structure do their work without intervention from users (those who decide which frequencies are available). With the second approach, we need to enforce every frequency affecting entity to understand underlying frequency dependencies and the behavior of devfreq, not jst to understand the behavior of the device according to the frequency. DEVFREQ is going to be handled and implemented by the device driver of the device controlled by the instance of DEVFREQ. By using the first approach, we can prevent affecting entities other than the device driver using DEVFREQ from thinking about DEVFREQ of that device. > > just my 2 cents, I think these kind of notifiers should probably > stay out of opp library OR notifiers be introduced in a generic and safe > fashion. So, you want either to call devfreq_update from where opp_add/enable/disable/... are called. or to use notifier in OPP? Well, the first one would become the source of messy problems as mentioned earlier. The second one has some overhead, but it's a general approach and does not incur the issues of the first approach. Therefore, I prefer either to keep the initial approach regarding devfreq_update() or to use a generic notifier at OPP (probably, SRCU notifier chains?). How about using a SRCU_NOTIFIER at OPP? The notifier.h says that SRCU has low overhead generally and the most overhead of SRCU comes from "remove srcu" (probably when "device" is being removed), which won't happen a lot with OPP. > >> Â Â Â return 0; >> Â} >> >> @@ -512,6 +518,9 @@ unlock: >> Â Â Â mutex_unlock(&dev_opp_list_lock); >> Âout: >> Â Â Â kfree(new_opp); >> + >> + Â Â /* Notify generic dvfs for the change and ignore error */ >> + Â Â devfreq_update(dev); > Another place where I am confused - how does devfreq_update know in what > context is it being called is the OPP getting enabled/disabled/added - > devfreq_update has to do some additional work to make it's own decision > - it is almost as if it uses devfreq_update as a trigger mechanism to reevalute > the entire decision tree - without using information of the context > opp_enable/disable/add has to maybe make better call. if this > information is not needed, maybe some other mechanism implemented by > devfreq layer might suffice.. No, DEVFREQ does not require the context (OPP being enabled/disabled/...) with the devfreq_update() calls. It is essentially a "notify devfreq to reevaluate" and DEVFREQ reevaluates based on the available frequencies. Thus, a plain srcu_notifier_call_chain() should work for DEVFREQ/devfreq_update from OPP. > >> Â Â Â return r; >> Â} > > [...] > -- > Regards, > Nishanth Menon > Thank you for your comments, Nishanth. Cheers! - MyungJoo -- MyungJoo Ham (íëì), Ph.D. Mobile Software Platform Lab, Digital Media and Communications (DMC) Business Samsung Electronics cell: 82-10-6714-2858 _______________________________________________ linux-pm mailing list linux-pm@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linux-foundation.org/mailman/listinfo/linux-pm