On Wednesday, September 11, 2013 02:08:44 PM Viresh Kumar wrote: > On 11 September 2013 01:16, Rafael J. Wysocki <rjw@xxxxxxxxxxxxx> wrote: > > On Tuesday, September 10, 2013 08:44:18 PM Viresh Kumar wrote: > > >> Now Second question, is this fine to have multiple PRECHANGE notfications > >> before any POSTCHANGE notification? > >> > >> Logically it looks obvious to me that these must be serialized.. > >> Otherwise this is what we are broadcasting: > >> - We are changing for freq X, please prepare and let us know if you are > >> okay with it.. > >> - Oh.. So sorry, we are changing to freq Y instead, please adjust accordingly. > >> > >> - Yes we have changed to freq Y... > >> - Yes we have changed to freq X... > >> > >> This looks stupid, isn't it? I don't know how the drivers would behave when > >> they see such notifications and so got to this patch initially.. > > > > Well, precisely, you don't know. > > > > Can you please look for specific problems and address those instead of trying > > to address potential issues that may or may not happen and may or may not really > > be issues even if they actually happen? > > That looked like a straight forward issue/bug to me and so I haven't > gotten deep into it.. Which you should always do when you're going to deal with concurrency issues. Even if they appear to be obvious, they often are far from that, like in this case. > Scenario 1: > -------------- > Notifiers are not serialized and suppose this is what happened > - PRECHANGE Notification for freq A (from cpuinfo_cur_freq) > - PRECHANGE Notification for freq B (from target()) > - Freq changed by target() to B > - POSTCHANGE Notification for freq B > - POSTCHANGE Notification for freq A > > Now the last POSTCHANGE Notification happened for freq A and > hardware is currently running at freq B :) > > Where would we break then?: adjust_jiffies() in cpufreq.c, > cpufreq_callback() in arch/arm/kernel/smp.c (which is also adjusting > jiffies).. > > Now, all loops_per_jiffy based stuff is broken.. Have I missed > something? > > Similarly there are numerous places where we may break.. As we > have broken expectations of receivers of these notifications.. Now it is clear what the problem is and that should go in the changelog of the fix patch. > Scenario 2: > -------------- > Governor is changing freq and has called __cpufreq_driver_target(). > At the same time we are changing scaling_{min|max}_freq from > sysfs, which would eventually end up calling governors: > CPUFREQ_GOV_LIMITS notification, that will also call: > __cpufreq_driver_target().. > > So, we eventually have two concurrent calls to ->target() and we > don't really know how hardware will behave in this case.. Most of > the implementations of ->target() routines just go and change > freq/voltage without checking if we are already in progress of doing > that (i.e. based on expectation that this call is not re entrant).. > > Now anything can happen at hardware level, which I don't have > all insight of :( That is more theoretical, however. I think that the "Scenario 1" is a good enough reason for making this change, but it needs to be done with care. > > And broken patches that try to fix such things definitely don't help. > > Yeah, that's my fault but I honestly try to produce bug-less patches.. > But being a normal human being, there are always some chances > that my patches are eventually broken :) Well, that happens for everyone, but in this particular case we had three different issues because of a single patch ... And actually I also should have spend more time on this before applying the patch, especially given the laconic changelog, so it's my fault too. > > For now I'm reverting the commit in question and everything on top of it. > > Its okay.. -- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center. -- To unsubscribe from this list: send the line "unsubscribe cpufreq" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html