On Fri, Sep 24, 2010 at 04:26:21PM -0500, Nishanth Menon wrote: > Paul E. McKenney had written, on 09/24/2010 02:37 PM, the following: > [...] > > > >Looks like a good start!!! Some questions and suggestions about RCU > Thanks for the review.. few comments below.. Back at you! ;-) > >usage interspersed below. > > > > Thanx, Paul > [...] > >>+ > >>+/** > >>+ * find_device_opp() - find device_opp struct using device pointer > >>+ * @dev: device pointer used to lookup device OPPs > >>+ * > >>+ * Search list of device OPPs for one containing matching device. Does a RCU > >>+ * reader operation to grab the pointer needed. > >>+ * > >>+ * Returns pointer to 'struct device_opp' if found, otherwise -ENODEV or > >>+ * -EINVAL based on type of error. > >>+ */ > >>+static struct device_opp *find_device_opp(struct device *dev) > >>+{ > >>+ struct device_opp *tmp_dev_opp, *dev_opp = ERR_PTR(-ENODEV); > >>+ > >>+ if (unlikely(!dev || IS_ERR(dev))) { > >>+ pr_err("%s: Invalid parameters being passed\n", __func__); > >>+ return ERR_PTR(-EINVAL); > >>+ } > >>+ > >>+ rcu_read_lock(); > >>+ list_for_each_entry_rcu(tmp_dev_opp, &dev_opp_list, node) { > >>+ if (tmp_dev_opp->dev == dev) { > >>+ dev_opp = tmp_dev_opp; > >>+ break; > >>+ } > >>+ } > >>+ rcu_read_unlock(); > > > >What prevents the structure pointed to by dev_opp from being freed > >at this point? We are no longer in an RCU read-side critical section, > >so RCU grace periods starting during the above RCU read-side critical > >section can now end. > > dev_opp is never freed in the implementation -> it represents > domains, only adds with list_add_rcu() is done -> wont the usage be > safe then? or I being blind? > > the reason why we dont free is coz of the following: dev_opp > represents voltage domains in opp library. SoC frameworks are > required to register only those voltage domain opp that are > required. by allowing a free logic, I knew it'd have complicated the > implementation way beyond what we needed it to be. Perhaps I was confusing two different data structures, if so, apologies. So you are freeing the opp level, but never the dev_opp level, then? But yes, if you are only adding and never deleting, then it is safe to pass the pointers out of an RCU read-side critical section. But please add a comment saying why you are doing this. Otherwise, Coccinelle will cause me to continue complaining about this to you. ;-) And the later uses still look buggy to me, please see below. > >Here is an example sequence of events that I am worried about: > > > >o CPU 1 enters find_device_opp(), and pick up a pointer to > > a given device opp. > > > >o CPU 2 executes opp_set_availability(), replacing that same > > device opp with a new one. It then calls synchronize_rcu() > > which blocks waiting for CPU 1 to exit its RCU read-side > > critical section. > > > >o CPU 1 exits its RCU read-side critical section, arriving at > > this point in the code. > > > >o CPU 2's synchronize_rcu() is now permitted to return, executing > > the kfree(), which frees up the memory that CPU 1's dev_opp > > pointer references. > > > >o This newly freed memory is allocated for some other structure > > by CPU 3. CPU 1 and CPU 3 are now trying to use the same > > memory for two different structures, and nothing good can > > possibly come of this. The kernel dies a brutal and nasty > > death. > > > >One way to fix this is to have the caller do rcu_read_lock() before > >calling find_device_opp(), and to do rcu_read_unlock() only after the > >caller has finished using the pointer that find_device_opp() returns. > >This works well unless the caller needs to do some blocking operation > >before it gets done using the pointer. > > > >Another approach is for find_device_opp() to use a reference count on > >the structure, and for opp_set_availability() to avoid freeing the > >structure unless/until the reference counter drops to zero. > > > >There are other approaches as well, please feel free to take a look > >at Documentation/RCU/rcuref.txt for more info on using reference > >counting and RCU. > thx. I probably should read yet again if I got my understanding of > usage right.. > > > > [...] > >>+ > >>+/** > >>+ * opp_find_freq_exact() - search for an exact frequency > >>+ * @dev: device for which we do this operation > >>+ * @freq: frequency to search for > >>+ * @is_available: true/false - match for available opp > >>+ * > >>+ * Searches for exact match in the opp list and returns pointer to the matching > >>+ * opp if found, else returns ERR_PTR in case of error and should be handled > >>+ * using IS_ERR. > >>+ * > >>+ * Note: available is a modifier for the search. if available=true, then the > >>+ * match is for exact matching frequency and is available in the stored OPP > >>+ * table. if false, the match is for exact frequency which is not available. > >>+ * > >>+ * This provides a mechanism to enable an opp which is not available currently > >>+ * or the opposite as well. > >>+ * > >>+ * Locking: RCU reader. > >>+ */ > >>+struct opp *opp_find_freq_exact(struct device *dev, > >>+ unsigned long freq, bool available) > >>+{ > >>+ struct device_opp *dev_opp; > >>+ struct opp *temp_opp, *opp = ERR_PTR(-ENODEV); > >>+ > >>+ dev_opp = find_device_opp(dev); > >>+ if (IS_ERR(dev_opp)) > >>+ return opp; > >>+ > >>+ rcu_read_lock(); > >>+ list_for_each_entry_rcu(temp_opp, &dev_opp->opp_list, node) { > >>+ if (temp_opp->available == available && > >>+ temp_opp->rate == freq) { > >>+ opp = temp_opp; > >>+ break; > >>+ } > >>+ } > >>+ rcu_read_unlock(); > > > >But this one sadly has the same problem that find_device_opp() does. > is the concern about opp OR about dev_opp here? I am guessing opp.. > > > >>+ return opp; > >>+} > >>+ > >>+/** > >>+ * opp_find_freq_ceil() - Search for an rounded ceil freq > >>+ * @dev: device for which we do this operation > >>+ * @freq: Start frequency > >>+ * > >>+ * Search for the matching ceil *available* OPP from a starting freq > >>+ * for a device. > >>+ * > >>+ * Returns matching *opp and refreshes *freq accordingly, else returns > >>+ * ERR_PTR in case of error and should be handled using IS_ERR. > >>+ * > >>+ * Locking: RCU reader. > >>+ */ > >>+struct opp *opp_find_freq_ceil(struct device *dev, unsigned long *freq) > >>+{ > >>+ struct device_opp *dev_opp; > >>+ struct opp *temp_opp, *opp = ERR_PTR(-ENODEV); > >>+ > >>+ if (!dev || !freq) { > >>+ pr_err("%s: invalid param dev=%p freq=%p\n", __func__, > >>+ dev, freq); > >>+ return ERR_PTR(-EINVAL); > >>+ } > >>+ > >>+ dev_opp = find_device_opp(dev); > >>+ if (IS_ERR(dev_opp)) > >>+ return opp; > >>+ > >>+ rcu_read_lock(); > >>+ list_for_each_entry_rcu(temp_opp, &dev_opp->opp_list, node) { > >>+ if (temp_opp->available && temp_opp->rate >= *freq) { > >>+ opp = temp_opp; > >>+ *freq = opp->rate; > >>+ break; > >>+ } > >>+ } > >>+ rcu_read_unlock(); > > > >And this one also has the same problem that find_device_opp() does. > > guessing opp ptr here.. am I right? if it is about device_opp, it is > not going to be freed as I mentioned above - at least we dont give > an function to update(hence free) it. It really does look to me that you are passing a pointer to the thing being freed out of an RCU read-side critical section. If you are really doing this, you do need to do something to fix it. I outlined some of the options earlier. > >>+ return opp; > >>+} > >>+ > >>+/** > >>+ * opp_find_freq_floor() - Search for a rounded floor freq > >>+ * @dev: device for which we do this operation > >>+ * @freq: Start frequency > >>+ * > >>+ * Search for the matching floor *available* OPP from a starting freq > >>+ * for a device. > >>+ * > >>+ * Returns matching *opp and refreshes *freq accordingly, else returns > >>+ * ERR_PTR in case of error and should be handled using IS_ERR. > >>+ * > >>+ * Locking: RCU reader. > >>+ */ > >>+struct opp *opp_find_freq_floor(struct device *dev, unsigned long *freq) > >>+{ > >>+ struct device_opp *dev_opp; > >>+ struct opp *temp_opp, *opp = ERR_PTR(-ENODEV); > >>+ > >>+ if (!dev || !freq) { > >>+ pr_err("%s: invalid param dev=%p freq=%p\n", __func__, > >>+ dev, freq); > >>+ return ERR_PTR(-EINVAL); > >>+ } > >>+ > >>+ dev_opp = find_device_opp(dev); > >>+ if (IS_ERR(dev_opp)) > >>+ return opp; > >>+ > >>+ rcu_read_lock(); > >>+ list_for_each_entry_rcu(temp_opp, &dev_opp->opp_list, node) { > >>+ if (temp_opp->available) { > >>+ /* go to the next node, before choosing prev */ > >>+ if (temp_opp->rate > *freq) > >>+ break; > >>+ else > >>+ opp = temp_opp; > >>+ } > >>+ } > >>+ if (!IS_ERR(opp)) > >>+ *freq = opp->rate; > >>+ rcu_read_unlock(); > > > >As does this one. > > guessing opp ptr here.. am I right? Again, here it looks to me like you are passing a pointer out of an RCU read-side critical section that could be freed out from under you. If so, again, this must be fixed. > >>+ return opp; > >>+} > >>+ > >>+/** > >>+ * opp_add() - Add an OPP table from a table definitions > >>+ * @dev: device for which we do this operation > >>+ * @freq: Frequency in Hz for this OPP > >>+ * @u_volt: Voltage in uVolts for this OPP > >>+ * > >>+ * This function adds an opp definition to the opp list and returns status. > >>+ * The opp is made available by default and it can be controlled using > >>+ * opp_enable/disable functions. > >>+ * > >>+ * Locking: RCU, mutex > >>+ */ > >>+int opp_add(struct device *dev, unsigned long freq, unsigned long u_volt) > >>+{ > >>+ struct device_opp *tmp_dev_opp, *dev_opp = NULL; > >>+ struct opp *opp, *new_opp; > >>+ struct list_head *head; > >>+ > >>+ /* Check for existing list for 'dev' */ > >>+ rcu_read_lock(); > >>+ list_for_each_entry_rcu(tmp_dev_opp, &dev_opp_list, node) { > >>+ if (dev == tmp_dev_opp->dev) { > >>+ dev_opp = tmp_dev_opp; > >>+ break; > >>+ } > >>+ } > >>+ rcu_read_unlock(); > >>+ > >>+ /* allocate new OPP node */ > >>+ new_opp = kzalloc(sizeof(struct opp), GFP_KERNEL); > >>+ if (!new_opp) { > >>+ pr_warning("%s: unable to allocate new opp node\n", > >>+ __func__); > >>+ return -ENOMEM; > >>+ } > >>+ > >>+ if (!dev_opp) { > >>+ /* Allocate a new device OPP table */ > >>+ dev_opp = kzalloc(sizeof(struct device_opp), GFP_KERNEL); > >>+ if (!dev_opp) { > >>+ kfree(new_opp); > >>+ pr_warning("%s: unable to allocate device structure\n", > >>+ __func__); > >>+ return -ENOMEM; > >>+ } > >>+ > >>+ dev_opp->dev = dev; > >>+ INIT_LIST_HEAD(&dev_opp->opp_list); > >>+ mutex_init(&dev_opp->lock); > >>+ > >>+ /* Secure the device list modification */ > >>+ mutex_lock(&dev_opp_list_lock); > >>+ list_add_rcu(&dev_opp->node, &dev_opp_list); > >>+ mutex_unlock(&dev_opp_list_lock); > >>+ synchronize_rcu(); > > > >You do not need to wait for an RCU grace period when adding objects, only > >between removing them and freeing them. > > ouch.. my bad.. thx.. will fix > > > > >>+ } > >>+ > >>+ /* populate the opp table */ > >>+ new_opp->dev_opp = dev_opp; > >>+ new_opp->rate = freq; > >>+ new_opp->u_volt = u_volt; > >>+ new_opp->available = true; > >>+ > >>+ /* make the dev_opp modification safe */ > >>+ mutex_lock(&dev_opp->lock); > >>+ > >>+ rcu_read_lock(); > >>+ /* Insert new OPP in order of increasing frequency */ > >>+ head = &dev_opp->opp_list; > >>+ list_for_each_entry_rcu(opp, &dev_opp->opp_list, node) { > >>+ if (new_opp->rate < opp->rate) > >>+ break; > >>+ else > >>+ head = &opp->node; > >>+ } > >>+ rcu_read_unlock(); > >>+ > >>+ list_add_rcu(&new_opp->node, head); > >>+ mutex_unlock(&dev_opp->lock); > >>+ synchronize_rcu(); > > > >Ditto. > thx.. will fix. > > > > >>+ return 0; > >>+} > >>+ > >>+/** > >>+ * opp_set_availability() - helper to set the availability of an opp > >>+ * @opp: Pointer to opp > >>+ * @availability_req: availability status requested for this opp > >>+ * > >>+ * Set the availability of an OPP with an RCU operation, opp_{enable,disable} > >>+ * share a common logic which is isolated here. > >>+ * > >>+ * Returns -EINVAL for bad pointers, -ENOMEM if no memory available for the > >>+ * copy operation, returns 0 if no modifcation was done OR modification was > >>+ * successful. > >>+ */ > >>+static int opp_set_availability(struct opp *opp, bool availability_req) > >>+{ > >>+ struct opp *new_opp, *tmp_opp; > >>+ bool is_available; > >>+ > >>+ if (unlikely(!opp || IS_ERR(opp))) { > >>+ pr_err("%s: Invalid parameters being passed\n", __func__); > >>+ return -EINVAL; > >>+ } > >>+ > >>+ new_opp = kmalloc(sizeof(struct opp), GFP_KERNEL); > >>+ if (!new_opp) { > >>+ pr_warning("%s: unable to allocate opp\n", __func__); > >>+ return -ENOMEM; > >>+ } > >>+ > >>+ mutex_lock(&opp->dev_opp->lock); > >>+ > >>+ rcu_read_lock(); > >>+ tmp_opp = rcu_dereference(opp); > >>+ is_available = tmp_opp->available; > >>+ rcu_read_unlock(); > >>+ > >>+ /* Is update really needed? */ > >>+ if (is_available == availability_req) { > >>+ mutex_unlock(&opp->dev_opp->lock); > >>+ kfree(tmp_opp); > >>+ return 0; > >>+ } > >>+ > >>+ *new_opp = *opp; > >>+ new_opp->available = availability_req; > >>+ list_replace_rcu(&opp->node, &new_opp->node); > >>+ > >>+ mutex_unlock(&opp->dev_opp->lock); > >>+ synchronize_rcu(); > > > >If you decide to rely on reference counts to fix the problem in > >find_device_opp(), you will need to check the reference counts here. > >Again, please see Documentation/RCU/rcuref.txt. > > Does the original point about not needing to free dev_opp resolve this? For the dev_opp case, yes. However, I believe that my point is still valid for the opp case. > >>+ kfree(opp); > >>+ > >>+ return 0; > >>+} > >>+ > >>+/** > >>+ * opp_enable() - Enable a specific OPP > >>+ * @opp: Pointer to opp > >>+ * > >>+ * Enables a provided opp. If the operation is valid, this returns 0, else the > >>+ * corresponding error value. It is meant to be used for users an OPP available > >>+ * after being temporarily made unavailable with opp_disable. > >>+ * > >>+ * Locking: RCU, mutex > > > >By "Locking: RCU", you presumably don't mean that the caller must do > >an rcu_read_lock() -- this would result in a synchronize_rcu() being > >invoked in an RCU read-side critical section, which is illegal. > aye..thx. I will make it more verbose. Does the following sound right? > > Locking used internally: RCU copy-update and read_lock used, mutex > > and for the readers: > > Locking used internally: RCU read_lock used > > or do we need to go all verbatim about the implementation here? > > I intended the user to know the context in which they can call it, > for example, since mutex is used, dont think of using this in > interrupt context. since read_locks are already used, dont need to > double lock it.. opp library takes care of it's own exclusivity. I would stick to the constraints on the caller, and describe the internals elsewhere, for example, near the data-structure definitions. But tastes do vary on this. > >>+ */ > >>+int opp_enable(struct opp *opp) > >>+{ > >>+ return opp_set_availability(opp, true); > >>+} > >>+ > >>+/** > >>+ * opp_disable() - Disable a specific OPP > >>+ * @opp: Pointer to opp > >>+ * > >>+ * Disables a provided opp. If the operation is valid, this returns > >>+ * 0, else the corresponding error value. It is meant to be a temporary > >>+ * control by users to make this OPP not available until the circumstances are > >>+ * right to make it available again (with a call to opp_enable). > >>+ * > >>+ * Locking: RCU, mutex > > > >Ditto. (And similar feedback applies elsewhere.) > > > >>+ */ > >>+int opp_disable(struct opp *opp) > >>+{ > >>+ return opp_set_availability(opp, false); > >>+} > >>+ > >>+#ifdef CONFIG_CPU_FREQ > >>+/** > >>+ * opp_init_cpufreq_table() - create a cpufreq table for a device > >>+ * @dev: device for which we do this operation > >>+ * @table: Cpufreq table returned back to caller > >>+ * > >>+ * Generate a cpufreq table for a provided device- this assumes that the > >>+ * opp list is already initialized and ready for usage. > >>+ * > >>+ * This function allocates required memory for the cpufreq table. It is > >>+ * expected that the caller does the required maintenance such as freeing > >>+ * the table as required. > >>+ * > >>+ * WARNING: It is important for the callers to ensure refreshing their copy of > >>+ * the table if any of the mentioned functions have been invoked in the interim. > >>+ * > >>+ * Locking: RCU reader > >>+ */ > >>+void opp_init_cpufreq_table(struct device *dev, > >>+ struct cpufreq_frequency_table **table) > >>+{ > >>+ struct device_opp *dev_opp; > >>+ struct opp *opp; > >>+ struct cpufreq_frequency_table *freq_table; > >>+ int i = 0; > >>+ > >>+ dev_opp = find_device_opp(dev); > >>+ if (IS_ERR(dev_opp)) { > >>+ pr_warning("%s: unable to find device\n", __func__); > >>+ return; > >>+ } > >>+ > >>+ freq_table = kzalloc(sizeof(struct cpufreq_frequency_table) * > >>+ (opp_get_opp_count(dev) + 1), GFP_ATOMIC); > >>+ if (!freq_table) { > >>+ pr_warning("%s: failed to allocate frequency table\n", > >>+ __func__); > >>+ return; > > > >How does the caller tell that the allocation failed? Should the caller > >set the pointer passed in through the "table" argument to NULL before > >calling this function? Or should this function set *table to NULL > >before returning in this case? > > Good catch. Thanks. I would rather change the return to int and pass > proper errors to caller so that they can handle it appropriately. Works for me! Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html