Re: [RFT PATCH v2 05/10] drivers: qcom: rpmh-rsc: A lot of comments

Maulik Shah <mkshah@xxxxxxxxxxxxxx> · Wed, 8 Apr 2020 16:40:15 +0530




Hi,

On 4/3/2020 1:48 AM, Doug Anderson wrote:
Hi,

On Wed, Apr 1, 2020 at 4:30 AM Maulik Shah <mkshah@xxxxxxxxxxxxxx> wrote:
+ * Return: 0 if no problem, or -EAGAIN if the caller should try again in a bit.
+ *         Caller should make sure to enable interrupts between tries.
+ *         Only ever returns -EAGAIN for ACTIVE_ONLY transfers.
with [2] it will not return -EAGAIN, can you please remove this part.
Sounds good.  Overall I'll probably wait to spin until your series
lands because trying to keep spinning this one in conjunction with
that one is going to be a nightmare.  Hopefully that means:

a) Your series can land soon.  I think it's in pretty good shape now.
I just sent a bunch of reviews.  Might need one more spin for nits and
then we'll see if Bjorn thinks it's good to go.

b) Once I spin this it can get a quicker review so other things don't
pop up and change things.

...or, if you want, you can hijack my series and send the next version
out yourself.  I won't object to that but please give me a heads up if
you're planning to do that so we don't duplicate work.


@@ -246,12 +288,35 @@ static struct tcs_group *get_tcs_for_msg(struct rsc_drv *drv,
                       ret = rpmh_rsc_invalidate(drv);
                       if (ret)
                               return ERR_PTR(ret);
+
+                     /*
+                      * TODO:
+                      * - Temporarily enable IRQs on the wake tcs.
+                      * - Make sure nobody else race with us and re-write
+                      *   to this TCS; document how this works.
You can remove above comment, i already included change to enable IRQs
on wake TCS in my series.
Right.  The race comment still is somewhat interesting because the
only way we're race free is that we know that the sleep/wake values
are only programmed at a time when no more active transactions can be
started.  I'll document that assumption.  If we ever change that
assumption we'll have to think about this more since (at the moment)
programming sleep/wake doesn't set the "tcs_in_use" bit.


+/**
+ * check_for_req_inflight() - Look to see if conflicting cmds are in flight.
+ * @drv: The controller.
+ * @tcs: A pointer to the tcs_group used for ACTIVE_ONLY transfers.
+ * @msg: The message we want to send, which will contain several addr/data
+ *       pairs to program (but few enough that they all fit in one TCS).
+ *
+ * Only for use for ACTIVE_ONLY tcs_group, since those are the only ones
+ * that might be actively sending.
This comment will need to modify/removed after we use this in place of
find_match().

see below for more details.
As per below I'm trying to understand the motivation for using
check_for_req_inflight() when writing sleep/wake sets, so changing
this is pending on that discussion.


@@ -411,6 +533,20 @@ static int find_free_tcs(struct tcs_group *tcs)
       return -EBUSY;
   }

+/**
+ * tcs_write() - Store messages into a TCS right now, or return -EBUSY.
+ * @drv: The controller.
+ * @msg: The data to be sent.
+ *
+ * Grabs a TCS for ACTIVE_ONLY transfers and writes the messages to it.
+ *
+ * If there are no free ACTIVE_ONLY TCSs or if a command for the same address
+ * is already transferring returns -EBUSY which means the client should retry
+ * shortly.
+ *
+ * Return: 0 on success, -EBUSY if client should retry, or an error.
+ *         Client should have interrupts enabled for a bit before retrying.
+ */
   static int tcs_write(struct rsc_drv *drv, const struct tcs_request *msg)
   {
       struct tcs_group *tcs;
@@ -418,16 +554,14 @@ static int tcs_write(struct rsc_drv *drv, const struct tcs_request *msg)
       unsigned long flags;
       int ret;

+     /* TODO: get_tcs_for_msg() can return -EAGAIN and nobody handles */
with [2] it will not return -EAGAIN, can you please remove this comment.
OK.


       tcs = get_tcs_for_msg(drv, msg);
       if (IS_ERR(tcs))
               return PTR_ERR(tcs);

       spin_lock_irqsave(&tcs->lock, flags);
+
       spin_lock(&drv->lock);
-     /*
-      * The h/w does not like if we send a request to the same address,
-      * when one is already in-flight or being processed.
-      */
please keep above comment as it is.
OK.  I guess I felt like now that check_for_req_inflight() was
documented it was just getting in the way, but I'll keep it if you
want.


@@ -485,6 +635,63 @@ int rpmh_rsc_send_data(struct rsc_drv *drv, const struct tcs_request *msg)
       return ret;
   }

+/**
+ * find_match() - Find if the cmd sequence is in the tcs_group
+ * @tcs: The tcs_group to search.  Either sleep or wake.
+ * @cmd: The command sequence to search for; only addr is looked at.
+ * @len: The number of commands in the sequence.
+ *
+ * Searches through the given tcs_group to see if a given command sequence
+ * is in there.
+ *
+ * Two sequences are matches if they modify the same set of addresses in
+ * the same order.  The value of the data is not considered when deciding if
+ * two things are matches.
+ *
+ * How this function works is best understood by example.  For our example,
+ * we'll imagine our tcs group contains these (cmd, data) tuples:
+ *   [(a, A), (b, B), (c, C), (d, D), (e, E), (f, F), (g, G), (h, H)]
+ * ...in other words it has an element where (addr=a, data=A), etc.
+ * ...we'll assume that there is one TCS in the group that can store 8 commands.
+ *
+ * - find_match([(a, X)]) => 0
+ * - find_match([(c, X), (d, X)]) => 2
+ * - find_match([(c, X), (d, X), (e, X)]) => 2
+ * - find_match([(z, X)]) => -ENODATA
+ * - find_match([(a, X), (y, X)]) => -EINVAL (and warning printed)
+ * - find_match([(g, X), (h, X), (i, X)]) => -EINVAL (and warning printed)
+ * - find_match([(y, X), (a, X)]) => -ENODATA
+ *
+ * NOTE: This function overall seems like it has questionable value.
+ * - It can be used to update a message in the TCS with new data, but I
+ *   don't believe we actually do that--we always fully invalidate and
+ *   re-write everything.
we are doing this from [1] onwards.
OK.


Specifically it would be too limiting to force
+ *   someone not to change the set of addresses written to each time.
+ * - This function could be attempting to avoid writing different data to
+ *   the same address twice in a tcs_group.  If that's the goal, it doesn't
+ *   do a great job since find_match([(y, X), (a, X)]) return -ENODATA in my
+ *   above example.
+ * - If you originally wrote [(a, A), (b, B), (c, C)] and later tried to
+ *   write [(a, A), (b, B)] it'd look like a match and we wouldn't consider
+ *   it an error that the size got shorter.
+ * - If two clients wrote sequences that happened to be placed in slots next
+ *   to each other then a later check could match a sequence that was the
+ *   size of both together.
+ *
+ * TODO: in light of the above, prehaps we can just remove this function?
We still need to check there is no duplicate request in a TCS for SLEEP
and WAKE as well.

find_match() checks if the request already exist for a resource then
update with new value, in a way preventing new request to go in

when one already exists. I am ok to remove this function since after [1]
we always fully invalidate and then re-write and little point in

finding a match now. However we need to use check_for_req_inflight()
from tcs_ctrl_write() with little modification to ignore tcs_is_free()

check if is called from tcs_ctrlr_write().
Sure, we could use find_match() to add this check.  It definitely
feels a lot better than the current solution which seems to miss a
whole lot of corner cases.

Before I do that, maybe you can help me understand the motivation,
though?  Where are you expecting to find the conflict?  Certainly
there can't be any conflict in the normal (non-batch) sleep/wake sets,
right?  We only have one entry in the RPMH cache per address so the
non-batch entries can't conflict with themselves.  There also can't be
any previous async command still pending because we cache
async/non-async alike.
correct, we have unique requests in non-batch caches.

For batch requests I believe that there's exactly one caller using the
batch API (otherwise rpmh_invalidate() would be a nightmare).  That
one caller is the interconnect code, right?  It feels like we could
assume that the one caller of the batch API won't write to the same
address more than one time?
correct there is only one client interconnect using batch API.

So I guess you're expecting that an rpmh_write() would write to the
same address that someone wrote to with rpmh_write_batch() and it
should override it?
On upstream interconnect i see they are using batch only requests till now.

I agree to remove find_match() completly, and we can see in future if 
interconnect driver starts using non-batch APIs as well then we can 
introduce a check to find duplicates.
Does that actually happen?
On upstream interconnect i see they are using batch only requests till now.
I agree to remove find_match() completly, and we can see in future if 
interconnect driver starts using non-batch APIs as well then we can 
introduce a check to find duplicates.
Isn't the batch API
used just for interconnect stuff and nobody should be using
rpmh_write() to mess with the interconnect stuff?


After this change on 9th change in your series,  please move it before
current patch in series.

please also keep dependency on [1] for 9th change.
Sure.  I was trying to do all the documentation in the earlier patches
to provide motivation for and help understand the later patches, but I
can change the order if need be.  It didn't seem a big deal to add the
comments and delete them, but I can understand it being weird.


   /**
- * rpmh_rsc_write_ctrl_data: Write request to the controller
- *
- * @drv: the controller
- * @msg: the data to be written to the controller
+ * rpmh_rsc_write_ctrl_data() - Write request to controller but don't trigger.
+ * @drv: The controller.
+ * @msg: The data to be written to the controller.
    *
    * There is no response returned for writing the request to the controller.
you can remove above line since responce is returned from this function.
Right.  I think this was trying to say that the request wasn't
triggered and thus there was no response.  I think it's clearer with
my addition of "but don't trigger." to the comment.



-Doug

Thanks,

Maulik

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation