On 06/28/2017 07:44 AM, Paolo Valente wrote:
>
>> On 28 Jun 2017, at 14:42, Jens Axboe <axboe@xxxxxxxxx> wrote:
>>
>> On 06/27/2017 11:39 PM, Paolo Valente wrote:
>>>
>>>> On 27 Jun 2017, at 20:29, Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>>
>>>> On 06/27/2017 12:27 PM, Paolo Valente wrote:
>>>>>
>>>>>> On 27 Jun 2017, at 16:41, Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>>>>
>>>>>> On 06/27/2017 12:09 AM, Paolo Valente wrote:
>>>>>>>
>>>>>>>> On 19 Jun 2017, at 13:43, Paolo Valente <paolo.valente@xxxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> This commit fixes a bug triggered by a non-trivial sequence of
>>>>>>>> events. These events are briefly described in the next two
>>>>>>>> paragraphs. The impatient, or those who are familiar with queue
>>>>>>>> merging and splitting, can jump directly to the last paragraph.
>>>>>>>>
>>>>>>>> On each I/O-request arrival for a shared bfq_queue, i.e., for a
>>>>>>>> bfq_queue that is the result of the merge of two or more bfq_queues,
>>>>>>>> BFQ checks whether the shared bfq_queue has become seeky (i.e., if too
>>>>>>>> many random I/O requests have arrived for the bfq_queue; if the device
>>>>>>>> is non-rotational, then random requests must also be small for the
>>>>>>>> bfq_queue to be tagged as seeky). If the shared bfq_queue is actually
>>>>>>>> detected as seeky, then a split occurs: the bfq I/O context of the
>>>>>>>> process that has issued the request is redirected from the shared
>>>>>>>> bfq_queue to a new non-shared bfq_queue. As a degenerate case, if the
>>>>>>>> shared bfq_queue actually happens to be shared only by one process
>>>>>>>> (because of previous splits), then no new bfq_queue is created: the
>>>>>>>> state of the shared bfq_queue is just changed from shared to
>>>>>>>> non-shared.
>>>>>>>>
>>>>>>>> Regardless of whether a brand new non-shared bfq_queue is created, or
>>>>>>>> the pre-existing shared bfq_queue is just turned into a non-shared
>>>>>>>> bfq_queue, several parameters of the non-shared bfq_queue are set
>>>>>>>> (restored) to the original values they had when the bfq_queue
>>>>>>>> associated with the bfq I/O context of the process (that has just
>>>>>>>> issued an I/O request) was merged with the shared bfq_queue. One of
>>>>>>>> these parameters is the weight-raising state.
>>>>>>>>
>>>>>>>> If, on the split of a shared bfq_queue,
>>>>>>>> 1) a pre-existing shared bfq_queue is turned into a non-shared
>>>>>>>> bfq_queue;
>>>>>>>> 2) the previously shared bfq_queue happens to be busy;
>>>>>>>> 3) the weight-raising state of the previously shared bfq_queue happens
>>>>>>>> to change;
>>>>>>>> the number of weight-raised busy queues changes. The field
>>>>>>>> wr_busy_queues must then be updated accordingly, but such an update
>>>>>>>> was missing. This commit adds the missing update.
>>>>>>>>
>>>>>>>
>>>>>>> Hi Jens,
>>>>>>> any idea of the possible fate of this fix?
>>>>>>
>>>>>> I sort of missed this one. It looks trivial enough for 4.12, or we
>>>>>> can defer until 4.13. What do you think?
>>>>>>
>>>>>
>>>>> It should actually be something trivial, and hopefully correct,
>>>>> because a further throughput improvement (for BFQ), which depends on
>>>>> this fix, is now working properly, and we haven't seen any regression
>>>>> so far. In addition, since this improvement is virtually ready for
>>>>> submission, further steps will probably be easier if this fix gets in
>>>>> sooner (whatever the fate of the improvement turns out to be).
>>>>
>>>> OK, let's queue it up for 4.13 then.
>>>>
>>>
>>> My argument was in favor of 4.12 actually. Maybe you did mean 4.12
>>> here?
>>
>> You were talking about further improvements and new development on top
>> of this, so I assumed you meant 4.13.
>> However, further development is
>> not the main criterion or concern for whether this fix should go into
>> 4.12 or not.
>
> Ok, thanks for your explanation and patience.
>
>> The main concern is whether this fixes something that is crucial
>> to have in 4.12. It's late in the cycle, I'd rather not push anything
>> that isn't a regression fix at this point.
>>
>
> Hard to assess precisely how crucial this is. Certainly it fixes a
> regression. The practical, negative effects of this regression are
> systematic when one tries to add the throughput improvement I
> mentioned: the improvement almost never works. If BFQ is used as it
> is, then negative effects on throughput are less likely to happen.
>
> I hope that this piece of information is somehow useful for your
> decision.

If it's only really visible with the other change on top, then I think
we should defer to 4.13. It's not a kernel regression in the most
clinical sense, since BFQ wasn't available in 4.12.

-- 
Jens Axboe
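
For readers following the thread, the three conditions listed in the commit
message quoted above can be illustrated with a small userspace sketch. This is
not the actual patch: only the counter name wr_busy_queues comes from the
commit message. The structures, the boolean fields, and the function name
resume_state_on_split are hypothetical simplifications, and the real BFQ code
may track weight raising differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-device scheduler data. */
struct sched_data {
    int wr_busy_queues;        /* number of busy queues that are weight-raised */
};

/* Hypothetical, simplified stand-in for a queue that was shared until now. */
struct queue {
    bool busy;                 /* condition 2: queue has pending requests */
    bool weight_raised;        /* current weight-raising state */
    bool saved_weight_raised;  /* state saved when the queue was merged */
};

/*
 * Condition 1: the pre-existing shared queue is turned into a non-shared
 * queue, so its saved weight-raising state is restored in place.
 */
static void resume_state_on_split(struct sched_data *sd, struct queue *q)
{
    bool was_raised = q->weight_raised;

    q->weight_raised = q->saved_weight_raised;

    /* A non-busy queue is not counted, so there is nothing to fix up. */
    if (!q->busy)
        return;

    /* Condition 3: the state changed, so the counter must follow it. */
    if (!was_raised && q->weight_raised) {
        sd->wr_busy_queues++;
    } else if (was_raised && !q->weight_raised) {
        assert(sd->wr_busy_queues > 0);
        sd->wr_busy_queues--;
    }
}
```

In this sketch, omitting the final if/else block reproduces the kind of bug
the commit message describes: the queue's state changes while the counter
keeps its stale value.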