Re: The fatal logic error in EC write transaction(EC transaction is done in interrupt context)

Alexey Starikovskiy <astarikovskiy@xxxxxxx> · Sat, 08 Nov 2008 19:47:18 +0300

Yakui,
This is not about my patch, this is about your ego...
My patch did not introduce this behaviour, it was there since "burst mode" was introduced, and may be even earlier. You may ask Luming Yu, why he was so optimistic about 1 microsecond (is he still around? you did not include him into CC yet...).

Regards,
Alex.

Zhao, Yakui wrote:
    a. When the host writes data to the command or data register of the embedded controller, the input buffer flag (IBF) in the status register is set within 1 microsecond.( Please see it in the section 12.7 of ACPI 3.0b. This should be paid attention to).  So If another EC transaction is started in 1 microsecond, maybe the EC status is not updated. In such case the problem will appear.
       At the same time we can't expect that all the EC follows the spec strictly. Maybe on some EC controllers more time is needed.
    b. Why not wait before the EC transaction is finished?  If it fails, it can't be caught in time.  From the programming viewpoint the logic is incorrect. If you insist that viewpoint is correct, IMO this issue can be discussed in the wider range. 

     In my email I raise the issue about your patch. If your patch can work well, no one is willing to raise any issue about your patch. 

-----Original Message-----
From: Alexey Starikovskiy [mailto:astarikovskiy@xxxxxxx] 
Sent: 2008年11月8日 23:34
To: Zhao, Yakui
Cc: LenBrown; linux-acpi@xxxxxxxxxxxxxxx; Brown, Len; Li, Shaohua; Zhang, Rui; Lin, Ming M
Subject: Re: The fatal logic error in EC write transaction(EC transaction is done in interrupt context)

Hi Yakui,

Just don't know how I would have lived without your valuable analisys...
Do you remember that ec_transaction starts by waiting for IBF to become 0?
Thus not any new transaction should begin before previous write or burst-disable command completes.
So, fatal logic error exists only in your head.

Now, let me explain why it is better to wait for last IBF=0 event at beginning of transaction, rather than at the end of previous one.
Look at the flow chart from different angle -- as fast transaction patch does it:
EC sends interrupt to OS then it is ready for us to proceed. It will send us interrupt marking IBF=0 transition if it expects us to write byte and it will send us interrupt marking OBF=1 transition if it expects us to read byte. It does not ever send us IBF=1 or OBF=0 interrupts just because there is nothing we could of should do in such a case.
After we completed write sequence, we don't need to wait for last IBF=0 transition, as we don't know if we will have anything else to write -- it might be last transaction for quite a while. But next transaction knows that it should not start before IBF is cleared, thus it waits 
for it (and expects possible interrupt).

Regards,
Alex.

Zhao, Yakui wrote:
Hi, Alexey
     After checking the EC working flowchart I find a fatal logic error that exists in EC write transaction. The issue happens on most laptops. The following is the detailed explanation about this issue:
     According to the ACPI the EC write transaction is divided into the following three phases: (Seen in the section 12.6.2 of ACPI 3.0b)
     a. Byte #1: Host Writes the EC command to EC CMD I/O PORT. Interrupt is triggered after EC reads the command byte 
     b. Byte #2: Host writes the address index to EC data I/O port. Interrupt is triggered after EC reads the address index byte
     c. Byte #3: Host writes the Data to EC data I/O port. Interrupt is triggered after EC reads the data byte

   When the function of acpi_ec_transaction_unlocked is called for EC write transaction, the wlen variable is initialized as 2 and the rlen variable is initialized as zero. Then EC transaction follows the below three steps.
     1. OS writes the EC command to EC CMD I/O port
     2. EC GPE interrupt is triggered and then the address index is written into EC data I/O port. (The wlen is decreased to 1.This is executed in EC GPE handler)
     3. EC GPE interrupt is triggered again and then the data is written into the EC data I/O port. The wlen is decreased to 0.(This is also executed in EC GPE handler). As the wlen is decreased to zero, it means that the EC transaction is finished(True is returned by the function of ec_transaction_done). 
      >status = acpi_ec_read_status(ec);
      >gpe_transaction(ec, status);
      > if (ec_transaction_done(ec) && (status & ACPI_EC_FLAG_IBF) == 0)
                wake_up(&ec->wait); 
     In this step the status variable only indicates that EC controller already reads the address index written by OS(It can’t indicate that EC controller already reads the data byte written by Host). As the IBF flag is zero, the waiting process is waked up and EC mutex lock is released. In such case it means that OS can begin another EC transaction. 
     But in fact in such case EC transaction is not really finished. After the EC data is written into EC Data I/O port, no flag indicates whether EC transaction is finished. Only when the IBF bit becomes zero again, we can say that EC transaction is really finished. If OS begins another EC transaction before previous EC transaction is really finished, we can't imagine what will happen.
　　IMO Based on the above analysis there exists the fatal logic error in the EC write transaction after the patch from Alexey is shipped. Although now no one reports this issue, it is still an unstable factor. After all it is the fatal logic error. 

　　 At the same time there exists the similar logic error when OS begins an EC transaction to disable the EC burst mode.

　　Welcome the comments.

　　Best regards
　　     Yakui

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html