Hi Thomas,

> Hi Kautuk,
>
> could you maybe do some performance checks to see whether this make a
> difference (e.g. by running the command in a tight loop many times)?
> You can use "tb@" to get the current value of the timebase counter, so
> reading that before and after the loop should provide you with a way of
> measuring the required time.
>
>  Thomas

This patch improves the compilation time of the
IF/AHEAD/THEN/CASE/ENDCASE/BEGIN/AGAIN/UNTIL/DO/?DO/LOOP/+LOOP
Forth words when they are used outside of any Forth procedure. It does
so in the same way for all of these words, because all of them simply
use the +COMP and -COMP words.

On top of this patch I created a second patch that reintroduces the
older implementation of IF and THEN under the names IF2 and THEN2, as
follows:

col(+COMP-BEFORE STATE @ 1 STATE +! 0BRANCH(1) EXIT HERE THERE ! COMP-BUFFER DOTO HERE COMPILE DOCOL)
col(-COMP-BEFORE -1 STATE +! STATE @ 0BRANCH(1) EXIT COMPILE SEMICOLON THERE @ DOTO HERE COMP-BUFFER EXECUTE)
imm(IF2 +COMP-BEFORE DOTICK DO0BRANCH COMPILE, HERE 0 COMPILE,)
imm(THEN2 ?COMP RESOLVE-ORIG -COMP-BEFORE)

IF2 and THEN2 use +COMP-BEFORE and -COMP-BEFORE, which keep the code as
it was before I applied my "[PATCH v2] slof/engine.in: refine +COMP and
-COMP by not using" patch.

Now that I have both implementations, I used the timebase to measure
the difference in elapsed ticks between a long run of IF-THEN words and
an equally long run of IF2-THEN2 words.

I made the following changes to ./board-qemu/slof/OF.fs:

diff --git a/board-qemu/slof/OF.fs b/board-qemu/slof/OF.fs
index df33c80..56805fc 100644
--- a/board-qemu/slof/OF.fs
+++ b/board-qemu/slof/OF.fs
@@ -22,6 +22,7 @@ hex
 #include "base.fs"
+
 \ Set default load-base to 0x4000
 4000 to default-load-base
@@ -329,6 +330,151 @@ check-boot-from-ram
 8ff cp
+." BEFORE-PATCH: BEFORE TB is: " tb@ .
+1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +1 IF2 0 drop THEN2 +cr ." BEFORE-PATCH: AFTER TB is: " tb@ . cr + +." AFTER-PATCH: BEFORE TB is: " tb@ . +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +cr ." AFTER-PATCH: AFTER TB is: " tb@ . cr + +." AFTER-PATCH: BEFORE TB is: " tb@ . 
+1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +1 IF 0 drop THEN +cr ." AFTER-PATCH: AFTER TB is: " tb@ . cr With the above changes in slof/engine.in and board-qemu/slof/OF.fs I complied SLOF and got the following output on running a guest: [root@r223l performance_work]# virsh start vm4 --console Domain 'vm4' started Connected to domain 'vm4' Escape character is ^] (Ctrl + ]) Populating /vdevice methods Populating /vdevice/vty@30000000 Populating /vdevice/nvram@71000000 Populating /pci@800000020000000 00 0800 (D) : 1b36 000d serial bus [ usb-xhci ] 00 1000 (D) : 1af4 1003 virtio [ serial ] 00 1800 (D) : 1af4 1001 virtio [ block ] 00 2000 (D) : 1af4 1002 legacy-device* 00 2800 (D) : 1234 1111 qemu vga No NVRAM common partition, re-initializing... Installing QEMU fb Scanning USB XHCI: Initializing USB Keyboard No console specified using screen & keyboard BEFORE-PATCH: BEFORE TB is: 9de978a1 BEFORE-PATCH: AFTER TB is: 9e78efba AFTER-PATCH: BEFORE TB is: 9ebb67aa AFTER-PATCH: AFTER TB is: 9f2247cc AFTER-PATCH: BEFORE TB is: 9f64b9fd AFTER-PATCH: AFTER TB is: 9fc33e6c Welcome to Open Firmware Copyright (c) 2004, 2017 IBM Corporation All rights reserved. 
This program and the accompanying materials are made available
under the terms of the BSD License available at
http://www.opensource.org/licenses/bsd-license.php

Trying to load: from: /pci@800000020000000/scsi@3 ... Successfully loaded
[root@r223l performance_work]# echo $((0x9e78efba-0x9de978a1))
9402137
[root@r223l performance_work]# echo $((0x9f2247cc-0x9ebb67aa))
6742050
[root@r223l performance_work]# echo $((0x9fc33e6c-0x9f64b9fd))
6194287
[root@r223l performance_work]# echo "scale=4;(9402137-6742050)/512" | bc
5195.4824
[root@r223l performance_work]# echo "scale=4;(9402137-6194287)/512" | bc
6265.3320
[root@r223l performance_work]#

Going by the calculations on the BEFORE-PATCH and AFTER-PATCH output
above, there is a very noticeable improvement that is consistent across
multiple runs: about 5195 and 6265 microseconds saved in the two
AFTER-PATCH runs shown here. (My POWER9 bare-metal machine has a
timebase-frequency of 512 MHz, which is why I am dividing by 512.)

Note: The above figures include the execution time of IF-THEN and
IF2-THEN2 after compilation. But since the compiled IF-THEN and
IF2-THEN2 code should execute at the same speed, that part cancels out
in the subtraction in the two bc commands above.
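
For anyone who wants to redo the arithmetic in one go, here is a small sketch that recomputes the deltas from the timebase values printed in the log above. It assumes the 512 MHz timebase-frequency mentioned earlier (512 ticks per microsecond); the variable names are only illustrative and the percentage line is an extra that was not part of my original calculation.

```shell
# Timebase readings copied from the console log above.
before=$((0x9e78efba - 0x9de978a1))   # IF2/THEN2 run (old +COMP/-COMP)
after1=$((0x9f2247cc - 0x9ebb67aa))   # IF/THEN run, first measurement
after2=$((0x9fc33e6c - 0x9f64b9fd))   # IF/THEN run, second measurement

# Assumption: 512 MHz timebase, i.e. 512 ticks per microsecond.
ticks_per_us=512

for after in "$after1" "$after2"; do
    saved=$((before - after))
    # Integer arithmetic only, so microseconds and percent are truncated.
    echo "saved: $saved ticks, $((saved / ticks_per_us)) us, $((saved * 100 / before))% of the old time"
done
```

With the values from this log, that comes out at roughly 28% and 34% less time than the old implementation for the two runs. On a machine with a different timebase-frequency, ticks_per_us would need to be adjusted accordingly.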