Hi Daejun,

Sorry, I lost the cover letter, so I am replying to this mail instead.

For this series,

Reviewed-by: Stanley Chu <stanley.chu@xxxxxxxxxxxx>
Tested-by: Stanley Chu <stanley.chu@xxxxxxxxxxxx>

On Thu, 2021-04-29 at 08:23 +0900, Daejun Park wrote:
> This patch implements HPB initialization and adds HPB function calls
> to the UFS core driver.
>
> NAND flash-based storage devices, including UFS, have mechanisms to
> translate the logical addresses of IO requests to the corresponding
> physical addresses of the flash storage.
> In UFS, the Logical-address-to-Physical-address (L2P) map data, which
> is required to identify the physical address of the requested IOs, can
> only be partially loaded from NAND flash into SRAM. Because of this
> partial loading, accessing a flash address whose L2P information is
> not loaded in SRAM can cause serious performance degradation.
>
> The basic concept of HPB is to cache L2P mapping entries in host
> system memory so that both the physical block address (PBA) and the
> logical block address (LBA) can be delivered in the HPB READ command.
> The HPB READ command reads data faster than a normal UFS read command
> because it provides the physical address (HPB entry) of the desired
> logical block in addition to its logical address. The UFS device can
> then access the physical block in NAND directly, without searching for
> and loading its L2P mapping table. This improves read performance
> because the NAND read for loading the L2P mapping table is eliminated.
>
> During HPB initialization, the host checks whether the UFS device
> supports the HPB feature and retrieves the related device
> capabilities. Then, some HPB parameters are configured in the device.
>
> We measured the total start-up time of popular applications and
> observed the difference made by enabling HPB.
> The applications are 12 game apps and 24 non-game apps, launched in a
> fixed order; one cycle consists of running all 36 applications in
> sequence. We repeated the cycle to observe the performance improvement
> from L2P mapping cache hits in HPB.
>
> The experiment environment:
> - kernel version: 4.4.0
> - RAM: 8 GB
> - UFS 2.1 (64 GB)
>
> Result:
> +-------+----------+----------+-------+
> | cycle | baseline | with HPB | diff  |
> +-------+----------+----------+-------+
> |     1 |    272.4 |    264.9 |  -7.5 |
> |     2 |    250.4 |    248.2 |  -2.2 |
> |     3 |    226.2 |    215.6 | -10.6 |
> |     4 |    230.6 |    214.8 | -15.8 |
> |     5 |    232.0 |    218.1 | -13.9 |
> |     6 |    231.9 |    212.6 | -19.3 |
> +-------+----------+----------+-------+
>
> We also measured HPB performance using iozone.
> Here is my iozone script:
> iozone -r 4k -+n -i2 -ecI -t 16 -l 16 -u 16
> -s $IO_RANGE/16 -F mnt/tmp_1 mnt/tmp_2 mnt/tmp_3 mnt/tmp_4 mnt/tmp_5
> mnt/tmp_6 mnt/tmp_7 mnt/tmp_8 mnt/tmp_9 mnt/tmp_10 mnt/tmp_11 mnt/tmp_12
> mnt/tmp_13 mnt/tmp_14 mnt/tmp_15 mnt/tmp_16
>
> Result:
> +----------+--------+---------+
> | IO range | HPB on | HPB off |
> +----------+--------+---------+
> |     1 GB |  294.8 |  300.87 |
> |     4 GB | 293.51 |  179.35 |
> |     8 GB | 294.85 |  162.52 |
> |    16 GB | 293.45 |  156.26 |
> |    32 GB |  277.4 |  153.25 |
> +----------+--------+---------+
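
For anyone new to HPB, here is a minimal, self-contained sketch of the
host-side idea described above (cache L2P entries, and deliver the PBA
along with the LBA on a hit). All names in it (struct hpb_cache,
hpb_lookup(), issue_hpb_read()) are hypothetical; this illustrates the
concept only and is not the actual ufshpb code:

#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define HPB_CACHE_ENTRIES 1024

/*
 * Host-side cache of L2P entries, indexed by LBA.  Purely
 * illustrative: the real driver manages the cache in regions and
 * subregions and keeps it in sync with the device.
 */
struct hpb_cache {
	uint64_t pba[HPB_CACHE_ENTRIES];
	bool valid[HPB_CACHE_ENTRIES];
};

/* Return true and fill *pba if the L2P entry for @lba is cached. */
static bool hpb_lookup(const struct hpb_cache *c, uint64_t lba,
		       uint64_t *pba)
{
	uint64_t idx = lba % HPB_CACHE_ENTRIES;

	if (!c->valid[idx])
		return false;
	*pba = c->pba[idx];
	return true;
}

/* Stand-ins for building the actual UPIUs. */
static void issue_hpb_read(uint64_t lba, uint64_t pba)
{
	/* HPB READ carries both the LBA and the cached HPB entry, so
	 * the device need not load its own L2P map from NAND. */
	printf("HPB READ lba=%" PRIu64 " pba=0x%" PRIx64 "\n", lba, pba);
}

static void issue_normal_read(uint64_t lba)
{
	printf("READ(10) lba=%" PRIu64 "\n", lba);
}

int main(void)
{
	static struct hpb_cache cache = {
		.pba   = { [7] = 0xabcd },
		.valid = { [7] = true },
	};
	uint64_t pba;

	if (hpb_lookup(&cache, 7, &pba))
		issue_hpb_read(7, pba);		/* cache hit */

	if (!hpb_lookup(&cache, 8, &pba))
		issue_normal_read(8);		/* cache miss */

	return 0;
}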
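
And a similarly rough sketch of the initialization step the cover
letter mentions: check the device descriptor for HPB support before
configuring anything. The descriptor offset, the bit position, and
read_device_desc_byte() below are assumptions for illustration only;
the real values and query API live in the JEDEC HPB spec and the
driver's ufs.h:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Both constants are assumed for this sketch. */
#define DEV_DESC_PARAM_UFS_FEAT	0x1F	/* bUFSFeaturesSupport */
#define DEV_HPB_SUPPORT		(1u << 7)

/* Hypothetical stand-in for a real descriptor query. */
static uint8_t read_device_desc_byte(uint32_t offset)
{
	(void)offset;
	return DEV_HPB_SUPPORT;	/* pretend the device advertises HPB */
}

static bool hpb_is_supported(void)
{
	return read_device_desc_byte(DEV_DESC_PARAM_UFS_FEAT) &
	       DEV_HPB_SUPPORT;
}

int main(void)
{
	if (!hpb_is_supported()) {
		printf("HPB not supported, skipping HPB init\n");
		return 0;
	}

	/* This is where the real driver would go on to read the
	 * HPB-related descriptors and configure HPB parameters in
	 * the device. */
	printf("HPB supported, configuring HPB parameters\n");
	return 0;
}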