On 2/2/24 15:31, Damien Le Moal wrote:
With zone write plugging, each zone of a zoned block device has a
64B struct blk_zone_wplug. While this is not a problem for small
capacity drives with few zones, this structure size results in large
memory usage per device for large capacity block devices.
E.g., for a 28 TB SMR disk with over 104,000 zones of 256 MB, the zone
write plug array of the gendisk uses 6.6 MB of memory.
However, except for the zone write plug spinlock, flags, zone capacity
and zone write pointer offset which all need to be always available
(the latter two to avoid having to do too many report zones), the remaining
fields of struct blk_zone_wplug are needed only when a zone is being
written to.
This commit introduces struct blk_zone_active_wplug to reduce the size
of struct blk_zone_wplug from 64B down to 16B. This is done using a
union of a pointer to a struct blk_zone_active_wplug and of the zone
write pointer offset and zone capacity, with the zone write plug
spinlock and flags left as the first fields of struct blk_zone_wplug.
The flag BLK_ZONE_WPLUG_ACTIVE is introduced to indicate if the pointer
to struct blk_zone_active_wplug of a zone write plug is valid. In that
case, the write pointer offset and zone capacity fields are accessible
from struct blk_zone_active_wplug. Otherwise, they can be accessed from
struct blk_zone_wplug.
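
The layout described above can be pictured with a small user-space sketch.
This is only an illustration under stated assumptions, not the patch's code:
all sketch_* names and field names are hypothetical, and the spinlock is
replaced by a 4-byte stand-in (roughly the size of a kernel spinlock_t when
lock debugging is disabled).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for BLK_ZONE_WPLUG_ACTIVE. */
#define SKETCH_WPLUG_ACTIVE (1u << 0)

/* Allocated only while a zone is being written (hypothetical fields). */
struct sketch_active_wplug {
	uint32_t wp_offset;	/* write pointer offset within the zone */
	uint32_t capacity;	/* zone capacity */
	/* BIO list, work item, etc. would live here in a real plug. */
};

/* Always-present per-zone structure. */
struct sketch_wplug {
	uint32_t lock;		/* 4-byte stand-in for the spinlock (assumption) */
	uint32_t flags;
	union {
		/* Valid only if SKETCH_WPLUG_ACTIVE is set. */
		struct sketch_active_wplug *zawplug;
		/* Valid only if SKETCH_WPLUG_ACTIVE is clear. */
		struct {
			uint32_t wp_offset;
			uint32_t capacity;
		} info;
	};
};

/* The whole point of the union: 4 + 4 + 8 bytes per zone. */
static_assert(sizeof(struct sketch_wplug) == 16, "per-zone plug must be 16B");

/* Read the write pointer offset from wherever it currently lives. */
static inline uint32_t sketch_wplug_wp_offset(const struct sketch_wplug *zwplug)
{
	if (zwplug->flags & SKETCH_WPLUG_ACTIVE)
		return zwplug->zawplug->wp_offset;
	return zwplug->info.wp_offset;
}

/* Read the zone capacity from wherever it currently lives. */
static inline uint32_t sketch_wplug_capacity(const struct sketch_wplug *zwplug)
{
	if (zwplug->flags & SKETCH_WPLUG_ACTIVE)
		return zwplug->zawplug->capacity;
	return zwplug->info.capacity;
}
```

Callers never touch the union directly in this sketch; they go through the
two accessors, so the flag check cannot be forgotten.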
This data structure organization allows tracking the write pointer
offset of zones regardless of the zone write state (active or not).
Handling of zone reset, reset all and finish operations is modified
to update a zone write pointer offset according to its state.
A zone is activated in blk_zone_wplug_handle_write() with a call to
blk_zone_activate_wplug(). An allocated active zone write plug is
reclaimed (freed) once its zone becomes full, or is reset and becomes
empty. This is done either directly when a plugged BIO completes and the
zone is full, or when resetting or finishing zones. Freeing of an active
zone write plug is done using blk_zone_free_active_wplug().
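
The activate/reclaim life cycle can be sketched in user space as follows.
This is a simplified illustration, not the bodies of
blk_zone_activate_wplug() or blk_zone_free_active_wplug(): the sketch_*
names are hypothetical, malloc()/free() stand in for the mempool, and
locking is omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_WPLUG_ACTIVE (1u << 0)	/* hypothetical flag value */

struct sketch_active_wplug {
	uint32_t wp_offset;
	uint32_t capacity;
};

struct sketch_wplug {
	uint32_t lock;		/* spinlock stand-in, unused in this sketch */
	uint32_t flags;
	union {
		struct sketch_active_wplug *zawplug;	/* if ACTIVE */
		struct {
			uint32_t wp_offset;
			uint32_t capacity;
		} info;					/* if not ACTIVE */
	};
};

/* Activate a zone: move wp offset and capacity into the active plug. */
static int sketch_activate_wplug(struct sketch_wplug *zwplug)
{
	struct sketch_active_wplug *zawplug;

	if (zwplug->flags & SKETCH_WPLUG_ACTIVE)
		return 0;
	zawplug = malloc(sizeof(*zawplug));
	if (!zawplug)
		return -1;
	/* Copy the inline values before the pointer overwrites the union. */
	zawplug->wp_offset = zwplug->info.wp_offset;
	zawplug->capacity = zwplug->info.capacity;
	zwplug->zawplug = zawplug;
	zwplug->flags |= SKETCH_WPLUG_ACTIVE;
	return 0;
}

/* Reclaim: move the values back inline, then free the active plug. */
static void sketch_free_active_wplug(struct sketch_wplug *zwplug)
{
	struct sketch_active_wplug *zawplug;

	if (!(zwplug->flags & SKETCH_WPLUG_ACTIVE))
		return;
	zawplug = zwplug->zawplug;	/* save before reusing the union */
	zwplug->flags &= ~SKETCH_WPLUG_ACTIVE;
	zwplug->info.wp_offset = zawplug->wp_offset;
	zwplug->info.capacity = zawplug->capacity;
	free(zawplug);
}

/* Update the write pointer offset according to the zone state. */
static void sketch_set_wp_offset(struct sketch_wplug *zwplug, uint32_t wp)
{
	if (zwplug->flags & SKETCH_WPLUG_ACTIVE)
		zwplug->zawplug->wp_offset = wp;
	else
		zwplug->info.wp_offset = wp;
}

/* Zone reset: the zone becomes empty, so its active plug is reclaimed. */
static void sketch_reset_zone(struct sketch_wplug *zwplug)
{
	sketch_set_wp_offset(zwplug, 0);
	sketch_free_active_wplug(zwplug);
}
```

The write pointer offset survives every transition in this sketch, which is
the property that avoids extra report zones.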
For allocating struct blk_zone_active_wplug, a mempool is created and
sized according to the disk zone resources (maximum number of open zones
and maximum number of active zones). For devices with no zone resource
limits, the default BLK_ZONE_DEFAULT_ACTIVE_WPLUG_NR (128) is used.
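
One possible reading of the sizing rule is sketched below. The text above
does not give the exact formula, so taking the larger of the two limits is
an assumption made for illustration; the function and macro names are
hypothetical, and only the default of 128 comes from the description.

```c
#include <assert.h>

/* Mirrors the stated default of BLK_ZONE_DEFAULT_ACTIVE_WPLUG_NR. */
#define SKETCH_DEFAULT_ACTIVE_WPLUG_NR 128u

/*
 * Pick a pool size from the device zone resource limits.
 * A limit of 0 means "no limit reported" in this sketch.
 */
static unsigned int sketch_active_wplug_pool_size(unsigned int max_open_zones,
						  unsigned int max_active_zones)
{
	/* Assumption: use the larger of the two reported limits. */
	unsigned int nr = max_open_zones > max_active_zones ?
			  max_open_zones : max_active_zones;

	/* No zone resource limits at all: fall back to the default. */
	return nr ? nr : SKETCH_DEFAULT_ACTIVE_WPLUG_NR;
}
```

A device reporting neither limit gets 128 preallocated entries here; a
device reporting only one of the two gets that value.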
With this mechanism, the amount of memory used per block device for zone
write plugs is roughly reduced by a factor of 4. E.g. for a 28 TB SMR
hard disk, memory usage is reduced to about 1.6 MB.
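
The figures quoted above can be checked with a little arithmetic. The
helpers below are a hypothetical sketch that assumes 28 TB means 28 * 10^12
bytes and 256 MB zones means 256 MiB; the mempool itself is not counted.

```c
#include <assert.h>
#include <stdint.h>

/* Number of zones needed to cover the capacity (last zone may be partial). */
static uint64_t sketch_nr_zones(uint64_t capacity_bytes, uint64_t zone_bytes)
{
	return (capacity_bytes + zone_bytes - 1) / zone_bytes;
}

/* Size of the per-disk zone write plug array for a given struct size. */
static uint64_t sketch_wplug_array_bytes(uint64_t nr_zones,
					 uint64_t wplug_size)
{
	return nr_zones * wplug_size;
}
```

With the rounded figure of 104,000 zones, 64B per zone gives 6,656,000
bytes and 16B per zone gives 1,664,000 bytes, which matches the roughly
6.6 MB and 1.6 MB quoted and the factor of 4 between them.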
Hmm. Wouldn't it be sufficient to tie the number of available plugs to the
number of open zones? Of course that doesn't help for drives not
reporting that, but otherwise?
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@xxxxxxx +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Ivo Totev, Andrew McDonald,
Werner Knoblich