Re: block size confusion -- in writing a basic simplest file system

Greg Freemyer <greg.freemyer@xxxxxxxxx> · Fri, 11 Sep 2009 09:28:13 -0400

On Fri, Sep 11, 2009 at 6:54 AM, nidhi mittal hada
<nidhimittal19@xxxxxxxxx> wrote:
> I was learning writing basic filesystem step by step -- till now what i
> wrote just mounts .
> Now
> can someone help me to clarify the difference between
>
> 1)blocksize we give when we do 'dd if=/dev/zero of=nnn bs=4096 count=10
> ans: in my view -- just to define size of file 4096 *10

Others covered this, but experiment with timing various block sizes.  I
have found you want userspace calls to align exactly with filesystem
pages for optimum speed.  The worst case should be bs=1: for every
1-byte "block" the kernel has to read in a 4K block, modify it, and
write it back to disk, then repeat for the next byte.

The caches will help the kernel, but you should still see horrible
performance.
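
To make the bs=1 case concrete, here is a minimal sketch that writes the
same amount of data with different block sizes and counts the write()
calls it takes.  The function name write_in_blocks() is made up for this
example, and it only counts system calls; it does not measure what
actually reaches the disk, since the page cache sits in between.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Write `total` zero bytes to `path` in chunks of `bs` bytes, roughly
 * what "dd if=/dev/zero of=path bs=<bs>" does.  Returns the number of
 * write() calls issued, or -1 on error.  (Hypothetical helper.)
 */
long write_in_blocks(const char *path, size_t total, size_t bs)
{
    assert(bs > 0);

    char *buf = calloc(1, bs);          /* zero-filled, like /dev/zero */
    if (buf == NULL)
        return -1;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    long calls = 0;
    size_t done = 0;
    while (done < total) {
        size_t want = (total - done < bs) ? total - done : bs;
        ssize_t n = write(fd, buf, want);
        if (n <= 0) {                   /* error: give up */
            calls = -1;
            break;
        }
        done += (size_t)n;              /* short writes are handled */
        calls++;
    }

    close(fd);
    free(buf);
    return calls;
}
```

Writing 40960 bytes with bs=4096 takes 10 calls, while bs=1 takes 40960
of them.  Timing the two variants on your own machine is the experiment
suggested above.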

Ignoring performance, there are 2 important aspects to dd's blocksize.

1) When working with some tapes, the dd blocksize actually gets
reflected on the tape, so it is critical to write and read with the
same blocksize.  GNU tar, for instance, defaults to a blocking factor
of 20, i.e. 20 x 512 = 10240 bytes (10K) per tape record.

2) If dd hits a read error and you have conv=noerror,sync specified as
a dd argument, it will zero-fill the rest of the block after the I/O
error.  Having dd work with large blocks of a MB or so can leave
unnecessarily large holes in your data if, for example, you are trying
to preserve a failing disk.
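
The effect of point 2 can be sketched with a small simulation.  This is
a simplified model, not dd's actual code: it assumes a single unreadable
512-byte sector and treats every block containing it as lost entirely,
whereas a real read may still return the data in front of the bad
sector.  The name copy_with_sync() is invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SECTOR 512

/*
 * Copy `len` bytes from src to dst in blocks of `bs` bytes.  bad_sector
 * is the index of one unreadable 512-byte sector.  Any block containing
 * it is written out as zeros, modelling conv=noerror,sync.
 * Returns the number of bytes that were zero-filled.
 */
size_t copy_with_sync(const unsigned char *src, unsigned char *dst,
                      size_t len, size_t bs, size_t bad_sector)
{
    assert(bs > 0 && bs % SECTOR == 0);

    size_t bad_off = bad_sector * SECTOR;
    size_t lost = 0;

    for (size_t off = 0; off < len; off += bs) {
        size_t n = (len - off < bs) ? len - off : bs;
        int hit = bad_off >= off && bad_off < off + n;

        if (hit) {
            memset(dst + off, 0, n);    /* whole block becomes a hole */
            lost += n;
        } else {
            memcpy(dst + off, src + off, n);
        }
    }
    return lost;
}
```

With one bad sector in a 64K image, a 512-byte block size loses 512
bytes, while a 64K block size loses the entire 64K.  That is why small
blocks are the safer choice when imaging a failing disk.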

> 2)block size we give wen we do  ./mkmyfs nnn 4096
> ans while writing -- filesystem information -- to file -- this block size is
> used

I believe this is the minimum-size data segment on disk that the OS
will allocate and track.

Prior to the new topology patches, it was also the smallest unit of
data read from or written to disk.  It has nothing to do with dd's
blocksize.
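
As an illustration of "minimum size the OS will allocate and track",
the arithmetic looks like this.  It is only a sketch: blocks_for() and
allocated_bytes() are names made up here, and a real filesystem also
spends blocks on its own metadata.

```c
#include <assert.h>
#include <stdint.h>

/* Number of filesystem blocks needed to hold `bytes` of file data. */
uint64_t blocks_for(uint64_t bytes, uint32_t block_size)
{
    assert(block_size > 0);
    return (bytes + block_size - 1) / block_size;   /* round up */
}

/* Bytes actually reserved on disk, including the unused tail. */
uint64_t allocated_bytes(uint64_t bytes, uint32_t block_size)
{
    return blocks_for(bytes, block_size) * block_size;
}
```

With a 4096-byte block size, a 1-byte file still occupies one full
block (4096 bytes on disk), and a 4097-byte file occupies two.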

> 3)block size we have as
> #define MYFS_DEFAULT_BS  which we set as sb->s_blocksize-- in fill_super
> function -- before doing sb_bread of disk super block
>   while mounting -- while reading filesystem info from the file-- this
> blocksize is used
>
> 4)in testfs  ---  sb_min_blocksize() was used  --- before sb_bread in fill
> super --
> wherein minimum of the two
> 'MYFS_DEFAULT_BS'   and    bdev_hardsect_size(sb->s_bdev)
> is set as sb->s_blocksize
> what is bdev_hardsect_size ??
> what's d logic behind using minimum of these two Please CMIIW

bdev_hardsect_size() returns the sector size the block device reports
(the "hard sector" size).  The topology patchset that went into 2.6.31
reworked this area and, as far as I can tell, renamed the helper to
bdev_logical_block_size(), so check which kernel version your source
is from.  Anyway, historically logical sectors and physical sectors
were the same size (i.e. 512 bytes).  New devices are changing that.
Rotating disks are coming (or are here) that have 512-byte logical
sectors but 4K physical sectors.
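
On the "minimum of these two" question: as far as I can tell from the
kernel source, sb_min_blocksize() does not pick the smaller value.  It
raises the requested size to at least the device's sector size and then
hands the result to sb_set_blocksize().  Below is a userspace model of
just that selection step.  pick_blocksize() is a made-up name, and this
is my reading of the code rather than a copy of it, so check your own
tree.

```c
#include <assert.h>

/*
 * Userspace model of the size selection in sb_min_blocksize().
 * `wanted` stands in for MYFS_DEFAULT_BS, `dev_sector` for whatever
 * bdev_hardsect_size(sb->s_bdev) reports.  (Illustrative names.)
 */
int pick_blocksize(int wanted, int dev_sector)
{
    assert(wanted > 0 && dev_sector > 0);

    /* A filesystem block may not be smaller than one device sector. */
    return (wanted < dev_sector) ? dev_sector : wanted;
}
```

So asking for 1024 on a 512-byte-sector device yields 1024, while asking
for 512 on a 4096-byte-sector device yields 4096.  The "min" in the name
refers to the smallest block size the device allows.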

A physical sector has a header / footer that includes a checksum for
the entire physical sector.

That means a write that only updates part of a physical sector forces
the drive to do an internal read-modify-write: read the physical
sector, modify it (including the checksum), and write it back.

That is highly inefficient, so the new topology patches were created
to allow filesystems and partitioning tools to align themselves with
the physical characteristics of the storage, not just the logical
sector sizes.  The code you quote deals with the same device sector
sizes.
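
The alignment condition itself is simple to state in code.  This is a
minimal sketch under the assumption that the physical sector size is
known; needs_rmw() is a name invented for the example, not a kernel
function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if a write of `len` bytes at byte offset `off` covers only part
 * of some physical sector of size `phys`, so that the drive would have
 * to read-modify-write that sector.
 */
bool needs_rmw(uint64_t off, uint64_t len, uint32_t phys)
{
    assert(phys > 0);
    if (len == 0)
        return false;

    /* Both the start and the length must be multiples of `phys`. */
    return (off % phys) != 0 || (len % phys) != 0;
}
```

A 4K write at offset 0 on a 4K-sector drive is fine, but the same write
starting at sector 63 (offset 63 * 512, the traditional start of the
first partition) straddles two physical sectors and triggers the
read-modify-write on both.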

I don't know how that plays into flash / SSDs.  They have large erase
blocks that may have to be updated all or nothing, typically 128K I
think.

Do they have bdev_hardsect_size set to 128K?  Maybe someone else can say.

Also given these topology patches are brand new, they may not have
been tuned to the various SSD / flash drives yet.

It could easily be that different manufacturers and different models
will have different optimal settings for the topology parameters.

> --
> Thanks & Regards
> Nidhi Mittal Hada
>

Greg
