Re: [PATCH v3 1/3] common/rc: introduce _random_file() helper

Naohiro Aota <Naohiro.Aota@xxxxxxx> · Mon, 28 Aug 2023 01:37:26 +0000

On Fri, Aug 25, 2023 at 10:00:09PM +0800, Zorro Lang wrote:
> On Fri, Aug 25, 2023 at 09:39:48PM +0800, Zorro Lang wrote:
> > On Mon, Aug 21, 2023 at 04:12:11PM +0900, Naohiro Aota wrote:
> > > Currently, we use "ls ... | sort -R | head -n1" (or tail) to choose a
> > > random file in a directory.It sorts the files with "ls", sort it randomly
> > > and pick the first line, which wastes the "ls" sort.
> > > 
> > > Also, using "sort -R | head -n1" is inefficient. For example, in a
> > > directory with 1000000 files, it takes more than 15 seconds to pick a file.
> > > 
> > >   $ time bash -c "ls -U | sort -R | head -n 1 >/dev/null"
> > >   bash -c "ls -U | sort -R | head -n 1 >/dev/null"  15.38s user 0.14s system 99% cpu 15.536 total
> > > 
> > >   $ time bash -c "ls -U | shuf -n 1 >/dev/null"
> > >   bash -c "ls -U | shuf -n 1 >/dev/null"  0.30s user 0.12s system 138% cpu 0.306 total
> > > 
> > > So, we should just use "ls -U" and "shuf -n 1" to choose a random file.
> > > Introduce _random_file() helper to do it properly.
> > > 
> > > Signed-off-by: Naohiro Aota <naohiro.aota@xxxxxxx>
> > > ---
> > >  common/rc | 7 +++++++
> > >  1 file changed, 7 insertions(+)
> > > 
> > > diff --git a/common/rc b/common/rc
> > > index 5c4429ed0425..4d414955f6d9 100644
> > > --- a/common/rc
> > > +++ b/common/rc
> > > @@ -5224,6 +5224,13 @@ _soak_loop_running() {
> > >  	return 0
> > >  }
> > >  
> > > +# Return a random file in a directory. A directory is *not* followed
> > > +# recursively.
> > > +_random_file() {
> > > +	local basedir=$1
> > > +	echo "$basedir/$(ls -U $basedir | shuf -n 1)"
> > 
> > I think the "1" can be the second argument, for we might want to get a random
> > file list sometimes. For example:
> > 
> >   local basedir=$1
> >   local num=$2
> >   local opt
> > 
> >   if [ -n "$num" ];then
> > 	  opt="-n $num"
> >   fi
> >   echo "$basedir/$(ls -U $basedir | shuf $opt)"
> > 
> > What do you think?
> 
> Hmm... nack my review point. Looks like this makes a simple change to be
> complicated, especially multiple output lines. I'll merge this patchset
> at first, then we can support that second argument If we need that
> feature in one day. Or if you're interested in it.

Yeah, I think so too. Currently, we don't have "shuf -n N" usage where N >
1. Also, while there is a plain "shuf" usage, it is used with "find".

So, I agree we can extent the function when we need it.

> 
> Thanks,
> Zorro
> 
> > 
> > Thanks,
> > Zorro
> > 
> > > +}
> > > +
> > >  init_rc
> > >  
> > >  ################################################################################
> > > -- 
> > > 2.41.0
> > > 
>