Re: How to extract string from filename

I sent this to Tony directly yesterday, but forgot that I cannot send to some of his addresses directly.
I also realized that it might help someone else, so here it is.


---------- Forwarded message ----------
Date: Wed, 29 Jul 2015 14:10:09 +0200 (SAST)
From: Willem van der Walt <wvdwalt@xxxxxxxxxx>
To: Tony Baechler <tony@xxxxxxxxxxxx>
Subject: Re: How to extract string from filename

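Something close to the full script would be:
#!/bin/bash
# Loop over the glob directly rather than parsing ls output;
# this also stays safe if a name ever contains spaces.
for i in *.mp3; do
  [ -e "$i" ] || continue   # skip the literal pattern when nothing matches
  # The PID is the second-to-last underscore-separated field:
  # reverse the name, take field 2, then reverse it back.
  grab=$(echo "$i" | rev | cut -f2 -d'_' | rev)
  wget "www.bbc.co.uk/programme/$grab" -O "$grab.html"
done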
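
If you would rather not start rev and cut for every file, the same field can probably be pulled out with bash parameter expansion alone. This is only a sketch: the function name pid_from_name is made up for illustration, and it assumes every name ends in _<PID>_<something>.mp3 as in the examples.

```shell
#!/bin/bash
# pid_from_name is a hypothetical helper name, not part of the script above.
# Assumption: the PID is the second-to-last underscore-separated field.
pid_from_name() {
  local base=${1%_*}           # drop the last field, e.g. "_default.mp3"
  printf '%s\n' "${base##*_}"  # keep what follows the last remaining underscore
}

pid_from_name "Click_-_05_10_2010_p00b18gp_default.mp3"   # prints p00b18gp
```

Inside the loop this would replace the rev/cut line with grab=$(pid_from_name "$i").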
HTH, Willem



On Wed, 29 Jul 2015, Tony Baechler wrote:

Hi all,

The recent discussion on shell scripts got me thinking. A couple of posters invited people to post problems they're having with scripts to the list, so here goes.

I have not actually written a script for this because I'm not sure how to go about it. I would normally use cut, but I need to cut from right to left. The cut help doesn't indicate a way to do this. You can only cut from the beginning of the line or a range of bytes. The problem is that the lines (filenames, to be exact) are of different lengths, so it's impossible to know what range of bytes I need.
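
One possible way around cut's left-to-right limitation is awk, which can count fields from the right through NF, the number of fields on the line. A minimal sketch, assuming the wanted string is always the second-to-last underscore-separated field (the function name second_to_last is only for illustration):

```shell
#!/bin/bash
# Hypothetical helper: reads names on stdin and prints the
# second-to-last "_"-separated field of each line.
second_to_last() {
  awk -F'_' 'NF >= 2 { print $(NF-1) }'
}

printf '%s\n' "Witness_-_The_Sinking_of_the_USS_Indianapolis_p02wdykn_default.mp3" | second_to_last
# prints p02wdykn
```

Because NF is recomputed for every line, the differing lengths of the names do not matter.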

What I'm trying to do is extract the BBC PID from the downloaded files. It's a lower case alphanumeric string which starts with a letter and is eight characters. In my case, the first letter is always "b" or "p," so if I could use something like grep to just extract the first lower case letter followed by a number up to the next underscore, that would be good. I don't think grep will just print a matching phrase, only the matching line. Here are some example filenames:

5_live_Science_-_Coding_and_Computers_b062dj5j_default.mp3
Witness_-_The_Sinking_of_the_USS_Indianapolis_p02wdykn_default.mp3
Discovery_-_A_Scientific_View_of_Agriculture_p0053gbd_default.mp3
Click_-_05_10_2010_p00b18gp_default.mp3

As you can see, they all follow a similar format. If I could go from right to left, I would simply cut "_default.mp3" and extract the preceding 8 bytes, but I can't figure out how. What I'm trying to do is first extract the PIDs, hopefully preserving the filenames in the process. Once they are extracted (or printed to stdout), I want to use wget to download the BBC programme page. If you go to www.bbc.co.uk/programme/bXXXXXXX, you'll get a web page displaying the broadcast date, description and notes. I would like to download those pages.
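
The "cut the suffix, then take the preceding 8 bytes" idea can be sketched directly in bash, printing each PID next to its filename so the names are preserved. The function name pid_and_name is hypothetical, and the sketch assumes every file of interest ends in exactly "_default.mp3"; anything that does not fit is skipped.

```shell
#!/bin/bash
# Hypothetical helper: prints "PID<TAB>filename", or fails if the name does not fit.
pid_and_name() {
  local name=$1
  local stem=${name%_default.mp3}      # cut the fixed suffix
  [ "$stem" != "$name" ] || return 1   # the suffix was not present
  local pid=${stem: -8}                # the 8 characters before the suffix
  # Keep only strings shaped like a PID: a lower-case letter
  # followed by 7 lower-case letters or digits.
  local re='^[[:lower:]][[:lower:][:digit:]]{7}$'
  [[ $pid =~ $re ]] || return 1
  printf '%s\t%s\n' "$pid" "$name"
}

pid_and_name "Discovery_-_A_Scientific_View_of_Agriculture_p0053gbd_default.mp3"
```

As an aside, GNU grep has a -o option that prints only the matching part of a line rather than the whole line, so a grep-based approach is possible as well.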

Any help with this would be greatly appreciated.  Thanks in advance.

--------------------
Tony Baechler, Baechler Access Technology Services
Putting accessibility at the forefront of technology
mailto:bats@xxxxxxxxxxxxxx
Phone: 1-619-746-8310   Fax: 1-619-449-9898

_______________________________________________
Blinux-list mailing list
Blinux-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/blinux-list
