Just lurking and I saw this. A simple technique might be to insert a
newline before each href, then use grep and cut. E.g. open the file in
vim and do:
:%s/href=/^Mhref=/gc
:%s/HREF=/^Mhref=/gc
(where ^M is ctrl+v followed by the return key)
Then
grep href filename.html|cut -d '"' -f 2
and optionally
... | sort | uniq
There might be some way to do a case-insensitive find and replace in
vim, but I don't know it off the top of my head.
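
If the vim step is a nuisance, here is a rough sketch that skips it and
handles the href/HREF case problem at the same time. It assumes a grep
that supports -o (GNU and BSD grep do, but it is not in POSIX) and that
the href values are double-quoted. The function name extract_links is
just something I made up for illustration.

```shell
# extract_links FILE: print each double-quoted href value in FILE once.
# Assumes grep supports -o (a GNU/BSD extension, not POSIX).
extract_links() {
    # -o prints only the matching text, one match per line;
    # -i matches href, HREF, Href and so on.
    grep -oi 'href="[^"]*"' "$1" |
        cut -d '"' -f 2 |   # keep the text between the first pair of quotes
        sort -u             # same effect as sort | uniq
}
```

Then something like "extract_links filename.html" should give one link
per line. It will still miss single-quoted or unquoted attribute values.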
Jeremy.
On 12 September 2011 11:19, Vic <lug@???> wrote:
>
>> You can probably do this quite easily in perl.
>
> You can.
>
>> Are there any nice short programs to do this?
>
> Something like this?
>
> #! /usr/bin/perl
>
> my $fname = $ARGV[0];
> die "need a filename" unless defined ($fname);
>
> open INFILE, "<$fname" or die "Can't open $fname for reading";
>
> while (<INFILE>)
> {
>     my @links = $_ =~ m|<a +href="([^"]+)"|gc;
>     if(scalar(@links) > 0)
>     {
>         foreach my $link (@links)
>         {
>             # Do something here
>             print "Link : $link\n";
>         }
>     }
> }
>
> You could probably write this in a much more compact fashion if you wanted.
>
> Vic.