Steve Kemp wrote:
> A simple request which is confusing me mightily!
>
> I'd like to download a remote webpage *including* any images, CSS
> files, etc. which are required, and rewrite the links so they work in
> the local copy. This is simple stuff with wget usually, but I'm
> running into problems because I need the initial page to be
> downloaded under a fixed name.
>
> wget seems to dislike my initial attempt:
>
> wget -O index.html --no-clobber --page-requisites \
>      --convert-links --no-directories http://en.wikipedia.org/
>
> The "--no-clobber" here, designed to avoid a file overwriting one
> which already exists, stops things from working.
>
> curl seems to allow me to name files like -o "index_#1", but it
> doesn't do rewriting of the page contents (images/CSS/etc.).
>
> (I'm trying to create archives of bookmarks in an online bookmark
> application - so I want files for bookmark "xx" to be located in
> /path/to/archives/xx/ - which is why I have to insist upon "index.html"
> as the initial page.)
>
> I guess I could use Perl to get a URL's contents, parse it for
> links, and then get them individually - but it seems like this should
> be a simple request... I looked at httrack too, but that seemed
> confusingly complex.
>
> Steve
Could you use
wget -nv en.wikipedia.org
to get the name of the first file in a relatively-easy-to-parse format?
Then after running
wget --page-requisites --convert-links --no-directories \
en.wikipedia.org
you could just rename the relevant file. As you aren't doing recursion,
links shouldn't be messed up.
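Something along these lines might do it (an untested sketch: the sed
expression assumes the usual -nv log lines ending in -> "filename" [1],
and $url and $dir are just placeholders for your bookmark URL and
archive directory):

    #!/bin/sh
    url=http://en.wikipedia.org/
    dir=/path/to/archives/xx        # hypothetical bookmark directory
    mkdir -p "$dir" && cd "$dir" || exit 1

    # -nv logs one line per saved file; wget writes its log to stderr
    wget -nv --page-requisites --convert-links --no-directories \
         "$url" 2> wget.log

    # The first log line names the top-level page; pull the quoted
    # filename out of it.
    first=$(sed -n 's/.*-> "\(.*\)".*/\1/p' wget.log | head -n 1)

    # Rename it to index.html unless wget already chose that name.
    if [ -f "$first" ] && [ "$first" != index.html ]; then
        mv -- "$first" index.html
    fi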
The -O option won't work because it will try to put all the files
(images, CSS, etc.) into a single file. And --no-clobber is no good
to you either.
Hope that helps.
cheers
Chris
--
Chris Dennis cgdennis@???
Fordingbridge, Hampshire, UK