
Useful Utility: wget

By Jesse Morgan | February 21, 2006

Wget is useful for a lot of things: downloading images from a directory listing, mirroring a website, recursively fetching one subdirectory of a website, etc. As you can tell, its main focus is downloading from the web (HTTP, HTTPS, FTP) in a non-interactive manner.

There are a lot of flags to change the behavior, and you can get all sorts of wild behavior by mixing and matching those flags. The most straightforward use is this:

wget http://example.com/software_pkg.tar.gz

This is the best tool for grabbing packages off a website to install on a headless machine. I can’t count the number of times I’ve needed to install something on my server manually and used wget to do it.
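
Here's roughly what that looks like wrapped up in a little helper. Treat it as a sketch, not gospel: the function name fetch_pkg, the /tmp/build directory, and the example URL are all placeholders I made up. The two extra flags are -c, which resumes a partial download if the connection drops, and -P, which picks the directory to save into.

```shell
# Hypothetical helper: fetch a tarball into a scratch directory.
# The name, default directory, and example URL are placeholders.
fetch_pkg() {
    url="$1"
    dest="${2:-/tmp/build}"
    mkdir -p "$dest" || return 1
    # -c resumes a partially downloaded file
    # -P sets the directory the file is saved into
    wget -c -P "$dest" "$url"
}

# Example call (commented out so nothing is fetched by accident):
# fetch_pkg http://example.com/software_pkg.tar.gz /tmp/build
```

From there it's the usual tar and make dance in whatever directory you picked.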

Useful Flags

Flag       Use
-nc        No clobber. Skips files you already have, which is useful when re-downloading a site.
-N         Timestamping. Similar to -nc, except it overwrites the local file if the copy on the server is newer. Useful for keeping mirrors up to date.
-r         Works recursively through a site and grabs all the files (to a default depth of 5 levels).
--spider   Doesn’t download the files, just checks to see if they’re there.
-l x       Limits recursive downloading to a “depth” of x.
-m         Mirrors a site. Turns on timestamping and infinite recursion, and keeps FTP indexes.
-X list    Excludes a comma-separated list of directories from the download. Useful for preventing yourself from mirroring 30 gigs of video files.
-np        No parent. Prevents you from accidentally going up into the parent directory when downloading recursively.
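
To give a feel for mixing and matching, here are a few of those flags combined. This is a sketch under assumptions rather than a set of blessed recipes: the function names and the example paths in the comments are invented for illustration, and only the wget flags themselves come from the table above.

```shell
# Grab one subdirectory, two levels deep, without climbing into the
# parent and without re-downloading files that already exist locally.
grab_subdir() {
    wget -r -l 2 -np -nc "$1"
}

# Keep a mirror current while skipping directories you don't want.
# $2 is a comma-separated list, e.g. /video,/iso (made-up paths).
mirror_without() {
    wget -m -np -X "$2" "$1"
}

# Check whether a URL is there without downloading it.
# wget should exit non-zero if the server reports an error.
check_url() {
    wget --spider "$1"
}

# Example calls (commented out so nothing is fetched by accident):
# grab_subdir http://example.com/docs/
# mirror_without http://example.com/ /video,/iso
# check_url http://example.com/software_pkg.tar.gz && echo "still there"
```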

That’s a good start. There are a lot more options, but those listed above (and combinations of them) are probably enough to get you moving. Use this tool. Play with it. Mirror a site and experiment with the -N and -nc flags. Please feel free to include your own wget recipes below in the comments.

K_F, dev_null, I’m looking at you.

Topics: Linux, Open Source, Reviews, Utility

3 Responses to “Useful Utility: wget”

  1. K_F Says:
    February 21st, 2006 at 9:42 am

    Recipe 1: All your comic needs
    wget -erobots=off --no-parent --mirror -p http://darkgate.net/comic/images/
    -erobots=off tells wget to ignore the robots exclusion rules (robots.txt); we’re being a bad boy. --mirror, or -m, is detailed above. As for -p, the man page says:

    -p
    --page-requisites
    This option causes Wget to download all the files that are necessary
    to properly display a given HTML page. This includes such
    things as inlined images, sounds, and referenced stylesheets.

  2. dev_null Says:
    February 21st, 2006 at 10:48 am

    I have a mirror script that I use to nab a site.

    wget -mxkN

    -m = mirror
    -k = convert links for local browsing
    -N = only download files that are newer than the local copy
    -x = create local directory tree

    I am not sure that it is perfect but it seems to do the job correctly.

  3. VP|bofh Says:
    February 21st, 2006 at 2:50 pm

    wget is for hippies, use curl. Very handy when you can add a wildcard to the url to match anything specific within a folder or whatever.
