Copy an Entire Web Site on Your Macintosh [or PC] by Dick Eastman

Considering that stuff disappears off the web, and that corporate media sometimes change articles without acknowledging they’ve made changes, we can all help preserve the web (and history and reality) for posterity by saving web pages as files (press Ctrl+S on a PC, or Command+S on a Mac, and save options should pop up). Archive.org is sometimes months behind in its crawls, and some websites block its robots. Using free apps like HTTrack and SiteSucker to preserve important websites in different places around the world may act as something of a deterrent to anyone planning an internet 9/11, and may help the world reconstruct history and reality should anything disastrous happen to the web.
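
If you want to script that kind of page-saving rather than doing it by hand, the idea is simple enough to sketch. Here is a minimal Python example that fetches one page and writes its HTML to disk, which is roughly what the browser’s Save command does for the page source (the URL and file name below are just placeholders):

    # A minimal sketch: fetch one page and save its HTML to a file.
    # The URL and file name below are placeholders.
    import urllib.request

    url = "http://www.example.com/"
    with urllib.request.urlopen(url) as response:
        html = response.read()

    with open("saved_page.html", "wb") as f:
        f.write(html)

A real archiving tool does far more than this (it follows links, saves images and stylesheets, and rewrites references), which is exactly what the programs below automate.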

Law Professor: Counter Terrorism Czar Told Me There Is Going To Be An i-9/11 And An i-Patriot Act – Stanford Law professor Lawrence Lessig details government plans to overhaul and restrict the Internet http://www.infowars.net/articles/august2008/050808i911.htm 

September 13, 2007 

Copy an Entire Web Site on Your Macintosh by Dick Eastman

Source: http://blog.eogn.com/eastmans_online_genealogy/2007/09/copy-an-entire-.html 

Almost two years ago, I wrote an article entitled, “Copy an Entire Web Site with HTTRACK.” That article is still available at http://blog.eogn.com/eastmans_online_genealogy/2005/12/copy_an_entire_.html. It describes the operation of a Windows program that can download an entire web site for offline browsing or for backup purposes. It is a good method of backing up your web site. 
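
For readers who prefer scripting, HTTrack also ships as a console program on most platforms. Purely as a sketch, and assuming the httrack binary is installed and on your PATH, you could drive it from Python like this (the URL and output folder are placeholders; -O names the directory the mirror is written into):

    # Sketch: call the HTTrack console program from Python, assuming
    # the "httrack" binary is installed and on the PATH.
    # "-O" tells HTTrack which directory to write the mirror into.
    import subprocess

    subprocess.run(
        ["httrack", "http://www.example.com/", "-O", "mirror/example"],
        check=True,  # raise an error if httrack exits with a failure code
    )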

Now a similar utility is available for Macintosh users. Best of all, it is a free program. The author does accept donations, however. 

SiteSucker automatically downloads Web sites from the Internet. It does this by copying the site’s Web pages, images, backgrounds, movies, and other files to your local hard drive. Just enter a URL (Uniform Resource Locator), press the Enter key, and SiteSucker can download an entire Web site. 

NOTE: Programs such as SiteSucker or HTTrack are excellent for downloading static web pages, that is, pages that never change. However, they will not work on interactive sites, such as those that query online databases. Don’t try to download http://www.FamilySearch.org or eBay, even if you do have the disk space available.

You can use SiteSucker to make local copies of your Web sites for easy maintenance. It can either download files unmodified or “localize” the files it downloads, allowing you to browse a site offline. 
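
To make the “localize” idea concrete: localizing means rewriting the links inside each saved page so they point at your local copies instead of back out to the live site. SiteSucker’s own rewriting logic isn’t published, but a toy Python sketch of the general technique might look like this (the site address is a placeholder):

    # Toy illustration of "localizing" links: turn absolute links that
    # point inside the mirrored site into relative ones so the copy
    # can be browsed offline. Real tools also handle src attributes,
    # CSS url() references, and much more.
    SITE = "http://www.example.com"   # placeholder for the mirrored site

    def localize(html: str) -> str:
        return html.replace('href="' + SITE + '/', 'href="')

    page = '<a href="http://www.example.com/tree.html">My tree</a>'
    print(localize(page))   # <a href="tree.html">My tree</a>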

If SiteSucker is in the middle of a download when you choose the Save command, SiteSucker will pause the download and save its status with the document. When you open the document later, you can restart the download from where it left off by pressing the Resume button. 
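
The pause-and-resume behavior amounts to persisting the crawl state, that is, the lists of URLs already fetched and still pending, alongside the downloaded files. SiteSucker’s saved-document format is its own; purely as an illustration of the idea, here is a sketch in Python (the file name and layout are made up):

    # Illustration only: persist the crawl state so a later run can
    # resume where it left off. The file name and layout here are
    # made up; SiteSucker's saved-document format is its own.
    import json
    import os

    STATE_FILE = "download_state.json"

    def save_state(pending, done):
        with open(STATE_FILE, "w") as f:
            json.dump({"pending": sorted(pending), "done": sorted(done)}, f)

    def load_state():
        if not os.path.exists(STATE_FILE):
            return set(), set()          # fresh start
        with open(STATE_FILE) as f:
            data = json.load(f)
        return set(data["pending"]), set(data["done"])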

SiteSucker is a Universal application, which means that it’s made to run on both Intel- and PowerPC-based Mac computers. SiteSucker requires Mac OS X 10.4.x Tiger or later. Of course, to download files, your computer will also need an Internet connection. To download a large site, you will also need plenty of available disk space.

There are several limitations. As mentioned earlier, the program cannot query databases and will not work on any web site that asks for user input and then builds pages “on the fly,” based on the input. That leaves out many genealogy databases, as well as eBay and others. 

SiteSucker totally ignores JavaScript. It will not see any link specified within JavaScript. (If the Log Warnings option is on in the download settings, SiteSucker will include a warning in the log file for any page that uses JavaScript.) 

SiteSucker does scan Flash (SWF) files for embedded plain text links, but it can only detect links to files that have one of the following extensions: html, swf, mp3, sit, zip, mov, gif, jpg, png, doc, or txt. SiteSucker cannot localize Flash files, and it does not examine other media files for embedded links. 
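
That kind of scan is essentially a search through the file’s raw bytes for plain-text URLs ending in one of the recognized extensions. SiteSucker’s actual SWF parser isn’t documented, but the general technique can be sketched in Python:

    # Illustration: scan a file's raw bytes for plain-text links ending
    # in one of the recognized extensions. SiteSucker's real SWF parser
    # is not documented; this only shows the general technique.
    import re

    EXTENSIONS = ("html", "swf", "mp3", "sit", "zip", "mov",
                  "gif", "jpg", "png", "doc", "txt")
    PATTERN = re.compile(
        rb'https?://[^\s"\'<>]+\.(?:' +
        "|".join(EXTENSIONS).encode("ascii") + rb')\b')

    def find_links(path):
        with open(path, "rb") as f:
            data = f.read()
        return [m.decode("ascii", "replace") for m in PATTERN.findall(data)]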

By default, SiteSucker honors robots.txt exclusions and the Robots META tag. Therefore, it will not download any directories or pages disallowed by robot exclusions. However, you can override this behavior with the Ignore Robot Exclusions setting under the Advanced tab in the download settings.
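
The robots.txt check itself is a standard, well-specified mechanism; Python’s standard library even includes a parser for it. This sketch shows the default, polite behavior described above (skipping the check is what the Ignore Robot Exclusions setting does; the site address is a placeholder):

    # The default, polite behavior: consult robots.txt before fetching.
    # Python's standard library includes a robots.txt parser.
    from urllib.robotparser import RobotFileParser

    robots = RobotFileParser()
    robots.set_url("http://www.example.com/robots.txt")   # placeholder site
    robots.read()

    url = "http://www.example.com/private/page.html"
    if robots.can_fetch("*", url):
        print("allowed to download", url)
    else:
        print("disallowed by robots.txt:", url)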

The free SiteSucker program is available at http://www.sitesucker.us/

Fair Use Notice

This page contains copyrighted material, the use of which has not always been specifically authorized by the copyright owner. We are making such material available in our efforts to advance understanding of political issues relating to alternative views of the 9/11 events, etc. We believe this constitutes a “fair use” of any such copyrighted material as provided for in section 107 of the US Copyright Law. In accordance with Title 17 U.S.C. Section 107, the material on this site is distributed without profit to those who have expressed a prior interest in receiving the included information for research and educational purposes. For more information go to: http://www.law.cornell.edu/uscode/17/107.shtml. If you wish to use copyrighted material from this site for purposes of your own that go beyond “fair use”, you must obtain permission from the copyright owner.
