System Requirements:
- Windows 95/98/2000/NT/ME/XP/Vista
- 32 MB RAM
- 1 MB Hard Disk Space
- Internet Connection
A powerful web crawler utility to extract:
* URL
* meta tag (title, description, keyword)
* plain text between two tags
* page size
* last modified date value
From:
* Web Site
* Web Directories
* Search Results
* List of URLs from file.
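As a rough illustration of the kind of fields listed above, here is a minimal Python sketch that pulls the title, meta description, meta keywords and page size out of an HTML string. It is not how "Win Web Crawler" works internally; the names MetaExtractor and extract_fields are hypothetical, and the page size is assumed to be the UTF-8 byte length of the document.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collects the <title> text and the description/keywords meta tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}            # e.g. {"description": "...", "keywords": "..."}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            name = (attr.get("name") or "").lower()
            if name in ("description", "keywords"):
                self.meta[name] = attr.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so append each one.
        if self._in_title:
            self.title += data


def extract_fields(html_text):
    """Return the extracted fields as a dict (hypothetical layout)."""
    parser = MetaExtractor()
    parser.feed(html_text)
    parser.close()
    return {
        "title": parser.title.strip(),
        "description": parser.meta.get("description", ""),
        "keywords": parser.meta.get("keywords", ""),
        # Assumption: page size is the UTF-8 byte length of the document.
        "page_size": len(html_text.encode("utf-8")),
    }
```

Calling extract_fields on a downloaded page's HTML would give one record per page, which could then be written to the output file.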
Here you tell "Win Web Crawler" how many levels to explore in the specified website. If you want "Win Web Crawler" to stay on the first page, select "Process First Page Only". A setting of "0" processes the whole website. A setting of "1" processes the index or home page and the files directly under the root directory only.
To extract all URLs, without meta tag data, from a website:
- Go to New Session Dialog
- Select "Source = WebSite"
- Enter the website URL in the Starting Address box, e.g. http://www.mydomain.com
- Select depth = 0 (to spider the entire website)
- Select the "Save Data" folder, i.e. where the program will save the data
- Select Save Format: CSV or line by line
- Click the OK button
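To make the two save formats in the steps above concrete, here is a small Python sketch that renders a list of URLs either as CSV or line by line. The function names, the "url" header row and the format keys "csv" and "lines" are assumptions made for illustration; the actual file layout written by "Win Web Crawler" may differ.

```python
import csv
import io


def render_urls(urls, fmt="csv"):
    """Return the URL list as text in the chosen format."""
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(["url"])          # assumed header row
        for url in urls:
            writer.writerow([url])
        return buf.getvalue()
    if fmt == "lines":
        # One URL per line, no header.
        return "".join(url + "\n" for url in urls)
    raise ValueError("unknown format: " + fmt)


def save_urls(urls, path, fmt="csv"):
    """Write the rendered URL list to the given file path."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(render_urls(urls, fmt))
```

For URL-only output the two formats are nearly identical; CSV becomes more useful once extra columns such as title or page size are saved alongside each URL.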
For example: "Win Web Crawler" is going to visit the URL http://www.yoursite.com/product/milk/ for data extraction.
After you fill in your website address and select the options you want, click OK and the crawl runs automatically.
Open the crawler's result file in your save folder; the result looks like this:
* http://www.yoursite.com/
* http://www.yoursite.com/contact.htm
* http://www.yoursite.com/about.htm
* http://www.yoursite.com/product/
* http://www.yoursite.com/product/support.htm
* http://www.yoursite.com/product/milk/
* http://www.yoursite.com/product/water/
* http://www.yoursite.com/product/milk/baby/
* http://www.yoursite.com/product/milk/baby/page1.htm
* http://www.yoursite.com/product/milk/baby/page2.htm
* http://www.yoursite.com/product/water/mineral/
* http://www.yoursite.com/product/water/mineral/news.htm
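The depth setting described earlier can be sketched against the URLs listed above. The Python snippet below assumes that depth counts directory levels from the site root, so that "1" covers the home page and files in the root directory and "0" means no limit. Extending the same rule to depth 2 and beyond is an assumption, and directory_level and within_depth are hypothetical names.

```python
from urllib.parse import urlparse


def directory_level(url):
    """Return 1 for the root directory, 2 for /product/, 3 for /product/milk/, ..."""
    path = urlparse(url).path
    segments = [s for s in path.split("/") if s]
    # A path that does not end in "/" is assumed to end in a file name,
    # which does not count as a directory level.
    if segments and not path.endswith("/"):
        segments = segments[:-1]
    return len(segments) + 1


def within_depth(url, depth):
    """Depth 0 means the whole website; otherwise limit by directory level."""
    return depth == 0 or directory_level(url) <= depth
```

Under these assumptions, depth 1 would keep only http://www.yoursite.com/, contact.htm and about.htm from the list, while depth 2 would also keep /product/ and /product/support.htm.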
High-speed, multi-threaded, accurate extraction that saves data directly to a disk file. The program has many filters to restrict a session, such as URL filter, text filter, data filter, domain filter, date modified, etc. The user can select the recursion level, number of threads, timeout, proxy support and many other options. A must-have tool for webmasters who build directories.
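As a loose illustration of what a domain filter and a URL filter might do, here is a short Python sketch. The exact matching rules used by "Win Web Crawler" are not described here, so the function passes_filters, its parameters and the substring-based URL match are all assumptions.

```python
from urllib.parse import urlparse


def passes_filters(url, allowed_domain=None, must_contain=None):
    """Return True if the URL survives the (hypothetical) session filters."""
    host = urlparse(url).hostname or ""
    # Domain filter: accept the domain itself and any of its subdomains.
    if allowed_domain and host != allowed_domain \
            and not host.endswith("." + allowed_domain):
        return False
    # URL filter: assumed here to be a plain substring match.
    if must_contain and must_contain not in url:
        return False
    return True
```

A crawler could apply such a check to each discovered link before queuing it, so that links to other domains or unwanted sections are never visited.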
Download Win Web Crawler v2.0 here or here from Ziddu