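Introduction:
HTTrack is a free, open-source offline website browser. It downloads an entire website into a local directory, including HTML, images, scripts, and style sheets, and rewrites the links so that the site can be browsed locally.
Official site: http://www.httrack.com/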
1. Installation
# yum install httrack -y
# httrack
2. Usage
2.1) You can crawl directly from the command line.
usage: httrack <URLs> [-option] [+<URL_FILTER>] [-<URL_FILTER>] [+<mime:MIME_FILTER>] [-<mime:MIME_FILTER>]
# httrack "http://www.linuxea.com" -O "/web/www.linuxea.com" "+*.linuxea.com*" -v
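
When several sites need mirroring, the one-line command above can be generated by a small script. The following is a minimal sketch and not part of HTTrack itself: the function names url_host, mirror_filter and mirror_cmd, and the /web base directory, are assumptions chosen for illustration. It only prints the command so that it can be reviewed before being run.

```shell
#!/bin/sh
# Hypothetical helpers (not part of HTTrack): build the mirror command
# shown above from a URL, without running it.

# Extract the host name from a URL such as http://www.linuxea.com/path
url_host() {
    printf '%s\n' "$1" | sed -e 's|^[a-zA-Z]*://||' -e 's|[/:].*$||'
}

# Build a scan rule like +*.linuxea.com* that keeps the crawl on the site.
# Assumption: a leading "www." is dropped so that other subdomains match too.
mirror_filter() {
    host=$(url_host "$1")
    printf '+*.%s*\n' "${host#www.}"
}

# Print the full httrack command line; the base directory defaults to /web
# (an assumed location, pass a second argument to change it).
mirror_cmd() {
    url=$1
    base=${2:-/web}
    printf 'httrack "%s" -O "%s/%s" "%s" -v\n' \
        "$url" "$base" "$(url_host "$url")" "$(mirror_filter "$url")"
}
```

Calling mirror_cmd "http://www.linuxea.com" prints the same command line as the example above, which can then be pasted into a shell.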

2.2) You can also crawl using the interactive interface.
# httrack
Welcome to HTTrack Website Copier (Offline Browser) 3.48-21
Copyright (C) 1998-2015 Xavier Roche and other contributors
To see the option list, enter a blank line or try httrack --help
Enter project name :linuxea
Base path (return=/root/websites/) :/linuxea
Enter URLs (separated by commas or blank spaces) :www.linuxea.com
Action:
(enter) 1 Mirror Web Site(s)
2 Mirror Web Site(s) with Wizard
3 Just Get Files Indicated
4 Mirror ALL links in URLs (Multiple Mirror)
5 Test Links In URLs (Bookmark Test)
0 Quit
: 2
Proxy (return=none) :
You can define wildcards, like: -*.gif +www.*.com/*.zip -*img_*.zip
Wildcards (return=none) :
You can define additional options, such as recurse level (-r), separed by blank spaces
To see the option list, type help
Additional options (return=none) :
---> Wizard command line: httrack www.linuxea.com -W -O "/linuxea/linuxea" -%v
Ready to launch the mirror? (Y/n) :yes
WARNING! You are running this program as root!
It might be a good idea to run as a different user
Mirror launched on Fri, 07 Oct 2016 03:22:51 by HTTrack Website Copier/3.48-21 [XR&CO'2014]
3. Check the result after crawling
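
One way to check a finished mirror from the shell is to look for the files HTTrack writes into the project directory: the top-level index.html and the hts-log.txt log. The sketch below is only an illustration; the function name check_mirror is made up for this example, and the path in the usage note assumes the /linuxea/linuxea project directory from the wizard session above.

```shell
#!/bin/sh
# Hypothetical helper: report whether a directory looks like a finished mirror.
check_mirror() {
    dir=$1
    [ -d "$dir" ] || { echo "missing directory: $dir"; return 1; }
    status=0
    # HTTrack places an index page and a log file at the top of the project
    for f in index.html hts-log.txt; do
        if [ -e "$dir/$f" ]; then
            echo "found: $f"
        else
            echo "missing: $f"
            status=1
        fi
    done
    # Count the HTML pages anywhere under the project directory
    pages=$(find "$dir" -type f -name '*.html' | wc -l | tr -d ' ')
    echo "html pages: $pages"
    return "$status"
}
```

For the session above this would be called as check_mirror /linuxea/linuxea; a non-zero return value means one of the expected files is missing. Opening index.html in a browser and reading hts-log.txt for errors are the natural next steps.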

