# book

**Repository Path**: zzzhong/book

## Basic Information

- **Project Name**: book
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2016-07-01
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# book
A Node.js program that scrapes books from websites. It is configured with the following fields:
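1. url: the URL of the book's table-of-contents page.
2. domain: the prefix prepended to each link that leads to a detail page. For example, if an `a` link's href is `h` and the full address is `http://www.baidu.com/` + `h`, then `http://www.baidu.com/` is the domain.
3. menu: a descendant selector that matches the `a` links on the table-of-contents page, for example `.index_list a`.
4. content: a descendant selector that matches the root element of the chapter content.
5. loadMoreHtml: the extra tags to remove from inside the content root element.
6. menuMoreHtml: the extra tags to remove from inside the `a` links on the table-of-contents page.
7. menuRule: a filter applied to the chapter titles on the table-of-contents page.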
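
As a rough illustration of how these fields might fit together, here is a minimal sketch. It is not taken from the project's source. The README names only the fields, so the object shape and every value are assumptions, except the `.index_list a` selector and the `http://www.baidu.com/` domain. The helpers `buildChapterUrl` and `filterTitles` are hypothetical. The sketch also assumes that `menuRule` is a regular expression a title must match to be kept; the actual project may interpret it differently.

```javascript
// Hypothetical configuration object. Field names come from the README;
// the values are placeholders unless noted otherwise.
const config = {
  url: "http://www.baidu.com/book/",   // assumed table-of-contents page URL
  domain: "http://www.baidu.com/",     // prefix from the README's example
  menu: ".index_list a",               // selector from the README's example
  content: "#content",                 // assumed selector for the content root
  loadMoreHtml: ["script", ".ad"],     // assumed: selectors stripped from the content root
  menuMoreHtml: ["span"],              // assumed: selectors stripped from each chapter link
  menuRule: /^Chapter\s+\d+/,          // assumed: a title must match this to be kept
};

// Join the domain prefix and a link's href, as in the README's
// "http://www.baidu.com/ + h" example.
function buildChapterUrl(domain, href) {
  return domain + href;
}

// Keep only the titles that match menuRule (assumed semantics).
function filterTitles(titles, menuRule) {
  return titles.filter((title) => menuRule.test(title));
}

console.log(buildChapterUrl(config.domain, "h"));
console.log(filterTitles(["Chapter 1 Start", "Latest updates"], config.menuRule));
```

Plain string concatenation mirrors the README's description of `domain`. If the target site uses absolute or `../`-style hrefs, resolving with `new URL(href, domain)` would be a more robust choice.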