# spider_allen

**Repository Path**: BFD-SoftwareStudio/spider_allen

## Basic Information

- **Project Name**: spider_allen
- **Description**: sssssssssssssss
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2023-02-21
- **Last Updated**: 2023-07-10

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# spider_allen

#### Introduction

This program is a web crawler (spider). It uses asyncio to crawl jd product information asynchronously and selenium to crawl tb information. Collected pages are parsed with etree and xpath. Data is persisted locally in Excel files, stored and updated through openpyxl. Collection activity is logged with logging, and the tkinter module provides a desktop (PC) UI for operating the crawler.

- Branch differences:
  - `release-asyncio`: jd collection uses asynchronous coroutines
  - `release-sync`: jd collection uses a plain serial loop

This program is intended only for discussion and learning. When collecting data, comply with the robots protocol (robots.txt). Do not use it for commercial or illegal purposes; otherwise you bear the legal risk yourself.

#### Screenshots

![img.png](introduce/1.png)
![img_1.png](introduce/2.png)
![img.png](introduce/3.png)
![img_1.png](introduce/4.png)
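
#### Sketch: asyncio vs. serial collection

The branch difference described in the introduction (coroutines in `release-asyncio`, a serial loop in `release-sync`) can be illustrated with a small, self-contained sketch. This is not the repository's code: `fetch_item`, `crawl_serial`, `crawl_concurrent`, `ITEM_IDS`, and `DELAY` are hypothetical names, and the network request is replaced by `asyncio.sleep` so the example runs offline. The serial variant is also written as a coroutine here purely so the two can be timed the same way; the actual `release-sync` branch may well use ordinary blocking calls.

```python
import asyncio
import time

# Hypothetical product IDs; placeholders, not real jd items.
ITEM_IDS = ["1001", "1002", "1003", "1004"]
DELAY = 0.1  # simulated network latency per request, in seconds


async def fetch_item(item_id: str) -> dict:
    """Stand-in for an HTTP request: sleeps instead of touching the network."""
    await asyncio.sleep(DELAY)
    return {"id": item_id, "title": f"item-{item_id}"}


async def crawl_serial(item_ids):
    """Wait for each request to finish before starting the next one."""
    results = []
    for item_id in item_ids:
        results.append(await fetch_item(item_id))
    return results


async def crawl_concurrent(item_ids):
    """Start all requests together; gather returns results in input order."""
    return await asyncio.gather(*(fetch_item(i) for i in item_ids))


def timed(coro_fn, item_ids):
    """Run a crawl function to completion and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn(item_ids))
    return list(results), time.perf_counter() - start


serial_results, serial_secs = timed(crawl_serial, ITEM_IDS)
concurrent_results, concurrent_secs = timed(crawl_concurrent, ITEM_IDS)

print(f"serial:     {serial_secs:.2f}s for {len(serial_results)} items")
print(f"concurrent: {concurrent_secs:.2f}s for {len(concurrent_results)} items")
```

Both variants return the same records in the same order. The concurrent one finishes sooner because the simulated waits overlap instead of adding up. Against a real site, a crawler would typically also cap concurrency (for example with `asyncio.Semaphore`) and honor robots.txt, in line with the usage note in the introduction.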