WAN-scraper

Web scraper that works with DNS servers to collect random data around the world.

Repository #634528101
Author	Lorenzo Rottigni
Created	2023-04-30
Updated	2023-04-30
Pushed	2023-04-30
Size	9 MB
Primary language	Python
Stars	0
Default branch	main

Languages: Python

README.md

WAN-scraper

Overview

This web scraping project uses Python and Scrapy to collect data from a list of random websites. The goal is to gather information about each site's structure, content, and other relevant data for analysis.

Requirements

Python 3.x
Scrapy

Installation & Usage

Clone the repository: git clone https://github.com/LorenzoRottigni/WAN-scraper.git
Create a virtual environment named "scraper": python3 -m venv scraper
Activate the virtual environment: source scraper/bin/activate
Install dependencies: pip3 install -r requirements.txt
Run the Scrapy spider: scrapy crawl wan_scraper
The spider will visit the URLs in the start_urls list in wan_scraper/spiders/wan_scraper.py, collect data, and save it to a CSV file located in the project directory.

Additional Information

The spider is set to obey the robots.txt file on each website visited, but please use caution and follow ethical scraping practices.
Feel free to modify the spider's behavior to suit your needs by editing the code in wan_scraper/spiders/wan_scraper.py.
For more information on using Scrapy, please refer to the official documentation.
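
To make the robots.txt note above more concrete, the sketch below shows what obeying robots.txt means for a single URL. It uses Python's standard urllib.robotparser rather than Scrapy's own implementation, and the rules, the is_allowed helper, and the example URLs are hypothetical. In a Scrapy project this behavior is normally switched on with the ROBOTSTXT_OBEY setting in settings.py.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit fetching the URL."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/index.html",
                "https://example.com/private/data"):
        print(url, is_allowed(ROBOTS_TXT, "wan_scraper", url))
```

A check like this is useful when debugging why a page was skipped: a URL that falls under a Disallow rule for the crawler's user agent is not fetched while robots.txt compliance is enabled.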