WAN-scraper
Web scraper that works with DNS servers to collect random data around the world.

Repository Details
| Repo #634528101 | |
|---|---|
| Author | Lorenzo Rottigni |
| Created At | 2023-04-30 |
| Updated At | 2023-04-30 |
| Pushed At | 2023-04-30 |
| Size | 9 MB |
| Main Language | Python |
| Star count | 0 |
| Default branch | main |
README.md
WAN-scraper
Overview
This is a web scraping project using Python Scrapy to collect data from a list of random websites. The goal is to gather information on website structure, content, and other relevant data for analysis.
Requirements
- Python 3.x
- Scrapy
Installation & Usage
- Clone the repository:
  git clone https://github.com/LorenzoRottigni/WAN-scraper.git
- Create venv "scraper":
  python3 -m venv scraper
- Activate scraper venv:
  source scraper/bin/activate
- Install dependencies:
  pip3 install -r requirements.txt
- Run Scrapy spider:
  scrapy crawl wan_scraper
- The spider will visit the URLs in the start_urls list in wan_scraper/spiders/wan_scraper.py, collect data, and save it to a CSV file located in the project directory.
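The README does not say how the CSV file is produced. One common way to get CSV output from a Scrapy project is the FEEDS setting in the project's settings.py. The fragment below is a minimal sketch under that assumption: the file name output.csv is hypothetical, and the repository may configure its export differently (for example through an item pipeline).

```python
# Sketch of a settings.py fragment (assumed, not taken from the repository).
# FEEDS maps an output path to the export options for that feed.
FEEDS = {
    "output.csv": {          # hypothetical file name, relative to the project directory
        "format": "csv",     # use Scrapy's built-in CSV exporter
        "encoding": "utf8",  # write the file as UTF-8
    },
}
```

The same kind of export can also be requested for a single run from the command line with scrapy crawl wan_scraper -o output.csv, without touching the settings.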
Additional Information
- The spider is set to obey the robots.txt file on each website visited, but please use caution and follow ethical scraping practices.
- Feel free to modify the spider's behavior to suit your needs by editing the code in wan_scraper/spiders/wan_scraper.py.
- For more information on using Scrapy, please refer to the official documentation.
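To make the robots.txt point concrete, the sketch below shows what "obeying" a robots.txt rule amounts to, using only Python's standard library. It is an illustration rather than the project's code: Scrapy performs this check internally when its ROBOTSTXT_OBEY setting is enabled, and the sample rules, the is_allowed helper, the user agent string, and the example.com URLs are all assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download this from the site.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/index.html", "https://example.com/private/data"):
        verdict = "allowed" if is_allowed(SAMPLE_ROBOTS_TXT, "wan_scraper", url) else "blocked"
        print(url, verdict)
```

With the sample rules, pages under /private/ are blocked for every user agent and everything else is allowed. A polite crawler skips the blocked URLs instead of requesting them.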