WAN-scraper

Web scraper that uses DNS servers to collect random data from around the world.

Repo #634528101
Author	Lorenzo Rottigni
Created on	2023-04-30
Updated on	2023-04-30
Pushed on	2023-04-30
Size	9 MB
Main language	Python
Stars	0
Default branch	main

README.md

WAN-scraper

Overview

This is a web scraping project that uses Scrapy, a Python framework, to collect data from a list of random websites. The goal is to gather information on website structure, content, and other relevant data for analysis.

Requirements

Python 3.x
Scrapy

Installation & Usage

Clone the repository: git clone https://github.com/LorenzoRottigni/WAN-scraper.git
Enter the project directory: cd WAN-scraper
Create a virtual environment named "scraper": python3 -m venv scraper
Activate the virtual environment: source scraper/bin/activate
Install the dependencies: pip3 install -r requirements.txt
Run the Scrapy spider: scrapy crawl wan_scraper
The spider will visit the URLs in the start_urls list in wan_scraper/spiders/wan_scraper.py, collect data, and save it to a CSV file located in the project directory.
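The collect-and-save step can be sketched without Scrapy. The example below is a minimal stand-in, not the repository's spider: it uses the standard library's html.parser instead of Scrapy selectors, and the fields it records (url, title, link_count) and the helper names (PageSummaryParser, summarize_page, write_csv) are hypothetical, since the README does not say which data the spider actually collects.

```python
import csv
import io
from html.parser import HTMLParser


class PageSummaryParser(HTMLParser):
    """Collects the <title> text and counts <a> tags with an href (hypothetical fields)."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.link_count = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a" and any(name == "href" for name, _ in attrs):
            self.link_count += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def summarize_page(url, html):
    """Return one row per page, shaped like an item a Scrapy parse() callback might yield."""
    parser = PageSummaryParser()
    parser.feed(html)
    parser.close()
    return {"url": url, "title": parser.title.strip(), "link_count": parser.link_count}


def write_csv(rows, out):
    """Write the collected rows as CSV with a header line."""
    writer = csv.DictWriter(out, fieldnames=["url", "title", "link_count"])
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    sample = (
        "<html><head><title> Example </title></head>"
        "<body><a href='/a'>A</a><a>no href</a></body></html>"
    )
    buffer = io.StringIO()
    write_csv([summarize_page("https://example.com/", sample)], buffer)
    print(buffer.getvalue())
```

In a Scrapy project, a parse callback would typically yield dicts like these, and Scrapy's feed exports (for example `scrapy crawl wan_scraper -o output.csv`, where the file name is only an example) can write them to CSV. The README does not state whether this repository relies on feed exports or on its own writer.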

Additional Information

The spider is configured to obey the robots.txt file of each website it visits; even so, use caution and follow ethical scraping practices.
Feel free to modify the spider's behavior to suit your needs by editing the code in wan_scraper/spiders/wan_scraper.py.
For more information on using Scrapy, please refer to the official documentation.
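In Scrapy, obeying robots.txt is controlled by the ROBOTSTXT_OBEY setting, which is set to True in the settings.py generated by `scrapy startproject`. The check itself can be illustrated with the standard library's urllib.robotparser. This sketch is not Scrapy's implementation, which uses its own configurable robots.txt parser. The robots.txt rules, the URLs, and the "wan_scraper" user-agent string are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler downloads it from
# https://<host>/robots.txt before requesting other pages on that host.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt):
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser, user_agent, urls):
    """Keep only the URLs the given user agent is allowed to fetch."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    candidates = [
        "https://example.com/index.html",
        "https://example.com/private/data.html",
    ]
    # Only the URL outside /private/ survives the filter.
    print(allowed_urls(rules, "wan_scraper", candidates))
```

Scrapy performs this filtering automatically when ROBOTSTXT_OBEY is enabled. Settings such as DOWNLOAD_DELAY and AUTOTHROTTLE_ENABLED are the usual way to make a crawl more polite, in line with the caution urged above.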