WAN-scraper

Web scraper that works with DNS servers to collect random data around the world.

Repo #634528101
Author	Lorenzo Rottigni
Created	2023-04-30
Updated	2023-04-30
Pushed	2023-04-30
Size	9 MB
Primary language	Python
Stars	0
Default branch	main

README.md

WAN-scraper

Overview

This is a web scraping project that uses Python and Scrapy to collect data from a list of random websites. The goal is to gather information on website structure, content, and other relevant data for analysis.

Requirements

Python 3.x
Scrapy

Installation & Usage

Clone the repository: git clone https://github.com/LorenzoRottigni/WAN-scraper.git
Create a virtual environment named "scraper": python3 -m venv scraper
Activate the virtual environment: source scraper/bin/activate
Install dependencies: pip3 install -r requirements.txt
Run Scrapy spider: scrapy crawl wan_scraper
The spider will visit the URLs in the start_urls list in wan_scraper/spiders/wan_scraper.py, collect data, and save it to a CSV file located in the project directory.

Additional Information

The spider is set to obey the robots.txt file on each website visited, but please use caution and follow ethical scraping practices.
Feel free to modify the spider's behavior to suit your needs by editing the code in wan_scraper/spiders/wan_scraper.py.
For more information on using Scrapy, please refer to the official documentation.
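
The robots.txt behavior mentioned above is normally controlled by Scrapy's ROBOTSTXT_OBEY setting. The fragment below is a sketch of settings that support polite crawling. The file location follows Scrapy's default project layout, and every value other than ROBOTSTXT_OBEY is an assumption rather than something taken from this repository.

```python
# wan_scraper/settings.py (assumed location, per Scrapy's default layout)

# Check each site's robots.txt and skip disallowed URLs.
ROBOTSTXT_OBEY = True

# Assumed values, not taken from the repository:
DOWNLOAD_DELAY = 1.0                # minimum seconds between requests to the same site
CONCURRENT_REQUESTS_PER_DOMAIN = 2  # cap parallel requests per domain
AUTOTHROTTLE_ENABLED = True         # adapt the delay to observed server latency
```

Lower concurrency and a non-zero delay slow the crawl down, which is usually an acceptable trade-off when visiting sites you do not control.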