# Web Grater
A very simple Scrapy app for finding broken hrefs/images on your website. I've used this on a number of websites to help find broken content, and hopefully it can help you too.
## How do I use this?
The project is dockerized, but if you don't want to use Docker you can create a virtualenv instead. First, update the site you want to crawl in `web_grater/spiders/grater_spider.py`. After that you can just run:
```
make up && make crawl
```
This will print every URL the crawler finds, with the broken ones shown in red.
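As a rough guide, the edit in `grater_spider.py` amounts to pointing the spider at your own domain. The sketch below is a minimal, assumed layout using standard Scrapy `CrawlSpider` attributes (`allowed_domains`, `start_urls`); the actual class name and parsing logic in the project may differ.

```python
# Minimal sketch of the kind of spider configuration grater_spider.py expects.
# The class name, rule setup, and parse_item are illustrative assumptions;
# allowed_domains / start_urls are the standard Scrapy attributes you would
# point at the site you want to crawl.
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class GraterSpider(CrawlSpider):
    name = "grater"
    # Update these two values to the site you want to crawl.
    allowed_domains = ["example.com"]
    start_urls = ["https://example.com"]

    rules = (
        # Follow every link on the site and hand each response to parse_item.
        Rule(LinkExtractor(), callback="parse_item", follow=True),
    )

    def parse_item(self, response):
        # The real project decides here whether a link is broken;
        # this just records what was visited.
        yield {"url": response.url, "status": response.status}
```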
## How do I find which page contains the broken link?
In `grater_spider.py` there is a `CustomLinkExtractor` with a method called `extract_links`. It includes some commented-out code showing how to search for the broken link during link extraction and print the parent page in yellow.
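The idea, roughly, is that while links are being extracted from a page, you compare each one against the broken URL and print the page it was found on. The sketch below is an assumption of what that commented-out code might look like; the `BROKEN_URL` constant and the ANSI colour handling are illustrative, not the project's exact code.

```python
# Hedged sketch: flag the parent page whenever the broken URL is seen
# during link extraction. BROKEN_URL is a hypothetical placeholder.
from scrapy.linkextractors import LinkExtractor

BROKEN_URL = "https://example.com/missing-page"  # the link you are hunting


class CustomLinkExtractor(LinkExtractor):
    def extract_links(self, response):
        links = super().extract_links(response)
        for link in links:
            if link.url == BROKEN_URL:
                # Yellow ANSI escape so the parent page stands out in the output.
                print(f"\033[93mBroken link found on: {response.url}\033[0m")
        return links
```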