Hello, today I want to tell you about my latest Python script (really nuts and bolts), called sitemaps.
With it you can build a sitemap for your website. It works as a simple spider. It doesn’t produce an XML file yet, though I may implement that later; for now it only produces a txt file.
This little spider starts crawling from a user-specified starting page and follows every internal link it finds. At the end it produces a txt file listing all the links found during the crawl, one per line. This file can be submitted to Google Sitemaps, which accepts this plain-text format.
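To illustrate the output step, here is a minimal sketch of how a list of crawled URLs could be written out one per line. It is not taken from sitemap.py: the function name `write_url_list` and the choice to skip blanks and duplicates are assumptions made for this example.

```python
def write_url_list(urls, path):
    """Write unique URLs to `path`, one per line, keeping first-seen order.

    Hypothetical helper for illustration; returns the number of lines written.
    """
    seen = set()
    count = 0
    # UTF-8 with plain "\n" line endings keeps the file portable.
    with open(path, "w", encoding="utf-8", newline="\n") as out:
        for url in urls:
            url = url.strip()
            if not url or url in seen:
                continue  # skip blank entries and repeats
            seen.add(url)
            out.write(url + "\n")
            count += 1
    return count
```

Calling `write_url_list(found_links, "sitemap.txt")` would then give a file ready to upload, assuming `found_links` holds absolute URLs.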
The importance of this kind of tool is that Google could index all your internal pages in less time, giving you the chance to increase your popularity.
As I said before, this is really “nuts and bolts”, but it’s a good starting point for writing more sophisticated spiders or web analysis tools.
The core of the script is a BFS (breadth-first search) visit of the graph formed by the target website’s internal links.
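For readers who want to see the idea in code, here is a minimal sketch of a breadth-first crawl over internal links. It is an illustration under stated assumptions, not the code of sitemap.py: the names `LinkParser`, `internal_links` and `crawl` are made up for this example, "internal" is taken to mean "same host as the current page", and the `fetch` argument is a placeholder for whatever function downloads a page and returns its HTML (or `None` on failure).

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(page_url, html):
    """Return absolute URLs linked from `html` that share `page_url`'s host."""
    parser = LinkParser()
    parser.feed(html)
    host = urlparse(page_url).netloc
    links = []
    for href in parser.hrefs:
        # Resolve relative links and drop "#fragment" parts.
        absolute = urldefrag(urljoin(page_url, href)).url
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            links.append(absolute)
    return links


def crawl(start_url, fetch):
    """Breadth-first visit starting at `start_url`.

    `fetch(url)` is an assumed callable returning HTML text or None.
    Returns the URLs in the order they were discovered.
    """
    found = [start_url]
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()  # FIFO order is what makes this a BFS
        html = fetch(url)
        if html is None:
            continue
        for link in internal_links(url, html):
            if link not in seen:
                seen.add(link)
                found.append(link)
                queue.append(link)
    return found
```

The `seen` set is what keeps the spider from looping forever on pages that link back to each other, and passing `fetch` in as a parameter makes it easy to try the crawl on a fake site stored in a dictionary before pointing it at a real one.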
As usual, if you have any comments, please don’t hesitate to write them!
Here is the source code
and here you can download the script: sitemap.py.