Check out my blog: blog.0x79.com
I am running a crawler from this site for a university project.
I try to follow all robots.txt rules and wait 2-5 seconds between requests.
You can check out the source code of the crawler here
If you do not want me to crawl your site anymore, contact me at mail(at)[this site's URL] and I will put you on a no-crawl list.
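For readers wondering what the behaviour described above looks like in practice, here is a minimal sketch. It is not the crawler's actual source code. It is only an illustration, and it assumes a single site with one already-parsed robots.txt. The names USER_AGENT, crawl, fetch and no_crawl_hosts are hypothetical and chosen for this example.

```python
import random
import time
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical values for illustration; the real crawler may differ.
USER_AGENT = "example-university-crawler"
MIN_DELAY = 2.0  # seconds
MAX_DELAY = 5.0  # seconds


def crawl(urls, robots, fetch, no_crawl_hosts=(), sleep=time.sleep):
    """Fetch each permitted URL, pausing 2-5 seconds between requests.

    robots         -- a RobotFileParser that has already parsed robots.txt
    fetch          -- a callable that downloads one URL (supplied by the caller)
    no_crawl_hosts -- hostnames whose owners asked not to be crawled
    """
    fetched = []
    for url in urls:
        # Skip hosts on the no-crawl list.
        if urlsplit(url).hostname in no_crawl_hosts:
            continue
        # Skip anything robots.txt disallows for this user agent.
        if not robots.can_fetch(USER_AGENT, url):
            continue
        # Wait a random 2-5 seconds before every request except the first.
        if fetched:
            sleep(random.uniform(MIN_DELAY, MAX_DELAY))
        fetch(url)
        fetched.append(url)
    return fetched
```

To use it, build the parser with RobotFileParser(), feed it the lines of the site's robots.txt via parse(), and pass your own download function as fetch. A real crawler visiting many sites would keep one parser per host, which this sketch leaves out.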