I collected more than 300 URLs in seed.txt to be crawled by Nutch, but Nutch did not crawl approximately 80 of them. I checked these sites and found that most of them allow crawling in their robots.txt with:
User-agent: *
Why doesn't Nutch crawl these sites? Is there a way to fix this behavior?
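For reference, the robots.txt check described above can be reproduced outside Nutch. This is only a minimal sketch using Python's standard `urllib.robotparser`; it is not how Nutch itself evaluates the rules. The helper name `is_allowed`, the sample rules, and the example URL are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, agent="*"):
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    `robots_txt` is the raw file content as a string (hypothetical input;
    in practice it would be downloaded from <site>/robots.txt).
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Illustrative rules only: an empty Disallow permits everything,
    # while "Disallow: /" blocks the whole site.
    open_rules = "User-agent: *\nDisallow:"
    closed_rules = "User-agent: *\nDisallow: /"
    page = "http://example.com/page.html"
    print(is_allowed(open_rules, page))
    print(is_allowed(closed_rules, page))
```

Running the same check with the crawler's own agent name instead of `*` would show whether a site has a stricter rule group for that specific agent.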