HTTP Status Codes - Why Won't My Website Crawl?

Things to check: What is the redirect destination? (Check the outlinks of the returned URL).

Things to try: If this is the same as the starting URL, follow the steps described in our "why do URLs redirect to themselves" FAQ.

Reason: The redirect is in a loop, so the SEO Spider never reaches a crawlable HTML page. If this is due to a cookie being dropped, it can be bypassed by following the steps in the FAQ linked above.
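To see where a redirect chain ends outside the SEO Spider, the check above can be sketched in Python. This is a minimal illustration, not how the SEO Spider works internally: `trace_redirects` and the `fetch` callable are hypothetical names, and `fetch` is assumed to return a status code and `Location` header without following the redirect itself.

```python
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin

# fetch(url) is supplied by the caller and must NOT follow redirects itself.
# It returns (status_code, location_header_or_None).
Fetch = Callable[[str], Tuple[int, Optional[str]]]

REDIRECT_CODES = {301, 302, 303, 307, 308}


def trace_redirects(start_url: str, fetch: Fetch, max_hops: int = 10) -> Tuple[List[str], str]:
    """Follow redirects one hop at a time and report how the chain ends."""
    chain = [start_url]
    seen = {start_url}
    url = start_url
    for _ in range(max_hops):
        status, location = fetch(url)
        if status not in REDIRECT_CODES or not location:
            return chain, f"final:{status}"
        url = urljoin(url, location)  # the Location header may be relative
        chain.append(url)
        if url in seen:
            return chain, "loop"  # we have been sent back to a URL already visited
        seen.add(url)
    return chain, "too_many_hops"
```

A result of `"loop"` with a chain that returns to the starting URL corresponds to the self-redirect case described above.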

 

Things to check: External Tab.

Things to try: Configuration > Spider > Crawl All Subdomains.

Reason: The SEO Spider treats different subdomains as external and will not crawl them by default. If you are trying to crawl a subdomain that redirects to a different subdomain, the destination will be reported in the External tab.
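The internal/external rule described above can be illustrated with a short Python sketch. It only mirrors the behaviour stated in the text; `is_internal` and its `crawl_all_subdomains` parameter are hypothetical names, and the root-domain logic is deliberately naive.

```python
from urllib.parse import urlsplit


def is_internal(start_url: str, candidate_url: str, crawl_all_subdomains: bool = False) -> bool:
    """Decide whether candidate_url counts as internal relative to start_url."""
    start_host = (urlsplit(start_url).hostname or "").lower()
    host = (urlsplit(candidate_url).hostname or "").lower()
    if host == start_host:
        return True
    if not crawl_all_subdomains:
        return False  # a different subdomain is treated as external by default
    # Naive root domain: the last two labels. This is wrong for suffixes such as co.uk.
    root = ".".join(start_host.split(".")[-2:])
    return host == root or host.endswith("." + root)
```

With the default setting, a crawl that starts on `www.example.com` and is redirected to `shop.example.com` would classify the destination as external, which is why it shows up in the External tab rather than being crawled.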

 

Things to check: Does the site require cookies? (View the page in a browser with cookies disabled).

Things to try: Configuration > Spider > Advanced Tab > Allow Cookies.

Reason: The SEO Spider is being redirected to a URL where a cookie is dropped, but it does not accept cookies.
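Why refusing cookies can trap a crawler is easier to see in a toy simulation. The sketch below is purely illustrative: `fake_server` is a made-up in-memory site that redirects `/page` to `/set-cookie` until a `session` cookie is presented, and `crawl` is a hypothetical client that either keeps or discards cookies.

```python
from typing import Dict, Optional, Tuple


def fake_server(url: str, cookies: Dict[str, str]) -> Tuple[int, Optional[str], Dict[str, str]]:
    """Hypothetical site: returns (status, redirect_target, cookies_to_set)."""
    if url == "/set-cookie":
        return 302, "/page", {"session": "abc"}  # drops the cookie, then sends us back
    if url == "/page" and "session" not in cookies:
        return 302, "/set-cookie", {}  # no cookie yet: bounce to the cookie URL
    return 200, None, {}


def crawl(start: str, accept_cookies: bool, max_hops: int = 6) -> int:
    """Return the final status code, or the last redirect code if hops run out."""
    jar: Dict[str, str] = {}
    url, status = start, 0
    for _ in range(max_hops):
        status, location, set_cookies = fake_server(url, jar)
        if accept_cookies:
            jar.update(set_cookies)
        if location is None:
            break
        url = location
    return status
```

Calling `crawl("/page", accept_cookies=True)` reaches the page, while `crawl("/page", accept_cookies=False)` keeps bouncing between the two URLs until the hop limit is hit.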

400 - Bad Request / 403 - Forbidden

The server cannot or will not process the request / is denying the SEO Spider’s request to view the requested URL.

403 Forbidden

Things to check: Can you view the page in a browser or does it return a similar error?

Things to try: If the page can be viewed, set Googlebot or Chrome as the user agent (Configuration > HTTP Header > User-Agent).

Reason: The site is denying the SEO Spider’s request for the page (possibly as protection/security against unknown user agents).
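To confirm that a site responds differently depending on the user agent, the same URL can be requested with several `User-Agent` headers and the status codes compared. The following is a rough sketch using Python's standard `urllib`; the `ExampleCrawler/1.0` string is a hypothetical placeholder (substitute the user agent your crawler actually sends), the Chrome string is only a representative example, and `status_for` and `compare_user_agents` are illustrative names.

```python
import urllib.error
import urllib.request

# Example user-agent strings. "ExampleCrawler/1.0" is a hypothetical placeholder.
USER_AGENTS = {
    "crawler": "ExampleCrawler/1.0",
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "chrome": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
}


def status_for(url, user_agent, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code the server gives this user agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # urllib raises 4xx/5xx responses as exceptions


def compare_user_agents(url, opener=urllib.request.urlopen):
    """Map each user-agent name to the status code it receives for url."""
    return {name: status_for(url, ua, opener) for name, ua in USER_AGENTS.items()}
```

If the placeholder crawler receives a 403 while the browser-like user agents receive a 200, the site is most likely filtering on the user agent. Note that `urlopen` follows redirects, so the code reported is that of the final response.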

404 - Page Not Found / 410 - Removed

The server is indicating that the page cannot be found or has been removed.

404 Not Found

Things to check: Does the requested URL load a normal page in the browser?

Things to try: Check whether the status code is the same in other tools (Websniffer, Rexswain, browser plugins, etc.).

Reason: If every tool reports the same incorrect status code, the site or server may be misconfigured and serving the error response code even though the page exists.

 

Things to try: If the page loads, try Googlebot or Chrome as the user agent (Configuration > HTTP Header > User-Agent).

Reason: The site is serving the error response to the SEO Spider (possibly as protection/security against unknown user agents).

429 - Too Many Requests

Too many requests have been made to the server in a set period of time.

Things to check: Can you view your site in the browser or does this show a similar error message?

Things to try: Lower the crawl speed and/or test with a Googlebot user agent.

Reason: The server is not allowing any more requests because too many have been made in a short period of time. Lowering the rate of requests, or trying a user agent to which this limit may not apply, can help.
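The idea of lowering the request rate can be sketched as a simple retry-with-backoff loop. This is a generic illustration, not the SEO Spider's speed setting: `fetch_with_backoff` and the caller-supplied `fetch` callable are hypothetical, and the sketch only honours a `Retry-After` header expressed in whole seconds.

```python
import time
from typing import Callable, Mapping, Tuple

# fetch(url) is supplied by the caller and returns (status_code, response_headers).
Fetch = Callable[[str], Tuple[int, Mapping[str, str]]]


def fetch_with_backoff(url: str, fetch: Fetch, max_retries: int = 3,
                       base_delay: float = 1.0, sleep=time.sleep) -> int:
    """Retry on 429, waiting Retry-After seconds or an exponentially growing delay."""
    status = 0
    for attempt in range(max_retries + 1):
        status, headers = fetch(url)
        if status != 429 or attempt == max_retries:
            return status
        retry_after = headers.get("Retry-After", "")
        # Retry-After may also be an HTTP date; this sketch only handles seconds.
        delay = float(retry_after) if retry_after.isdigit() else base_delay * 2 ** attempt
        sleep(delay)
    return status
```

If the function still returns 429 after all retries, the limit is probably stricter than the delays used here, and a lower overall crawl speed is the more reliable fix.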

500 / 502 / 503 – Internal Server Error

The server is saying that it has a problem.

500 Server Error

Things to check: Can you view your site in the browser or is it down?

Things to try: If the site is up, try Googlebot or Chrome as the user agent (Configuration > HTTP Header > User-Agent).

Reason: The site is serving the server error to the SEO Spider (possibly as protection/security against unknown user agents).

 

It is possible for more than one of these issues to be present on the same page, for example, a JavaScript page could also have a meta ‘nofollow’ tag.

There are also many more response codes than this, but in our own experience, these are encountered infrequently, if at all. Many of these are likely to also be resolved by following the same steps as other similar response codes described above.

More details on response codes can be found at //en.wikipedia.org/wiki/List_of_HTTP_status_codes
