4xx Client Errors
- 400 Bad Request: The server cannot process the request due to malformed syntax. This may occur if the crawler sends invalid headers or parameters.
- 401 Unauthorized: Authentication is required to access the resource. The crawler either sent no credentials or sent credentials that the server rejected.
- 403 Forbidden: The server understood the request but refuses to authorize it. This often happens when the crawler's IP address is not whitelisted or the resource is restricted to certain clients.
- 404 Not Found: The requested resource could not be found on the server. The URL may be incorrect or the page has been removed.
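- 408 Request Timeout: The server timed out waiting for the crawler to finish sending the request. This can occur when network latency is high or the connection sat idle before the complete request was transmitted. A server that is merely slow to respond does not produce this code.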
- 429 Too Many Requests: The crawler has sent too many requests in a given timeframe. Rate limiting is in effect to prevent server overload, and the response may include a Retry-After header indicating how long to wait before retrying.
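The client errors above call for different reactions: some are worth retrying after a pause (408, 429), while others will keep failing until the request itself changes. The following is a minimal Python sketch of that decision, not a definitive implementation. The function names, the action labels, and the 5-second default delay are all assumptions made for illustration, and the headers are assumed to arrive as a plain dict keyed by "Retry-After".

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Assumed fallback delay (seconds) when the server sends no usable Retry-After.
DEFAULT_RETRY_DELAY = 5.0


def retry_after_seconds(value, now=None):
    """Convert a Retry-After header value to a delay in seconds.

    The header may hold either a whole number of seconds or an HTTP date.
    Returns None when the value is missing or cannot be parsed.
    """
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        # Treat a date without an offset as UTC.
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def client_error_action(status, headers=None):
    """Map a 4xx status code to a (hypothetical) action and optional delay."""
    headers = headers or {}
    if status in (408, 429):
        delay = retry_after_seconds(headers.get("Retry-After"))
        return ("retry", DEFAULT_RETRY_DELAY if delay is None else delay)
    if status == 401:
        return ("authenticate", None)
    if status == 400:
        return ("fix_request", None)
    # 403, 404, and any other 4xx: retrying the same request will not help.
    return ("skip", None)
```

For example, `client_error_action(429, {"Retry-After": "30"})` yields a retry with a 30-second delay, while a 404 is simply skipped.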
5xx Server Errors
- 500 Internal Server Error: A generic error indicating the server encountered an unexpected condition. This could be due to server misconfigurations or application errors.
- 502 Bad Gateway: The server, acting as a gateway or proxy, received an invalid response from the upstream server.
- 503 Service Unavailable: The server is temporarily unable to handle the request, often due to maintenance or overload.
- 504 Gateway Timeout: The server, acting as a gateway, did not receive a timely response from the upstream server.
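Server errors are often transient, so a crawler commonly retries them with increasing pauses. The sketch below shows one way to do this with exponential backoff. It rests on several assumptions: `fetch` is a hypothetical callable that takes a URL and returns an integer status code, all four codes above are treated as retryable (a 500 caused by misconfiguration may never recover), and the base delay, cap, and retry count are illustrative values.

```python
import time

# Assumption: every 5xx code listed above is worth retrying.
RETRYABLE_5XX = {500, 502, 503, 504}


def backoff_delay(attempt, base=1.0, cap=60.0):
    """Delay before the next try: base * 2^attempt seconds, limited to cap."""
    return min(cap, base * (2 ** attempt))


def fetch_with_retries(fetch, url, max_retries=3, sleep=time.sleep):
    """Call fetch(url) until it returns a non-retryable status or retries run out.

    Returns a (status, attempts) tuple. The sleep function is injectable so
    the loop can be exercised without real waiting.
    """
    for attempt in range(max_retries + 1):
        status = fetch(url)
        if status not in RETRYABLE_5XX or attempt == max_retries:
            return status, attempt + 1
        sleep(backoff_delay(attempt))
```

Capping the delay keeps a long outage from pushing waits to unreasonable lengths. Adding random jitter to each delay is a common refinement when many crawler workers retry against the same host.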
Network & Connection Errors
- Connection Timeout: The crawler could not establish a connection to the server within the specified time limit. This may indicate network issues or firewall restrictions.
- DNS Resolution Failure: The domain name could not be resolved to an IP address. This suggests DNS configuration issues or an invalid domain.
- SSL/TLS Errors: The crawler could not establish a secure connection because certificate validation failed or the client and server could not agree on a protocol version.
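Unlike HTTP status codes, these failures surface as exceptions before any response arrives. As a rough sketch, assuming the crawler is written in Python and uses the standard library's `urllib`, `socket`, and `ssl` modules, the three categories above could be told apart as follows. The function name and the category labels are hypothetical.

```python
import socket
import ssl
import urllib.error


def classify_network_error(exc):
    """Return a coarse category label for a connection-level exception."""
    # urllib wraps the underlying failure in URLError.reason; unwrap it
    # when the reason is itself an exception rather than a message string.
    if isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, BaseException):
        exc = exc.reason
    # Check the certificate error first: it is a subclass of ssl.SSLError.
    if isinstance(exc, ssl.SSLCertVerificationError):
        return "ssl_certificate"
    if isinstance(exc, ssl.SSLError):
        return "ssl_other"
    if isinstance(exc, socket.gaierror):
        return "dns_resolution"
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "connection_timeout"
    return "other"
```

A crawler might retry a `connection_timeout` later but drop a URL after repeated `dns_resolution` failures, since an unresolvable domain is unlikely to recover quickly.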