
Feedfetcher

Feedfetcher is how Google crawls RSS or Atom feeds for Google News and PubSubHubbub. Feedfetcher stores and periodically refreshes feeds that are requested by users of an app or service. Only podcast feeds get indexed in Google Search; however, if a feed doesn’t follow the Atom or RSS specification, it may still be indexed. Here are some answers to the most commonly asked questions about how this user-controlled feed grabber works.

When users add a service or app that uses Feedfetcher data, Google’s Feedfetcher attempts to obtain the content of the feed in order to display it. Since Feedfetcher requests come from explicit action by human users, and not from automated crawlers, Feedfetcher ignores robots.txt rules.

If your feed is publicly available, Google can’t restrict users from accessing it. One solution is to configure your site to serve a 404, 410, or other error status message to the Feedfetcher-Google user agent.
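As a minimal sketch of that approach, the Python WSGI wrapper below returns an error status whenever the request's User-Agent header contains the Feedfetcher-Google token. The names `block_feedfetcher`, `is_feedfetcher`, and `feed_app` are hypothetical, and the case-insensitive substring match is an assumption rather than documented behavior; most sites would apply the same rule in their web server configuration instead.

```python
FEEDFETCHER_TOKEN = "Feedfetcher-Google"  # user agent token named in the text


def is_feedfetcher(user_agent):
    """Return True when the User-Agent header contains the Feedfetcher token."""
    # Assumption: compare case-insensitively so capitalization variants match.
    return FEEDFETCHER_TOKEN.lower() in (user_agent or "").lower()


def block_feedfetcher(app, status="410 Gone"):
    """Wrap a WSGI app so Feedfetcher receives an error status, not the feed."""
    def wrapped(environ, start_response):
        if is_feedfetcher(environ.get("HTTP_USER_AGENT")):
            start_response(status, [("Content-Type", "text/plain; charset=utf-8")])
            return [b"Feed not available.\n"]
        # Every other client is passed through to the wrapped application.
        return app(environ, start_response)
    return wrapped


def feed_app(environ, start_response):
    """Hypothetical stand-in for the application that normally serves the feed."""
    start_response("200 OK", [("Content-Type", "application/rss+xml")])
    return [b'<rss version="2.0"></rss>']


application = block_feedfetcher(feed_app)
```

Passing `status="404 Not Found"` to `block_feedfetcher` serves a 404 instead of a 410.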

If your feed is provided by a blog or site hosting service, work directly with that service to restrict access to your feed.

Feedfetcher shouldn’t retrieve feeds from most sites more than once every hour on average. Some frequently updated sites may be refreshed more often. Note, however, that due to network delays, it’s possible that Feedfetcher may briefly appear to retrieve your feeds more frequently.
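To compare that hourly average with what your own logs show, a rough check is to average the gaps between Feedfetcher requests for a single feed URL. The sketch below assumes you have already extracted the request times from your access log; the `hits` values are invented sample data and `average_interval` is a hypothetical helper.

```python
from datetime import datetime


def average_interval(timestamps):
    """Return the mean gap between consecutive requests, or None if fewer than two."""
    ordered = sorted(timestamps)
    if len(ordered) < 2:
        return None
    # The gaps sum to (last - first), so the mean is that span over the gap count.
    return (ordered[-1] - ordered[0]) / (len(ordered) - 1)


# Invented request times for one feed URL, as they might appear in an access log.
hits = [
    datetime(2024, 1, 1, 0, 0),
    datetime(2024, 1, 1, 1, 5),
    datetime(2024, 1, 1, 1, 6),  # a close pair, e.g. caused by a network delay
    datetime(2024, 1, 1, 3, 0),
]

print(average_interval(hits))  # 1:00:00
```

Averaging over a long window matters here: two requests a minute apart look like frequent polling in isolation, yet the overall rate in this sample is still one retrieval per hour.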

Feedfetcher retrieves feeds at the request of services or apps installed by users. If it requests a feed URL that does not exist on your site, a user has most likely asked for that URL.

Feedfetcher retrieves feeds at the request of services or apps installed by users. If it requests a feed from a “secret” server, the request most likely came from a user who knows about that server or who typed its address by mistake.

Feedfetcher retrieves feeds only after users have explicitly started a service or app that requests data from the feed. Feedfetcher behaves as a direct agent of the human user, not as a robot, so it ignores robots.txt entries. Since Feedfetcher acts as an agent for multiple users, it conserves bandwidth by requesting a feed that several users share only once for all of the users who requested it through an app or service. The feed formats it retrieves are RSS and Atom.
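The bandwidth saving described above can be pictured as grouping requests by feed URL before fetching. The following is only a conceptual sketch, not Google's implementation; `plan_fetches`, `deliver`, and the `fetch` callback are hypothetical names introduced for illustration.

```python
from collections import defaultdict


def plan_fetches(subscriptions):
    """Group (user, feed_url) pairs by URL so each feed is fetched once."""
    subscribers = defaultdict(set)
    for user, feed_url in subscriptions:
        subscribers[feed_url].add(user)
    return dict(subscribers)


def deliver(subscriptions, fetch):
    """Fetch each distinct feed once and share the body with every subscriber."""
    results = {}
    for feed_url, users in plan_fetches(subscriptions).items():
        body = fetch(feed_url)  # one request, however many users share the feed
        for user in users:
            results[(user, feed_url)] = body
    return results
```

With three users subscribed to the same feed URL, `deliver` calls `fetch` once for that URL, so the site serving the feed sees one request instead of three.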

You can prevent Feedfetcher from crawling your site by configuring your server to serve a 404, 410, or other error status message to the Feedfetcher-Google user agent.

Feedfetcher was designed to be distributed on several machines to improve performance and scale as the web grows. To cut down on bandwidth usage, the machines used are often located near the sites that they’re retrieving in the network.

The IP addresses used by Feedfetcher are included in the user-triggered-fetchers-google.json object.
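For filtering logs, one option is to load that file and test each client address against the listed ranges. The sketch below assumes the file uses a top-level `prefixes` array whose entries carry either an `ipv4Prefix` or an `ipv6Prefix` key; verify this against the published file before relying on it. The sample data uses reserved documentation ranges, not real Google addresses, and `load_networks` and `is_listed` are hypothetical helpers.

```python
import ipaddress
import json

# Invented sample in the assumed layout; the prefixes are documentation ranges.
SAMPLE_JSON = """
{
  "creationTime": "2024-01-01T00:00:00.000000",
  "prefixes": [
    {"ipv4Prefix": "192.0.2.0/24"},
    {"ipv6Prefix": "2001:db8::/32"}
  ]
}
"""


def load_networks(raw_json):
    """Parse the ranges file into a list of ip_network objects."""
    networks = []
    for entry in json.loads(raw_json).get("prefixes", []):
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix:
            networks.append(ipaddress.ip_network(prefix))
    return networks


def is_listed(ip, networks):
    """Return True when ip falls inside any of the listed networks."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in networks)


networks = load_networks(SAMPLE_JSON)
print(is_listed("192.0.2.17", networks))    # True
print(is_listed("198.51.100.1", networks))  # False
```

Since the published ranges can change, reloading the file periodically is safer than hard-coding its contents.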

In general, Feedfetcher downloads only one copy of each file from your site during a given feed retrieval. Very occasionally, the machines are stopped and restarted, which may cause Feedfetcher to retrieve pages that it has recently visited a second time.

Unlike normal web crawlers, Feedfetcher isn’t discovering links to crawl at all; instead, it crawls a single URL that’s provided to it by users of a service or app that uses Feedfetcher.

If you’re still having trouble, try posting your question in the Search Central forum.
