Google: Why were cornflakes invented?

You’ll get around 501,000 results in 0.54 seconds (the exact figures vary from search to search). But how did those results end up there?

The Search Engine Crawler.

Before you search, web crawlers gather information from billions of web pages and organise it in the search engine’s index. For a quick history lesson: the first crawler on the World Wide Web (WWW), the World Wide Web Wanderer, was developed at MIT in 1993 to measure the growth of the Web. An index was created from its results soon after, producing the first “search engine”.

Known by many names, from web spiders to automatic indexers, search engine crawlers have evolved to index not only written content but also image alt text, images, and other non-HTML content.

A crawler is an automated script that browses the WWW and supplies the data a search engine serves up when you ask why cornflakes were invented, or what the latest men’s hairstyle is.

If your brand isn’t showing up on Google, the way crawlers work might explain why.

How does a crawler work?

Crawling is essentially a discovery process: search engines send out their robots (spiders) to find new and updated content on web pages, which they discover through links or URLs (Uniform Resource Locators).

The crawler starts with a list of URLs to visit and stores what it finds, but it doesn’t rank pages. Its job is simply to visit websites and follow the hyperlinks on them to discover other pages, bringing everything back to the search engine’s servers.
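To make that loop concrete, here’s a minimal Python sketch of a breadth-first crawler. It’s an illustration under simplifying assumptions, not how Googlebot is actually built: the “web” is a made-up dictionary of pages and their links (so it runs without a network connection), and the function name `crawl` is hypothetical. Notice that it only visits and collects pages; it never ranks them.

```python
from collections import deque

# A toy "web": each URL maps to the URLs it links to. These pages are made up;
# a real crawler would download each page over HTTP and parse out its links.
TOY_WEB = {
    "https://example.com/": ["https://example.com/about", "https://example.com/blog"],
    "https://example.com/about": ["https://example.com/"],
    "https://example.com/blog": ["https://example.com/blog/cornflakes", "https://example.com/missing"],
    "https://example.com/blog/cornflakes": [],
}

def crawl(seeds, web, max_pages=100):
    """Breadth-first crawl: visit each URL once, following links to find new ones."""
    frontier = deque(seeds)   # URLs waiting to be visited
    seen = set(seeds)         # URLs already queued, so none is queued twice
    visited = []              # pages actually fetched, in visit order
    dead_links = []           # links pointing to pages that don't exist
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        if url not in web:            # stand-in for an HTTP 404
            dead_links.append(url)
            continue
        visited.append(url)
        for link in web[url]:         # discover new URLs through hyperlinks
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited, dead_links

pages, dead = crawl(["https://example.com/"], TOY_WEB)
print(pages)
print(dead)   # ['https://example.com/missing']
```

The `seen` set is what stops the crawler from going round in circles when pages link back to each other, as the About page does here.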

Crawlers pay special attention to new sites, dead links, and changes to existing sites, so the index behaves like an ever-growing library that keeps its catalogue up to date.

For example, Google’s search index is well over 100,000,000 gigabytes in size. It records signals such as keywords and how recently a website was updated, and organises that information on its servers so that when you search, you get the results most relevant to the question you’ve asked.
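Google doesn’t publish exactly how its index is stored, but a common textbook structure for looking pages up by keyword is an inverted index, which maps each word to the pages that contain it. Here’s a toy Python sketch with made-up pages and hypothetical function names (`build_index`, `search`). It only matches keywords; ranking by relevance or freshness is left out entirely.

```python
import re
from collections import defaultdict

# Hypothetical crawled pages: URL -> page text.
PAGES = {
    "https://example.com/history": "A short history of why cornflakes were invented",
    "https://example.com/recipes": "Breakfast recipes with cornflakes and milk",
    "https://example.com/hair": "The latest hairstyle for men",
}

def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return URLs containing every word in the query, sorted for stable output."""
    words = tokenize(query)
    if not words:
        return []
    # Intersect the page sets for each query word; unknown words match nothing.
    results = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(results)

index = build_index(PAGES)
print(search(index, "cornflakes"))           # ['https://example.com/history', 'https://example.com/recipes']
print(search(index, "cornflakes invented"))  # ['https://example.com/history']
```

The point of building the index ahead of time is that a search only has to look up a few words, rather than reread every stored page.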

There are hundreds of crawlers regularly indexing the web, ranging from specialised ones such as image crawlers to general-purpose ones like Googlebot (Google), Bingbot (Microsoft Bing, which replaced MSNBot) and Slurp (Yahoo).

Why should crawlers matter to you?

If you work in digital marketing, understanding how to get a web page ranked highly on a search engine is important, and a page has to be crawled and indexed before it can rank at all. Because there are so many pages on the Internet, and they change so often, crawlers can’t keep every page up to date all of the time.

All this variation gives crawlers a huge workload of URLs, which forces them to prioritise certain web pages and hyperlinks. Google publishes a list of the file types its crawlers can index.
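One common way to express “prioritise certain pages” in code is a priority queue: instead of visiting URLs in the order they were found, the crawler always takes the highest-scoring URL next. The Python sketch below assumes a made-up scoring rule (inbound links plus a small bonus for never-crawled pages); real search engines don’t publish their prioritisation formulas, so treat `CrawlFrontier` and `priority` as hypothetical illustrations.

```python
import heapq
import itertools

class CrawlFrontier:
    """A crawl queue that always hands out the highest-priority URL next."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker: equal scores come out first-in, first-out
        self._queued = set()             # avoid queuing the same URL twice

    def push(self, url, priority):
        if url in self._queued:
            return
        self._queued.add(url)
        # heapq is a min-heap, so store the negated priority to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._order), url))

    def pop(self):
        _, _, url = heapq.heappop(self._heap)
        return url

    def __len__(self):
        return len(self._heap)

# Hypothetical scoring rule, for illustration only: count how many known pages
# link to a URL, and give never-crawled URLs a small bonus.
def priority(inbound_links, is_new):
    return inbound_links + (2 if is_new else 0)

frontier = CrawlFrontier()
frontier.push("https://example.com/old-page", priority(inbound_links=1, is_new=False))
frontier.push("https://example.com/popular", priority(inbound_links=10, is_new=False))
frontier.push("https://example.com/new-post", priority(inbound_links=0, is_new=True))
frontier.push("https://example.com/tiny", priority(inbound_links=1, is_new=False))

order = [frontier.pop() for _ in range(len(frontier))]
print(order)
```

Here the heavily linked page comes out first, the brand-new post second, and the two low-scoring pages last, in the order they were found.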

Search engines periodically re-crawl pages they already know about to check for changes since the last visit, and update their index accordingly. They use algorithms to decide how often each page should be re-crawled: a page you update frequently is likely to be re-crawled more often than one that rarely changes.
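The exact re-crawl algorithms aren’t public, but the general idea can be sketched with a simple adaptive rule: compare a fingerprint of the page with the previous visit, revisit sooner if it changed and later if it didn’t. The halving and doubling, the one-hour and thirty-day limits, and the function names below are all hypothetical choices made for illustration.

```python
import hashlib

MIN_HOURS = 1.0          # hypothetical bounds on how often a page is revisited
MAX_HOURS = 24.0 * 30

def fingerprint(content):
    """Hash the page content so two versions can be compared cheaply."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def next_interval(current_hours, old_content, new_content):
    """Revisit sooner if the page changed since last time, later if it didn't."""
    if fingerprint(old_content) != fingerprint(new_content):
        return max(MIN_HOURS, current_hours / 2)   # changed: halve the wait
    return min(MAX_HOURS, current_hours * 2)       # unchanged: double the wait

def simulate(versions, start_hours=24.0):
    """Return the revisit interval chosen after each successive crawl."""
    intervals = []
    hours = start_hours
    for old, new in zip(versions, versions[1:]):
        hours = next_interval(hours, old, new)
        intervals.append(hours)
    return intervals

busy = simulate(["v1", "v2", "v3", "v4"])     # changes on every visit
static = simulate(["v1", "v1", "v1", "v1"])   # never changes
print(busy)    # [12.0, 6.0, 3.0]
print(static)  # [48.0, 96.0, 192.0]
```

Under this rule, the frequently updated page ends up being checked every few hours, while the static page drifts out towards the upper limit.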

A good design tip?

Test your website on different devices and operating systems (from older Windows releases to current ones) and in different browsers (such as Google Chrome and Mozilla Firefox) to make sure it works everywhere, because search engines want their users to land on sites they can actually use. Crawlers can be a site owner’s best friend, as long as the site is well-tested so they can work without roadblocks.

Figuring out how to get your website ranked highly on search engines and working through SEO (search engine optimisation) can be daunting, but with the right help and some SEO guides, you’ll be well on your way to a highly ranked website soon enough.