How Does Google Search Work?

    Before we talk about how Google Search works, it is important to understand a few concepts. Google is a fully automated search engine that uses software commonly known as a web crawler. This crawler regularly explores the web (a huge collection of documents called web pages). In fact, web crawlers are the basis of how search works for most major search engines out there. Google’s web crawler is known as Googlebot.

    There are several stages to how search works. In the first section of this post, we will look at how search works from a bird’s-eye view. In the later sections, we will dig deeper into each stage.

    There are three stages of Google Search:

    1. Crawling: Google’s web crawler, Googlebot, browses through the huge web of documents on the internet and downloads information from the pages. This information may be text, images, or videos.

       Think of it as a visit to a grocery store where you stroll through different aisles. While strolling, you notice random items on the shelves and keep them at the back of your mind.

    2. Indexing: In the indexing phase, Google analyzes the information (text, images, or videos) it has collected during the crawling phase. This information is stored in a huge database called the Google Index. Note that not all crawled information is stored in the Google Index.

       Back to our grocery store analogy: after strolling through the aisles, you decide to pick some items and put them in your basket. Your basket is similar to the Google Index.

    3. Serving Search Results: This stage comes into play when a user types a query into Google Search; Google then looks into its index to return the most relevant information to the user.

    Let us dig deeper into each stage now.

    Crawling

    Since the internet does not have a central database of all web pages, Google needs to build its own central registry. To build this registry, Googlebot constantly looks for new and updated web pages to add to Google’s index. This process is called URL discovery.

    There is not just one way of discovering URLs. Google may find new pages by following links on pages it has already discovered, or it can find the list of pages on a website when you submit a sitemap through Google Search Console.
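    URL discovery through link-following can be pictured with a short sketch. The minimal Python example below (standard library only, with example.com as a placeholder URL) fetches one page and collects the links on it, the way a crawler would queue newly discovered URLs. It is a toy model for illustration, not how Googlebot actually works internally.

        # Toy URL discovery: fetch one page and collect its outgoing links.
        from html.parser import HTMLParser
        from urllib.parse import urljoin
        from urllib.request import urlopen

        class LinkCollector(HTMLParser):
            """Collects href values from <a> tags on a page."""
            def __init__(self, base_url):
                super().__init__()
                self.base_url = base_url
                self.found = set()

            def handle_starttag(self, tag, attrs):
                if tag == "a":
                    for name, value in attrs:
                        if name == "href" and value:
                            # Resolve relative links against the page URL.
                            self.found.add(urljoin(self.base_url, value))

        def discover_urls(page_url):
            html = urlopen(page_url).read().decode("utf-8", errors="replace")
            collector = LinkCollector(page_url)
            collector.feed(html)
            return collector.found

        # Each URL found here would be added to the crawl queue if it is new.
        for url in sorted(discover_urls("https://example.com/")):
            print(url)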

    Now the question arises: what are the parameters for crawling web pages?

    Googlebot follows set algorithms that determine:

    1. which sites to crawl 
    2. how often to crawl a site
    3. how many pages to crawl on a particular site

    Googlebot is smart enough to adjust its crawl frequency for different sites so that it does not overload a server with too many requests. The crawl rate is also influenced by how the website responds to Googlebot’s requests and by the settings the website owner configures in Google Search Console.
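    To picture this crawl-rate control, here is a minimal sketch of crawler “politeness”: enforcing a delay between requests to the same host. The one-second delay is an arbitrary value chosen for illustration; real crawl rates are adaptive and per-site.

        # Politeness sketch: never hit the same host twice within MIN_DELAY_SECONDS.
        import time
        from urllib.parse import urlparse

        MIN_DELAY_SECONDS = 1.0   # assumed fixed delay, for illustration only
        last_fetch = {}           # host -> timestamp of the last request

        def polite_wait(url):
            host = urlparse(url).netloc
            elapsed = time.monotonic() - last_fetch.get(host, 0.0)
            if elapsed < MIN_DELAY_SECONDS:
                # Sleep just long enough to respect the per-host delay.
                time.sleep(MIN_DELAY_SECONDS - elapsed)
            last_fetch[host] = time.monotonic()

        # A crawler would call polite_wait(url) before every fetch of that URL.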

    An important concept to understand here is that ‘discovery’ and ‘crawling’ are two different things. First, Googlebot discovers a webpage; only then can it crawl it.

    Keeping that in mind, please note that:

    Not every webpage discovered by Googlebot will be crawled.

    Below are the most common scenarios in which a URL is discovered but not crawled:

    1. the website owner has blocked Googlebot from crawling the page using the robots.txt file (see the sketch after this list)
    2. the page can be accessed only by logging in
    3. the discovered page is a duplicate of a similar page crawled previously by Googlebot
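    Scenario 1 is easy to check programmatically. Python’s standard library ships a robots.txt parser, so a sketch of the check (using example.com as a placeholder site) looks like this:

        from urllib.robotparser import RobotFileParser

        parser = RobotFileParser("https://example.com/robots.txt")
        parser.read()  # downloads and parses the site's robots.txt file

        # can_fetch() answers: may this user agent crawl this URL?
        if parser.can_fetch("Googlebot", "https://example.com/private/page"):
            print("Allowed to crawl")
        else:
            print("Blocked by robots.txt")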

    Most websites today rely on JavaScript to display content on the page, so Googlebot executes the JavaScript on a page to see and understand its content. If there is a problem rendering the page with JavaScript, Googlebot’s crawling will be affected too.

    Indexing

    After the crawling stage, Google carefully analyzes the information it has collected. All the attributes and elements of the crawled pages are processed and analyzed so that they can be stored in Google’s database, the Google Index.

    Please note that:

    Not every crawled page is stored in the Google Index.

    There are various signals and parameters that determine which pages will be indexed after crawling. 

    One of the most important parts of the indexing process is determining canonical pages.

    If two pages have similar content or are complete duplicates, one of them is chosen as the primary, or canonical, page. The canonical page is the one served to the user for a query related to that content.
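    A toy model helps make duplicate grouping concrete. In the sketch below, pages whose normalized text is identical share a fingerprint, and the first URL seen for a fingerprint is treated as the canonical one. Real canonicalization weighs many more signals than a plain hash; this only shows the idea.

        import hashlib

        canonical_by_fingerprint = {}  # fingerprint -> canonical URL

        def fingerprint(text):
            # Normalize whitespace and case so trivially different copies match.
            normalized = " ".join(text.lower().split())
            return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

        def canonical_url(url, page_text):
            # The first page seen with this content becomes the canonical page.
            return canonical_by_fingerprint.setdefault(fingerprint(page_text), url)

        print(canonical_url("https://example.com/a", "Hello   World"))  # .../a
        print(canonical_url("https://example.com/b", "hello world"))    # .../a (duplicate)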

    From a bird’s-eye view, indexing is most often blocked by the following issues:

    1. low-quality content on the page
    2. indexing disabled by robots meta directives (see the sketch after this list)
    3. the design of the website being a bottleneck, especially when JavaScript execution is a problem
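    Point 2 can be illustrated with a short sketch: before storing a page, an indexer scans its meta tags for a noindex directive. This uses only Python’s standard library and is, of course, a simplification of a real indexing pipeline.

        from html.parser import HTMLParser

        class NoindexDetector(HTMLParser):
            """Flags pages carrying <meta name="robots" content="noindex">."""
            def __init__(self):
                super().__init__()
                self.noindex = False

            def handle_starttag(self, tag, attrs):
                if tag == "meta":
                    attrs = dict(attrs)
                    name = (attrs.get("name") or "").lower()
                    content = (attrs.get("content") or "").lower()
                    if name == "robots" and "noindex" in content:
                        self.noindex = True

        page = '<html><head><meta name="robots" content="noindex"></head></html>'
        detector = NoindexDetector()
        detector.feed(page)
        print("Skip indexing" if detector.noindex else "OK to index")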

    Serving Search Results

    Crawling is done, and indexing is done. Now comes the part where the indexed web pages are served to the user in response to a query.

    When a user enters a query, Google looks into its index to return the most relevant, highest-quality results. This matters because the Google search engine is ultimately a product used by everyday people to find information. If users are not served the most relevant results, the user experience suffers, and so does the quality of the product.

    Now it is important to understand that the relevance of the results depends on many factors, such as the user’s location, language, and device.

    For example, “best restaurants near me” will show different results in New York and Toronto.
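    The lookup step itself can be pictured with a toy inverted index: a mapping from words to the pages that contain them, queried by counting matching terms. Real ranking weighs hundreds of signals (location, language, device, quality, and more), so this sketch only illustrates the “look into the index” part.

        from collections import defaultdict

        index = defaultdict(set)  # word -> set of page URLs (a toy "Google Index")

        def add_to_index(url, text):
            for word in text.lower().split():
                index[word].add(url)

        def serve(query):
            # Score each page by how many query terms it contains.
            scores = defaultdict(int)
            for word in query.lower().split():
                for url in index[word]:
                    scores[url] += 1
            return sorted(scores, key=scores.get, reverse=True)

        add_to_index("https://example.com/pizza", "best pizza restaurants in new york")
        add_to_index("https://example.com/sushi", "best sushi in toronto")
        print(serve("best restaurants near me"))  # the pizza page ranks first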

    Please note that:

    Even if a page is indexed, it is not guaranteed to appear in the search results for a particular query.

    You need to optimize your webpage by following SEO best practices in order to increase its visibility in Google Search.

    This is essentially how Google Search works. The basics remain the same for now: crawling, indexing, and serving results. However, Google keeps improving its algorithms to serve better results.

