What happens when you type https://www.google.com in your browser and press Enter
Browsing the internet has become second nature because we use it all day for so many different things. We have reached the point where we simply take it for granted that our browsers work as intended. The websites we view, however, must come from somewhere; it is not magic! So how does it all work? What happens in the background between the moment we type a URL (Uniform Resource Locator) into the address bar and the moment we see the content of the desired page?
Many processes take place in a split second, which is why we rarely think about them.
Before delving into this interesting topic, it is crucial to understand the client-server model. On our side, we must be using a machine with a browser, and on Google's side, a server must answer our request. Just as a phone call requires someone on the other end to pick up, accessing a webpage requires a server on Google's side to respond. Servers are themselves computers. In this instance, then, our computer is the client and Google's "computer" is the server.
The DNS Server
As soon as we enter the URL https://www.google.com into our browser and press "Enter," the browser begins to split the URL into its parts. It first considers the domain name portion of the URL, www.google.com. To understand what a "domain name" is, we first need to know what an "IP address" is. IP stands for Internet Protocol.
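Before turning to IP addresses, here is a minimal sketch of the splitting step described above. It uses Python's standard urllib.parse module purely as an illustration; a real browser has its own URL parser.

```python
from urllib.parse import urlsplit

# Split the URL from our example into its components.
parts = urlsplit("https://www.google.com")

print(parts.scheme)    # the protocol part of the URL
print(parts.hostname)  # the domain name part of the URL
print(parts.path)      # empty here, because no path was typed
```

The scheme is handled later, when the browser decides how to talk to the server; the host name is what gets looked up first.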
Say you wish to call a friend of yours on the phone. To call your friend, you clearly need their phone number. Similarly, computers use IP addresses to communicate with one another over the internet. Just as a person has a unique phone number, a computer has a unique IP address.
IPv4 addresses have a specific format: four numbers from 0 to 255, separated by dots, as in 231.95.4.23. A newer version, IPv6, uses longer addresses written in hexadecimal. However, the idea remains the same: unique numbers identify computers on the network. (E.g.: 35.162.7.193)
As you can see, IP addresses are difficult to remember. This is where domain names enter the scene. A domain name is a human-readable name for an IP address, much like saving a friend's phone number in your contacts under a name of your choosing. Domain names were created because people remember words better than numbers.
Thankfully, the Domain Name System (DNS) keeps track of which IP address belongs to each domain, so we do not have to. If the browser does not already have the domain name in its cache, it queries DNS for the IP address associated with that domain name.
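To make the format rule concrete, here is a small sketch that checks candidate addresses with Python's standard ipaddress module. The helper name describe is made up for this example.

```python
import ipaddress

def describe(text):
    """Return 'IPv4', 'IPv6', or 'invalid' for a candidate address string."""
    try:
        address = ipaddress.ip_address(text)
    except ValueError:
        # Raised when the text does not follow the IPv4 or IPv6 format.
        return "invalid"
    return f"IPv{address.version}"

for candidate in ["35.162.7.193", "231.95.4.23", "256.1.1.1", "2001:db8::1"]:
    print(candidate, "->", describe(candidate))
```

The third candidate is rejected because 256 is outside the 0 to 255 range.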
The DNS request goes to a resolver first. If the resolver does not find the IP address in its cache, it contacts a root server. The root server knows the location of the TLD (Top-Level Domain) servers. In our example, the top-level domain is .com. The TLD server does not usually hold the IP address itself; instead, it directs the resolver to the authoritative name servers for the domain name.
Usually, several name servers are associated with a single domain name, and any of them can provide the IP address for it. Now that the resolver knows the IP address (for example, 35.162.7.193), it passes it back to the browser, which can then make its request to the appropriate server.
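The lookup chain above can be sketched as a toy simulation. Everything in it is made up for illustration: the dictionaries stand in for real root, TLD, and authoritative servers, the server names are hypothetical, and the IP address is the example value used in this article.

```python
# Toy data standing in for real DNS servers (all names are hypothetical).
ROOT = {"com": "tld-server-for-com"}
TLD = {"tld-server-for-com": {"google.com": "ns-for-google"}}
AUTHORITATIVE = {"ns-for-google": {"www.google.com": "35.162.7.193"}}

cache = {}  # the resolver's cache of answers it has already found

def resolve(hostname):
    """Return (ip, steps), following resolver -> root -> TLD -> authoritative."""
    if hostname in cache:
        return cache[hostname], ["cache"]
    labels = hostname.split(".")
    tld = labels[-1]                # "com"
    zone = ".".join(labels[-2:])    # "google.com"
    tld_server = ROOT[tld]                      # the root points to the TLD server
    name_server = TLD[tld_server][zone]         # the TLD points to the name server
    ip = AUTHORITATIVE[name_server][hostname]   # the name server knows the IP
    cache[hostname] = ip
    return ip, ["root", "tld", "authoritative"]

print(resolve("www.google.com"))  # first lookup walks the whole chain
print(resolve("www.google.com"))  # second lookup is answered from the cache
```

In a real Python program you would not walk the chain yourself; a call such as socket.getaddrinfo hands the question to the operating system's resolver.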
TCP/IP connection
We discussed how domain names represent IP addresses. However, IP is not the only protocol used by the Internet. The Internet Protocol Suite, also known as TCP/IP (TCP stands for Transmission Control Protocol), includes several protocols. It is a set of rules that specify how servers and clients interact over the network, as well as how data is broken into packets, transported, and received. The exchange goes roughly as follows:
To create a connection, the browser sends a connection request (SYN) to the server's IP address.
The server receives the request and responds with an acknowledgement (SYN-ACK), which the browser acknowledges in turn (ACK). This exchange is known as the three-way handshake.
Once the handshake is complete, the browser can send a request for the page it wishes to access (in this case, Google.com's homepage). This request is transmitted using TCP, which guarantees that it is delivered intact and in the right order.
The server receives the request and returns to the browser the HTML code for Google.com's homepage. To ensure reliable delivery, this response is likewise transmitted via TCP.
The HTML code is received by the browser and used to render the content on your screen. TCP/IP is also used to request and receive any resources (such as photos) that the webpage requires.
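The "right order" guarantee mentioned in the steps above can be illustrated with a simplified sketch. It only shows the idea of numbering pieces of data and putting them back in sequence; real TCP also handles acknowledgements, retransmission, and flow control, none of which is modelled here. The function names are made up for this example.

```python
import random

def to_segments(data, size):
    """Split data into (sequence_number, chunk) pairs, like numbered packets."""
    return [(offset, data[offset:offset + size])
            for offset in range(0, len(data), size)]

def reassemble(segments):
    """Sort segments by sequence number and join the chunks back together."""
    return b"".join(chunk for _, chunk in sorted(segments))

message = b"GET / HTTP/1.1\r\nHost: www.google.com\r\n\r\n"
segments = to_segments(message, 8)
random.shuffle(segments)  # simulate packets arriving out of order

# The receiver still recovers the original request.
print(reassemble(segments) == message)
```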
Firewall
Servers are frequently equipped with firewalls to defend themselves against hackers and attacks. A firewall is software or hardware that controls which traffic may enter or leave a network. In our example, when the browser requests the website at the address 35.162.7.193, the request is evaluated by a firewall, which determines whether it is safe or a threat to the server's security. The client's machine can also run a firewall to determine whether the IP address returned by the DNS request belongs to a potentially malicious agent.
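As a rough illustration of how a firewall might evaluate a request, here is a sketch of a first-match rule list. The rules, address ranges, and the decide function are all hypothetical; real firewalls are configured with their own tools and rule syntax.

```python
import ipaddress

# Hypothetical rules, checked from top to bottom; the first match wins.
# Each rule is (action, source network, destination port or None for any port).
RULES = [
    ("deny", ipaddress.ip_network("203.0.113.0/24"), None),  # block a whole range
    ("allow", ipaddress.ip_network("0.0.0.0/0"), 443),       # HTTPS from anyone
    ("allow", ipaddress.ip_network("0.0.0.0/0"), 80),        # HTTP from anyone
]

def decide(source_ip, destination_port):
    """Return 'allow' or 'deny' for a connection attempt."""
    address = ipaddress.ip_address(source_ip)
    for action, network, port in RULES:
        if address in network and (port is None or port == destination_port):
            return action
    return "deny"  # anything not explicitly allowed is refused

print(decide("198.51.100.4", 443))  # ordinary visitor asking for HTTPS
print(decide("203.0.113.7", 443))   # visitor from the blocked range
print(decide("198.51.100.4", 22))   # port that no rule allows
```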
HTTPS/SSL (Security & Encryption)
Now that it has the IP address, the browser handles the other portion of the URL, the https:// component. HTTPS, which stands for Hypertext Transfer Protocol Secure, is a secure version of HTTP, the primary method of data communication between a browser and a website.
HTTPS requests and responses are encrypted, so third parties cannot read or tamper with users' data while it travels across the network. For example, if we enter our credit card information on a website that uses HTTPS, we can be confident that it is not sent in plain text for anyone along the way to read.
The SSL certificate is another important aspect of website security. SSL stands for Secure Sockets Layer; its modern successor is TLS (Transport Layer Security), although the certificates are still commonly called SSL certificates. The certificate must be issued by a trusted Certificate Authority, such as the well-known Let's Encrypt, which provides free certificates. When a website has a valid certificate, browsers typically show a small lock icon next to the website address in the address bar. Some older browsers also turned the bar green for certain types of certificate.
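To show what "checking the certificate" means for a client, here is a sketch using Python's standard ssl module. The default context mirrors what a browser does in spirit: it refuses certificates that do not validate or that do not match the host name. The helper fetch_certificate_subject is a name invented for this example, and it is only defined here, not called, because it needs network access.

```python
import socket
import ssl

# A context with the usual safe defaults for talking to servers.
context = ssl.create_default_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # the certificate must validate
print(context.check_hostname)                    # and must match the host name

def fetch_certificate_subject(hostname, port=443):
    """Connect, perform the TLS handshake, and return the certificate subject."""
    with socket.create_connection((hostname, port), timeout=5) as raw:
        # wrap_socket raises an ssl.SSLError if the certificate is not trusted.
        with context.wrap_socket(raw, server_hostname=hostname) as tls:
            return tls.getpeercert()["subject"]
```

Calling fetch_certificate_subject("www.google.com") on a machine with internet access would return the subject fields of the certificate the server presented.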
Load-balancer
Websites, as previously said, exist on servers. Most websites with high traffic volumes would be hard to host on a single server. A single server is also a Single Point of Failure (SPOF): one failure of, or attack on, that server would bring the entire site down.
As the demand for improved availability and security increased, websites began increasing the number of servers they had, clustering them, and employing load-balancers. A load-balancer is a piece of software that distributes network requests across multiple servers using a load-balancing algorithm. HAProxy is a well-known load-balancer, and round-robin is one example of an algorithm it can use: it distributes requests evenly and sequentially by cycling through all the servers.
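Round-robin is simple enough to sketch in a few lines. This is only an illustration of the algorithm, not how HAProxy implements it, and the server names are hypothetical.

```python
from itertools import cycle

class RoundRobinBalancer:
    """Hand out servers one after another, starting over at the end of the list."""

    def __init__(self, servers):
        self._servers = cycle(servers)

    def pick(self):
        """Return the server that should handle the next request."""
        return next(self._servers)

balancer = RoundRobinBalancer(["web-01", "web-02", "web-03"])
assignments = [balancer.pick() for _ in range(5)]
print(assignments)  # ['web-01', 'web-02', 'web-03', 'web-01', 'web-02']
```

Each server receives the same share of requests over time, regardless of how busy it is; other algorithms take the current load into account.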
Web server
Once the load-balancer has assigned the requests to servers, one or more web servers will process them. A web server is a piece of software that serves static material such as HTML pages, pictures, or plain text files. Nginx and Apache are two examples of web servers. The web server is in charge of locating the static material that matches the requested address and returning it as an HTTP or HTTPS response.
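The "locate the static material" step can be sketched as a function that maps a requested path to a file. This is a simplified illustration with an invented function name, not how Nginx or Apache work internally; it returns only a status code and a body, and it treats "/" as a request for index.html.

```python
import tempfile
from pathlib import Path

def serve_static(document_root, request_path):
    """Return (status_code, body) for a requested path under document_root."""
    root = Path(document_root).resolve()
    relative = request_path.lstrip("/") or "index.html"
    target = (root / relative).resolve()
    # Refuse missing files and paths that escape the root (e.g. "/../secret").
    if root not in target.parents or not target.is_file():
        return 404, b"Not Found"
    return 200, target.read_bytes()

# Try it against a temporary folder that holds a single page.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "index.html").write_text("<h1>Hello</h1>")
    print(serve_static(folder, "/"))
    print(serve_static(folder, "/missing.html"))
```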
Application server
The foundation of any web page is a web server. However, most websites are not static pages with no interactivity; the majority are dynamic. That means you can interact with the site, store information on it, log in, and so on.
This is made possible by using one or more application servers. These are software programs that, among other things, run applications, interface with databases, and handle user data. They operate behind web servers and generate dynamic content, which is delivered alongside the static material from the web server.
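To contrast dynamic content with static files, here is a sketch of a tiny application written against WSGI, the standard Python interface between web servers and applications. The greeting logic and the name query parameter are invented for this example; the point is only that the page is computed per request instead of being read from disk.

```python
from html import escape
from urllib.parse import parse_qs

def application(environ, start_response):
    """A tiny WSGI app that builds its page from the incoming request."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["stranger"])[0]
    # Escape user input before placing it in HTML.
    body = f"<h1>Hello, {escape(name)}!</h1>".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]

# Call the app directly with a fake request, the way a server would.
statuses = []
response = application({"QUERY_STRING": "name=Ada"},
                       lambda status, headers: statuses.append(status))
print(statuses, response)
```

In a real deployment, an application server would call this function for every request that the web server passes along.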
Database
The Database Management System (DBMS) is the final component of our online infrastructure. A database is a collection of data, and the DBMS is the application that interfaces with the database to access, add, and alter the data in it.
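The three operations mentioned above (access, add, and alter) can be sketched with SQLite, a small DBMS that ships with Python. The users table and its contents are made up for this example, and a large website would typically use a separate database server instead of an in-memory database.

```python
import sqlite3

# An in-memory database that disappears when the program ends.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# Add data.
connection.execute("INSERT INTO users (name) VALUES (?)", ("Ada",))

# Alter data.
connection.execute("UPDATE users SET name = ? WHERE name = ?",
                   ("Ada Lovelace", "Ada"))

# Access data.
rows = connection.execute("SELECT id, name FROM users").fetchall()
print(rows)  # [(1, 'Ada Lovelace')]
```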
Conclusion
When we type a URL into a browser, all the components we discussed cooperate to build a response and serve it to the client in a fraction of a second. Even when we know what is going on behind the scenes, it is still amazing to see it happen before our eyes.
I hope you enjoyed this article and now understand what occurs when you type "google.com" into your browser.
If you'd like to connect with me directly, please send me a DM on Twitter; I'd love to hear from you.
You can also read my blog and follow me on Twitter.
Thank you