Do you want to learn about HTTP headers? This page has been written for you. Keep reading to discover the major request and response headers that developers use to communicate certain details to web clients and servers.
Regular web users do not need to know about HTTP headers in order to use the Internet. However, developers and users of web clients other than regular web browsers, such as headless automation tools, need to learn about HTTP headers to develop standard APIs, troubleshoot errors, and use tools developed by others effectively. The Internet is not as simple as web browsers make us believe. The Internet is not your browser, and it is much more complex than just typing a URL and getting a web page displayed.
A lot happens under the hood that isn’t important to regular Internet users, but for a developer or a user of automation bots it is essential knowledge. One of the concepts you need to know is that your requests are sent with additional information, and web servers likewise send additional information alongside the response body.
This additional information communicates certain details to the web client or server so that it knows how to treat the request or response. These additional details are known as HTTP headers.
We are going to look at them in detail. But before that, let’s take a look at how we are able to access content on the Internet. This knowledge is important to our understanding of HTTP headers.
How Surfing the Internet Works – The HTTP Client and Server Model
The Internet is a global network of interconnected computer networks, with computers communicating with each other by sending and receiving data. It is much more complex than your Local Area Network (LAN), and it provides a medium for computers all around the world to communicate.
There are basically two types of computers on the Internet: client computers and web servers. Client computers are the computers you use to browse the Internet. Browsing is done using web clients (web browsers, desktop and mobile applications, bots, and many others). The other computers are known as web servers; they are the computers on which websites are hosted.
At its most basic level, browsing takes place through the client-server communication model. Let’s take a look at the steps involved in this communication model.
- Let’s say you want to access the Google homepage, so you type https://www.google.com into the address bar of your browser and submit. Your browser will send a DNS query to find out Google’s IP address.
- With the IP address found, it then opens a connection to Google’s server using that IP address and performs a TLS handshake so that the traffic is encrypted and eavesdropping is useless.
- After the connection has been created, your browser sends a formal request to Google, asking for the content of its homepage. HTTP request headers are sent along with it.
- Google’s web server looks at the request, and if there is nothing wrong with it and you have permission to access the homepage, it sends back the homepage content to your browser as a response. It does not only send the content to be displayed; it also sends response headers so that your browser will know how to treat the response.
Looking at the steps above, you can see that web clients, in the form of web browsers or bots, send HTTP requests to web servers, and web servers process them and return a response. An HTTP request consists of three parts: the request line, the request headers, and the request body. Our focus in this article is on both the request headers and the response headers, grouped together as HTTP headers.
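To make the three parts of a request concrete, here is a minimal sketch that assembles a raw HTTP/1.1 request message as text. The function name build_request and the example-client/1.0 User-Agent value are made up for illustration; a real client library does this for you.

```python
def build_request(method, path, headers, body=""):
    """Assemble a raw HTTP/1.1 request message as text (illustrative only)."""
    # Part 1: the request line (method, target, protocol version).
    request_line = f"{method} {path} HTTP/1.1"
    # Part 2: the request headers, one "Name: value" pair per line.
    header_lines = [f"{name}: {value}" for name, value in headers.items()]
    # A blank line separates the headers from part 3, the (optional) body.
    return "\r\n".join([request_line, *header_lines, "", body])


raw = build_request(
    "GET",
    "/",
    {
        "Host": "www.google.com",
        "Accept": "text/html",
        "User-Agent": "example-client/1.0",  # hypothetical client name
    },
)
print(raw)
```

A GET request normally has no body, so the message simply ends with the blank line that follows the headers.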
What are HTTP Headers?
HTTP headers are additional data passed alongside HTTP requests or responses to communicate other details to either a web server (in the case of a request) or a web client (in the case of a response). They provide information that helps the computer at the receiving end better understand the content being sent to it and how to treat it.
Most HTTP headers are optional (HTTP/1.1 requires only the Host header in a request), but many web servers will reject your requests if you leave out some of the common ones, as a way of preventing spam and abuse. Headers take the form of name-value pairs separated by a colon.
HTTP header names are not case sensitive (header values, however, can be). This means that if you are developing a REST API endpoint, you can send response header names in either upper case or lower case, and the receiving client should be able to understand them. When developing a web client, such as any application that sends and receives data over the Internet, you can likewise send request header names in either upper case or lower case.
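As a small sketch of both points (the colon-separated pairs and the case-insensitive names), the snippet below parses a block of header lines and looks names up regardless of case. The helper names parse_headers and get_header and the sample header block are assumptions made for this example.

```python
def parse_headers(raw_headers):
    """Parse 'Name: value' lines into a dict keyed by lower-cased name."""
    headers = {}
    for line in raw_headers.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first colon only; values may contain colons themselves.
        name, _, value = line.partition(":")
        # Note: a repeated header name overwrites the earlier one in this sketch.
        headers[name.strip().lower()] = value.strip()
    return headers


def get_header(headers, name):
    """Look a header up without caring how the caller capitalised it."""
    return headers.get(name.lower())


sample = "Content-Type: text/html\r\nCONTENT-LENGTH: 42\r\nserver: example\r\n"
parsed = parse_headers(sample)
print(get_header(parsed, "content-type"))
print(get_header(parsed, "Content-Length"))
```

Normalising names to lower case on the way in is one common design choice; it keeps lookups simple while leaving the values untouched.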
How to Inspect HTTP Headers
To understand what I said above about web clients (web browsers) sending request headers and web servers sending responses based on those headers, I will show you how to inspect the sent and received headers. This knowledge will be useful to you if you want to change the headers sent by your web crawlers and bots in order to avoid detection.
Web browsers are transparent about the data they send alongside your requests. There is specialized software you can use to inspect your requests and responses to see the headers. However, let’s make use of the one that comes preinstalled in Google Chrome: the Developer tools.
To inspect HTTP headers using Chrome, make sure you have an Internet connection and launch Chrome. Open the developer tools (Menu => More tools => Developer tools) and select the Network tab. The developer tools panel will open in your browser.
The next step is to send a web request to any valid URL. In this example, we requested the Google homepage by typing https://www.google.com in the address bar and submitting it. As soon as you submit the request, the empty panel will be populated with the numerous requests sent by your browser in order to collect all the resources it needs to render Google’s homepage.
In our test, the panel listed a good number of requests (51 requests in total). Clicking on the second one showed that it returned the 200 HTTP status code, which means the request was successful. The header information is located on the right-hand side of the developer tools interface.
There are three groups of header information you can see: General Headers, Response Headers, and Request Headers. If you look at the information in each of the categories, you will see that a lot more information is sent under the hood by web servers and clients than we are aware of.
Our focus will be on request and response headers. We will be discussing only a few of them. For a comprehensive list of HTTP headers, read the Mozilla web document on HTTP headers.
HTTP Request Headers
HTTP request headers are headers sent alongside web requests to pass other details to web servers. HTTP request headers are used for content negotiation, passing other request context, and letting a web server know some details about the origin of a request.
Depending on the web client you are using, a given request header may or may not be sent. The more relevant headers you send, the more predictable the response will be. Even without optional request headers, your requests will usually still be attended to.
However, you risk getting generic content returned to you, or even having your request denied. Below are the most common HTTP request headers sent by web browsers. As a bot developer, it is important that you incorporate them into your requests and mimic the details sent by regular browsers so that your requests aren’t detected as coming from a bot.
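As a rough illustration of attaching request headers in code, the sketch below uses Python’s standard urllib.request.Request. Nothing is sent over the network here; the request object is only built and inspected. The header values, including the example-bot/1.0 User-Agent, are made-up placeholders rather than what any particular browser sends.

```python
from urllib.request import Request

# Illustrative header values; replace them with whatever your client needs.
request_headers = {
    "Accept": "text/html,application/xhtml+xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-GB,en;q=0.9",
    "User-Agent": "example-bot/1.0 (+https://example.com/bot-info)",
}

# Building the Request object does not open a connection.
request = Request("https://www.google.com/", headers=request_headers)

# urllib normalises the capitalisation of header names, which is harmless
# because header names are case-insensitive.
for name, value in request.header_items():
    print(f"{name}: {value}")
```

Passing this object to urllib.request.urlopen would actually send the request with these headers attached.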
The Accept Header is a content negotiation header. It communicates to a web server the type of content a web client can access, read, or render. This gives directives to web servers on the format of data they can return as a response.
If a web server cannot produce any of the accepted content types, it should not send an unsupported format; HTTP defines the 406 Not Acceptable status for this case. The value depends on the type of content requested. However, each web client has its own default value, and Google Chrome, for instance, sends its own default Accept value with page requests.
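To show how a server might read an Accept value, here is a minimal sketch that splits it into media types and their q (quality) weights and orders them by preference. The Accept string below is illustrative and is not claimed to be Chrome’s actual default; parse_accept is a hypothetical helper name.

```python
def parse_accept(value):
    """Return (media_type, q) pairs sorted from most to least preferred."""
    preferences = []
    for item in value.split(","):
        media_type, *params = [part.strip() for part in item.split(";")]
        quality = 1.0  # q defaults to 1 when it is omitted
        for param in params:
            key, _, val = param.partition("=")
            if key.strip() == "q":
                quality = float(val)
        preferences.append((media_type, quality))
    # sorted() is stable, so entries with equal q keep their listed order.
    return sorted(preferences, key=lambda pair: pair[1], reverse=True)


accept = "text/html,application/xml;q=0.9,*/*;q=0.8"  # illustrative value
print(parse_accept(accept))
```

A server would walk this list and return the first format it is able to produce.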
The Accept-Charset header communicates the character encodings the client understands. Although this header was once important, some browsers do not send it with requests because they already support the popular encodings. Even Chrome, in many instances, does not send this header.
The Accept-Encoding header lets web servers know which content encodings (in practice, compression algorithms) can be applied to the resources sent to a particular web client. In the example above, Google Chrome sent an Accept-Encoding value with its request for Google’s homepage.
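The sketch below shows, in simplified form, how a server could use Accept-Encoding to decide whether to compress a response body with gzip. The function name encode_body and the sample header value are assumptions for this example, and q weights (such as gzip;q=0) are deliberately ignored to keep it short.

```python
import gzip


def encode_body(body, accept_encoding):
    """Compress the body with gzip if the client says it accepts gzip."""
    # Simplification: any q parameter is stripped and otherwise ignored.
    offered = {
        token.split(";")[0].strip().lower()
        for token in accept_encoding.split(",")
    }
    if "gzip" in offered:
        # The server tells the client which encoding it applied.
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    return body, {}


page = b"<html>" + b"hello " * 200 + b"</html>"
payload, extra_headers = encode_body(page, "gzip, deflate, br")
print(len(page), len(payload), extra_headers)
```

The client reverses the step with gzip.decompress when it sees Content-Encoding: gzip in the response.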
Accept-Language lets a web server know the language in which it should send back the response. This comes in handy on websites that support many languages. Without specifying it, you will still get a response from the server. However, you might get the response delivered in US English instead of UK English.
The Cache-Control header is used for communicating caching policies. In a request, it tells the caches along the way, and the server, how you want cached responses handled: you can ask that the response not be served from or stored in a cache, and you can give a directive on the maximum age of a cached response you are willing to accept. For the maximum age, max-age is used. Some of the other values for Cache-Control include no-cache, no-store, public, and private. Each directive has its own precise meaning, so look each one up before relying on it.
In HTTP/1.1, the default behaviour for the Connection header is keep-alive; even without setting it, that is how the connection is treated. The other value for the Connection header is close. The difference between the two is that if the value is set to keep-alive, subsequent requests to the same server within the keep-alive period will use the same TCP connection, which improves response time and reduces CPU load. Unless you have a reason to change the value, you do not have to send this header when developing your web automation bot.
The Cookie header’s value is one or more cookies. A cookie is a small piece of data sent to a client by a web server in order to remember stateful details and keep records of one’s browsing activities. Cookies are site-specific, and this value is usually empty until you visit a website and the website sends your client a cookie. Cookies are very important for persistence. When you log in, a session cookie is sent to your browser via the Set-Cookie header, and your browser populates the Cookie header on later requests. With the cookie set, you do not have to enter your username and password every time you send the same website a request, unless the cookie expires or you clear it.
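Here is a minimal sketch of that round trip using Python’s standard http.cookies module: a value received in a Set-Cookie response header is parsed, and the matching Cookie request header is built from it. The cookie name session_id and its value are invented for the example.

```python
from http.cookies import SimpleCookie

# Hypothetical value of a Set-Cookie response header.
set_cookie_value = "session_id=abc123; Path=/; HttpOnly"

jar = SimpleCookie()
jar.load(set_cookie_value)  # parses the name, value, and attributes

# Only name=value pairs go back to the server; attributes such as Path
# and HttpOnly are instructions for the client and are not echoed.
cookie_header = "; ".join(
    f"{name}={morsel.value}" for name, morsel in jar.items()
)
print("Cookie:", cookie_header)
```

A browser does this bookkeeping automatically; in a bot you either do it yourself or use a cookie-aware HTTP client.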
The User-Agent Header is arguably one of the most important HTTP request headers – without it, some web servers will deny you access. User-Agent is a string that web clients use in identifying themselves to a website.
A web server knows whether you are using Firefox, Chrome, Edge, Safari, or Opera through the value each browser passes in the User-Agent header. User-Agent communicates details such as the software name, application type, vendor, and version.
To prevent abuse from crawlers, scrapers, and other automation bots, many web servers only attend to requests with a popular User-Agent.
For this reason, many bot developers copy the User-Agent string of a popular browser such as Chrome and use it as the User-Agent of their bots. This practice is somewhat discouraged, because web server administrators cannot identify your bot if it has an adverse effect on their servers. Nevertheless, bot developers do not see this as a problem on their end.
HTTP Response Headers
Just as web clients send additional information to web servers as request headers, web servers do the same when sending back a response. The additional information sent alongside the status line is known as HTTP response headers.
They give directives to web clients on how to treat the response and pass other information that will make subsequent requests easier and faster. As a web developer, it is important you know about these headers and the correct value for them.
Developers who consume REST APIs should also know about them, as this will help them build more compliant systems. Let’s take a look at the popular response headers.
The Cache-Control header appears in both lists because both web clients and servers send it. Web servers send this header to give web clients (browsers) directives on caching. For time-sensitive data and other information that changes quickly, web servers send a directive such as no-store as the Cache-Control value, which tells the browser not to cache the content of the response. The max-age directive sets the period of time a cached copy can be kept, measured in seconds. Other values for Cache-Control include no-cache, must-revalidate, and public, among others.
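As a rough sketch of how a client could interpret these directives, the code below parses a Cache-Control value and works out for how many seconds the response may be reused without contacting the server. The helper names and the simplified rules are assumptions for illustration; real HTTP caches follow more detailed rules.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into a dict of directives."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            # Directives without an argument (e.g. no-store) map to True.
            directives[name.lower()] = arg.strip('"') or True
    return directives


def reusable_seconds(value):
    """Seconds a response may be reused without revalidation (simplified)."""
    directives = parse_cache_control(value)
    if "no-store" in directives:
        return 0  # must not be stored at all
    if "no-cache" in directives:
        return 0  # may be stored, but must be revalidated before reuse
    if "max-age" in directives:
        return int(directives["max-age"])
    return 0  # no explicit lifetime; real caches may apply heuristics here


print(reusable_seconds("public, max-age=3600"))
print(reusable_seconds("no-store"))
```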
The Content-Length header expresses the length of the response body to web clients in bytes (octets). This is useful for clients that accept responses only within a certain content length range.
The MIME type of this content is the value for this header. The Content-Type header reveals the format and type of content sent as a response. It tells a web browser whether the response is an HTML file, a PDF, or even an audio file. It also reveals the character encoding.
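The following sketch shows one way a client might use Content-Type and Content-Length together: it checks the declared length against the bytes received and decodes the body with the declared character encoding. It leans on Python’s standard email.message.Message to parse the Content-Type value; the function name decode_body and the UTF-8 fallback are assumptions of this example.

```python
from email.message import Message


def decode_body(headers, raw_body):
    """Return (media_type, text) for a response body, checking its length."""
    msg = Message()
    msg["Content-Type"] = headers.get("Content-Type", "application/octet-stream")
    media_type = msg.get_content_type()
    # Assumption: fall back to UTF-8 when no charset parameter is given.
    charset = msg.get_content_charset() or "utf-8"

    # Content-Length counts bytes, not characters.
    declared = int(headers.get("Content-Length", len(raw_body)))
    if declared != len(raw_body):
        raise ValueError("body length does not match Content-Length")
    return media_type, raw_body.decode(charset)


body = "<p>café</p>".encode("utf-8")
response_headers = {
    "Content-Type": "text/html; charset=UTF-8",
    "Content-Length": str(len(body)),
}
media_type, text = decode_body(response_headers, body)
print(media_type, response_headers["Content-Length"], text)
```

Note that the body has 11 characters but 12 bytes, because the accented letter takes two bytes in UTF-8.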
The Date header is pretty straightforward. It communicates the date and time at which the response was sent. The date and time used here are those of the server, not the client. If you are developing a web bot or any other application that consumes a REST endpoint, this header might not be of use to you. It only becomes important if you want to keep track of when a particular response was retrieved.
The Expires Header complements the Date Header. It tells a web client when a response is considered stale. Just like the Date Header, it comes with both time and date.
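To illustrate how the two headers work together, this short sketch parses a Date and an Expires value with Python’s standard email.utils.parsedate_to_datetime and computes how long the response stays fresh. The two timestamps are made-up sample values.

```python
from email.utils import parsedate_to_datetime

# Made-up sample values in the standard HTTP date format.
response_headers = {
    "Date": "Wed, 21 Oct 2015 07:28:00 GMT",
    "Expires": "Wed, 21 Oct 2015 08:28:00 GMT",
}

sent_at = parsedate_to_datetime(response_headers["Date"])
expires_at = parsedate_to_datetime(response_headers["Expires"])

# The response is considered fresh from the Date until the Expires time.
lifetime = expires_at - sent_at
print(lifetime.total_seconds())
```

Because both values come from the server’s clock, their difference is reliable even when the client’s clock is off.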
The Server header tells you which server software is sending you the response.
In the HTTP request header section, we mentioned the Cookie header and how it is usually empty on your first visit to a website. The Set-Cookie header gives web clients the cookies to set for their subsequent requests.
A Note for Bot Developers
HTTP headers are important to bot developers. Whether you are developing a web crawler, scraper, site tester, or even a purchase bot, as in the case of sneaker bots, you need to know the popular HTTP headers. This is because, unlike web browsers, which send correct HTTP request headers and use the HTTP response headers returned to render the response content correctly without user intervention, a bot leaves all header settings to you.
Most of the HTTP requests modules used for developing bots are identifiable by the default headers they send, most notably, the User-Agent string.
For this reason, unless you use custom headers, the chances of your bot getting detected are high. Setting cookies for a persistent login also isn’t automatic, as it is in a web browser.
There are a whole lot of things that browsers do for you automatically that you need to take care of on your own if you are developing a web bot, and you need to work with HTTP headers to achieve some of them.
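As a small illustration using only Python’s standard library, the sketch below prints the default headers that urllib attaches to every request (which include a telltale User-Agent), then replaces them and adds cookie handling. No request is sent. The replacement header values are placeholders chosen for this example.

```python
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, build_opener

# An opener with a cookie jar stores cookies from Set-Cookie responses
# and sends them back on later requests made through the same opener.
cookie_jar = CookieJar()
opener = build_opener(HTTPCookieProcessor(cookie_jar))

# Headers urllib adds by default; the User-Agent names the library itself.
default_headers = {name.lower(): value for name, value in opener.addheaders}
print(default_headers)

# Replace the defaults with custom headers (placeholder values).
opener.addheaders = [
    ("User-Agent", "example-bot/1.0 (+https://example.com/bot-info)"),
    ("Accept-Language", "en-GB,en;q=0.9"),
]
custom_headers = {name.lower(): value for name, value in opener.addheaders}
print(custom_headers)
```

Calling opener.open(url) would then send requests with the custom headers and manage cookies across requests.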
It is important to stress that the HTTP headers discussed above are just the common ones. There are a good number of others, many of which are still experimental and a work in progress.
If you want to check out a comprehensive list of all the HTTP headers, you can check out the HTTP header documentation on the Mozilla website.