Curl 101: What It Is & How to Use Curl for Web Scraping

Do you want to learn what the cURL tool is and what you can use it for? Then read on for a gentle introduction to the cURL tool and how you can use it for testing endpoints and even web scraping.


In the past, you could hardly use a computer without knowing the command line, since most operations were carried out through it.

This has changed. Most computer users today do not know even basic command-line usage, as everything has been abstracted away and hidden behind beautiful, easy-to-use UIs. You no longer necessarily need command-line knowledge, unless you are a coder or your work requires you to integrate with the command line. One popular tool available as a Command-Line Interface (CLI) application is cURL.

In this article, we will take a look at the cURL tool and show you how to make use of it. One thing you need to know is that the tool is quite powerful, and there is a lot you can use it for. Before going into the how-to, let's take a brief overview of the cURL application.


What is cURL?


cURL is a command-line tool developed for transferring data to and from web servers. In simple terms, it is a CLI application that facilitates network communication with web servers. With this tool, you can request data from web servers, and with the same tool, you can send data to web servers from your own end.

Because it is a CLI application, you will have to use the tool from the command line, the usual white-text-on-black-background interface. While this might be intimidating at first, it might interest you to know that once you get used to it, you will discover how flexible and powerful it is.

cURL stands for "client URL". As a tool that helps you talk to a server in a remote location, you need to specify that location, usually in the form of a URL, and that is where the "URL" after "client" comes from. So, basically, it is a client application for communicating with servers. In terms of protocol support, the curl application supports the popular protocols, including HTTP, HTTPS, FTP, FILE, DICT, POP3, IMAPS, and SMB. curl is powered by libcurl, a free, client-side URL transfer library.


Why Use Curl?


Are you wondering why you should use the curl command-line application instead of one of its competitors? It turns out the tool has a good number of advantages, including the following.

  • Cross-Platform and Portable

If you ask me, one of the reasons I like this tool is that it is available on all the popular operating systems. It is not the kind of tool you need a virtual machine for in order to use it on some operating systems. Interestingly, it comes preinstalled on most operating systems, and if it does not, all you need is a few steps to get it installed. All this tool requires is network connectivity and a command line.

  • Easy to Use

Just like most command-line applications, the tool might look intimidating at first, but once you get along with it, you will discover it is quite easy to use. All you need is to go through the documentation and practice constantly, and you are good to go. There are also a good number of tutorials you can follow online to master the art of using curl.

  • It Has Gone Mainstream

You may not consider this a reason to use curl, but unless you want to avoid a tool that has gone mainstream, learning how to use curl will be an added advantage. Many web services have support for curl, so knowing how to use it will make you better at using the services and tools that support it.

  • Provides Lots of Detail

One thing you will come to like about the curl application is that it is verbose. It provides the information you need about what you send and receive. It is also excellent for debugging because of its good error-logging support, which makes it easy for you to spot errors and quirks in your tools.


What Can You Do with Curl?

The basic use of the curl application is to communicate with web servers. What you do with that is up to you, and as you will find out, there are many things you can do with curl once you master it.

It might interest you to know that you are probably using curl without even realizing it, because thousands of devices, including IoT devices, make use of it. Let's take a look at three popular things you can do with curl.

  • Testing APIs


Testing API endpoints is arguably the number one reason curl exists. With this tool, you can send API requests and inspect the responses, which makes it a powerful tool for testing endpoints if you are a developer. It supports authentication and allows you to send GET and POST requests as well as PUT and DELETE.
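As a quick sketch of what endpoint testing looks like in practice (the host api.example.com and its routes below are hypothetical placeholders, not a real service):

```shell
# GET is curl's default method; just pass the URL
curl https://api.example.com/users/1

# POST a JSON body: -H sets a request header, -d supplies the body
curl -X POST https://api.example.com/users \
  -H "Content-Type: application/json" \
  -d '{"name": "Ada"}'

# PUT and DELETE are selected the same way with -X
curl -X PUT https://api.example.com/users/1 -d '{"name": "Grace"}'
curl -X DELETE https://api.example.com/users/1

# HTTP Basic authentication with -u
curl -u username:password https://api.example.com/private
```

The -X flag works for any HTTP method the server accepts; without it, curl sends GET, and adding -d alone switches the request to POST.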


  • Web Scraping


Web scraping is basically the act of collecting data from websites in an automated manner. Because curl can send data to and receive data from web servers, you can use it to collect data from websites. It might interest you to know that some websites even serve data formatted specifically for curl.

For instance, you can use curl to query the weather from the wttr.in service, which is powered by the wego application. You can also use curl to check when a website is down, watch for price changes, and even download files, with support for resuming interrupted downloads.
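A few concrete one-liners for the use cases above (wttr.in is a real curl-friendly weather service; the big-file.zip URL is a made-up placeholder):

```shell
# Weather report right in the terminal, served by wttr.in
curl wttr.in/London

# Check whether a site is up: -o /dev/null discards the body,
# -w prints just the numeric HTTP status code (200 means OK)
curl -s -o /dev/null -w "%{http_code}" https://www.example.com

# Download a file keeping its remote name (-O), then resume an
# interrupted download from where it stopped with -C -
curl -O https://www.example.com/big-file.zip
curl -C - -O https://www.example.com/big-file.zip
```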


  • Manage Curl-Compatible Applications


Some tools and services have support for curl. I know a good number of web scraping APIs and proxy services that support curl. One thing you will also come to like about the tool is that some social media platforms support it.

Take Twitter and Facebook, for instance: they provide APIs that you can call with curl to interact with your account and post content. One thing you need to know is that curl does not do this alone; it works together with the APIs provided by these web services.


How to Install the Curl Application


This section of the article has been created for the few readers who do not have curl installed on their computers, since the tool comes preinstalled on a good number of operating systems.

If you are using macOS, you already have it installed by default. For Windows users, the tool comes preinstalled in Windows 10 version 1803 and later.

To see whether you have curl installed, open the console application (Command Prompt), type

curl

and press the Enter key. If it is installed, the console will print the following.

curl: try 'curl --help' or 'curl --manual' for more information

Otherwise, you will see something like

curl: command not found

if the curl package is not installed. In that case, follow the steps below to install it on a Windows computer.

  • Go to the download page of the curl application at https://curl.haxx.se/download.html and download the Windows zip package.

  • Unzip the downloaded zip file and place the curl.exe file in the C:\curl folder.
  • Go to https://curl.haxx.se/docs/caextract.html and download the digital certificate file named cacert.pem. Move this file to the C:\curl folder and rename it curl-ca-bundle.crt.


  • Add the curl folder path (C:\curl) to your Windows PATH environment variable so that the curl command is available globally from the command prompt.


  • If you have done the above correctly, you have successfully installed curl on your Windows PC. To test this, open the command prompt, type curl, and press the Enter key; you should see
    curl: try 'curl --help' or 'curl --manual' for more information

    printed on the screen.
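On any system, a quick way to confirm both that curl is installed and which version you have is:

```shell
# Print the installed curl version, build info, and supported protocols
curl --version

# Or simply check that the curl binary is on your PATH
command -v curl
```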


How to Use Curl for Web Scraping


In this section, we will take a look at how to use the curl application for web scraping. With just one line of a curl command, you can get the content of a full web page. Below is the basic syntax of curl.

curl [option] [url]

The curl part tells the command prompt that you want to use the curl application, the url specifies the location (URL) of the remote content you want to interact with, and the options section is self-explanatory: it specifies any of the options available for the curl application.

For instance, let's look at the simplest curl command you can use.

curl www.example.com

When you press the Enter key, the above command prints the content of the www.example.com page. You can use any other URL, and you will get the HTML back: not just what is visible to you on the page, but the full HTML content.
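Because curl writes the raw HTML to standard output, you can pipe it straight into ordinary text tools for a quick-and-dirty extraction. This is only a sketch; for serious scraping you would hand the HTML to a proper parser:

```shell
# Fetch a page quietly (-s suppresses the progress bar) and pull out
# its <title> tag with grep
curl -s https://www.example.com | grep -o '<title>.*</title>'

# Extract href attribute values with sed (fragile, but illustrative)
curl -s https://www.example.com | sed -n 's/.*href="\([^"]*\)".*/\1/p'
```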

The above is the first step in web scraping, which is downloading the content of a page. At this point, curl only prints it to the console; to actually use it, you will need to parse the required data out, which curl will not do for you. Let's try another thing: saving the content of a URL in a file rather than just printing it on the console.

curl -o filename.html www.example.com/file.html

As you can see above, we made use of the -o option to tell curl to save the content of the page to a file, where filename.html is the name of the file. One thing you also need to understand is that we did not specify the protocol, so the tool used HTTP by default.

Let's write the same command with the protocol specified explicitly, so you can see how it is done should you need to interact with protocols other than HTTP in your tasks.

curl -o filename.html http://www.example.com/file.html
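A related option worth knowing (this is standard curl behavior, not specific to this example): lowercase -o lets you name the saved file yourself, while uppercase -O keeps the filename from the URL.

```shell
# Save under a name you choose
curl -s -o local-copy.html http://www.example.com/file.html

# Save under the remote name, i.e. file.html in this case
curl -s -O http://www.example.com/file.html
```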

At this point, this is still not web scraping in full, as you are not parsing out any of the required data. You will need to use curl in conjunction with other tools to get a full web scraping experience. Below is an example that uses PHP together with curl to scrape the web.

<?php

/**
 * @param string $url - the url you wish to fetch.
 * @return string - the raw html response.
 */
function web_scrape($url) {
    $ch = curl_init($url);
    // Return the response as a string instead of printing it directly
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    $response = curl_exec($ch);
    curl_close($ch);
    return $response;
}

/**
 * @param string $url - the url you wish to fetch.
 * @return string - the raw http response, headers included.
 */
function fetch_headers($url) {
    $ch = curl_init($url);
    // Include the response headers in the output
    curl_setopt($ch, CURLOPT_HEADER, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    $response = curl_exec($ch);
    curl_close($ch);
    return $response;
}

// Example usage:
// var_dump(fetch_headers("https://www.google.se/"));
// echo web_scrape('https://www.google.se/');

The code above was adapted from GitHub and shows you how to develop a simple web scraper using the duo of PHP and curl.
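For comparison, the two PHP helpers above have rough one-line shell equivalents. They are not exact replacements (the -I option fetches headers only, without the body), but they illustrate the same idea:

```shell
# Equivalent of web_scrape(): fetch the raw HTML of a page
curl -s https://www.google.se/

# Rough equivalent of fetch_headers(): -I (or --head) asks the
# server for the response headers instead of the body
curl -sI https://www.google.se/
```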

Conclusion

The above is just a proof of concept. There is a lot more to learn before you can effectively use the curl application and its commands to interact with servers on the Internet. While there are many guides online that you can use, I will point you to the official curl documentation as the best place to start.

One thing you need to know is that the tool might not look easy to use at first, but after you use it for a while, you will discover not only how easy it is but also how powerful it is, in a world where beautiful user interfaces have aggressively taken over.

