CSS Selector Cheat Sheet for Web Scraping in Python

Are you a web developer looking to get into web scraping in Python? After reading this CSS selector cheat sheet, you should find traversing HTML documents and picking out data points with Python scraping libraries much easier.


Web scraping involves three main tasks, and getting them right is most of the work: sending web requests, avoiding getting blocked, and parsing the required data points out of the downloaded page (or the rendered page, in the case of Ajax-driven sites).

The focus of this article is on extracting data from web pages. Python libraries simplify each of these tasks, including traversing a page and pulling out the data you need. Below, we provide a CSS selector cheat sheet that you can refer to when extracting content.

We are aware that Scrapy, Beautiful Soup, and other parsers provide APIs you can use to scrape without knowing CSS selectors at all. However, if you are coming from a web background or you want an easier way to access data points on well-structured web pages, then using CSS selectors is the way to go.

It might also interest you to know that many Python libraries for extracting data from HTML content support CSS selectors, including Scrapy and Beautiful Soup.


What are CSS Selectors?


CSS stands for Cascading Style Sheets, the style sheet language used to specify how HTML documents should be rendered and presented in a web browser. In web development, CSS selectors are used to select the HTML element(s) you wish to style. In web scraping, CSS selectors serve a different purpose: they are used to select the elements that wrap the required data points on a page.

Even if you never write a CSS selector yourself, the lookup methods your library offers (by tag, class, or ID) match elements in much the same way. It is important to state here that CSS selectors are effective for extracting data from well-structured HTML documents. If a page is very messy or the content you are looking to scrape is buried within text, CSS selectors alone might not be able to help you.
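A small sketch of that limitation, assuming Beautiful Soup is installed. The HTML snippet, the class name, and the regular expression are made up for illustration: a selector can isolate the wrapping paragraph, but a price buried inside a sentence still needs ordinary string handling.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: the price is buried inside free text.
html = '<div><p class="desc">Red wine, 750 ml, now only $12.50 per bottle.</p></div>'

soup = BeautifulSoup(html, "html.parser")

# The selector gets us as far as the wrapping element...
paragraph = soup.select_one("p.desc")
text = paragraph.get_text()

# ...but pulling the price out of the sentence needs a regular expression.
match = re.search(r"\$(\d+(?:\.\d+)?)", text)
price = float(match.group(1)) if match else None
print(price)
```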


Table for CSS Selectors, Examples, and Description

When it comes to selecting HTML elements in order to scrape their content, not all CSS selectors are useful. Or, to put it more accurately, not all of them are used in day-to-day web scraping tasks. We have been scraping the web for a while now and know which CSS selectors we use the most; we have also asked other web scrapers and looked at other people's code that uses CSS selectors.

From this, we compiled a list of the CSS selectors that developers who write scrapers use most and present it to you as a cheat sheet. The table below lists these popular CSS selectors with an example and a description of each.

| Selector | Example | Description |
| --- | --- | --- |
| All elements (*) | * | Selects every element in the document |
| Element (element) | h1 | Selects elements by tag name, such as every h1 |
| ID (#id) | #shop_cart | Selects the element with the specified ID |
| Class (.class) | .visited_links | Selects all elements that have the specified class |
| Element with class (element.class) | a.visited_links | Selects elements that match both the tag and the class, which is more precise than the class alone when you only need elements of a certain tag |
| Child element (parentElement > childElement) | tr > td | Selects elements that are direct children of the specified parent element |
| Attribute ([attribute]) | [href] | Selects all elements that have the specified attribute |

The table above might look short, but it contains the CSS selectors you are most likely to use when scraping the web. For many tasks, the class and ID selectors alone are enough.
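To make the table concrete, here is a minimal sketch that runs each selector against a small page fragment using Beautiful Soup's `select()` and `select_one()` methods. The HTML is invented for the example, and it assumes the `bs4` package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment used only to exercise the selectors in the table.
html = """
<html><body>
  <h1>Wine shop</h1>
  <div id="shop_cart"><a class="visited_links" href="/cart">Cart</a></div>
  <p class="visited_links">Recently viewed</p>
  <table>
    <tr><td>Merlot</td><td>$10</td></tr>
  </table>
  <span>No link here</span>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

everything = soup.select("*")                     # every element
headings = soup.select("h1")                      # by tag name
cart = soup.select_one("#shop_cart")              # by ID
visited = soup.select(".visited_links")           # by class, any tag
visited_anchors = soup.select("a.visited_links")  # tag and class together
cells = soup.select("tr > td")                    # direct children only
with_href = soup.select("[href]")                 # any element with an href

print(headings[0].get_text())
print(len(visited), len(visited_anchors))
print([td.get_text() for td in cells])
print(with_href[0]["href"])
```

Note how `.visited_links` matches both the link and the paragraph, while `a.visited_links` narrows the result to the link only.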


How to Use CSS Selectors for Extracting Data from Web Pages


The popular data extraction libraries for web scraping in Python support CSS selectors. In this section of the article, we take a look at how to use CSS selectors in two of them.


How to Use CSS Selectors in Scrapy


Scrapy is an open-source crawling framework developed to make web data extraction easy. It is a complete scraping framework that takes care of both sending web requests and parsing data points out of the downloaded web pages. Below is example code that shows how to use CSS selectors in Scrapy. The code covers only the parsing method.

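def parse(self, response):
    # Each product on the page is wrapped in a div.txt-wrap element.
    for wine in response.css('div.txt-wrap'):
        yield {
            'name': wine.css('a::text').get(),
            # default='' avoids an AttributeError when no price is present.
            'price': wine.css('strong.price::text').get(default='').replace('$ ', ''),
            # ::attr(href) yields None instead of raising when the link is missing.
            'link': wine.css('a::attr(href)').get(),
        }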

How to Use CSS Selectors in Beautiful Soup


Scrapy is usually used for complex projects; for simpler ones, Requests and Beautiful Soup are the common choice. Beautiful Soup makes data extraction from web pages easy: it hides the complexity under the hood and gives you a simple API for querying the content of a page. The library also supports CSS selectors for pulling data from web pages. Below is sample code showing how to use a CSS selector with the library.

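from bs4 import BeautifulSoup

def collect_name(html):
    # Parse the raw HTML string with Python's built-in parser.
    soup = BeautifulSoup(html, "html.parser")
    # select_one() returns the first match (or None); select() returns a list.
    name = soup.select_one("#name")
    return name.get_text(strip=True) if name is not None else None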
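The difference between `select()` and `select_one()` matters when you handle the result: the first returns a list of matching tags, the second a single tag or `None`. Here is a short sketch, assuming `bs4` is installed and using made-up markup:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with a single element carrying id="name".
html = '<div><h2 id="name"> Chateau Example </h2><p>Other text</p></div>'

soup = BeautifulSoup(html, "html.parser")

matches = soup.select("#name")        # list of Tag objects, possibly empty
first = soup.select_one("#name")      # a single Tag, or None if nothing matches
missing = soup.select_one("#price")   # no such element in this snippet

# Guard against None before reading the text of the element.
name = first.get_text(strip=True) if first is not None else None
print(len(matches), name, missing)
```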

Conclusion 

Cheat sheets exist to make your life easier when you need a quick reference on how to get things done. While a CSS selector cheat sheet for web development can be long, you do not need a long one for web scraping, as there are only a few selectors you would use day to day. For the selectors you need only once in a while, you can check a fuller reference, but the ones above are the CSS selectors you will use most as a web scraper.

