How to Scrape B&H for laptop prices and specs with Python Scrapy and import the data into SQLite (Free)

Informula
Dec 3, 2020

Purpose

Search for and collect laptop information from the B&H website with Scrapy, then store the data in SQLite so that we can easily explore it. See the full version LINK.

Tool

  • Python packages: scrapy / json / sqlite3
  • SQLite

Target Website

Steps

1. Let’s set up the Scrapy environment first.
  • Open a command prompt and run “scrapy startproject bandh”, where “bandh” is the name of the project folder. Scrapy then creates a new folder called “bandh” containing several .py files and a “spiders” subfolder.
  • In the “spiders” folder, create a .py file for our Scrapy spider. We can name it “crawler.py”.

2. Examine the structure of the website to locate the target elements.

  • When you inspect the website, pick the price of one example laptop and search for it in the page source.
  • You will find it inside a <script type="application/ld+json"> tag.
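To double-check outside the browser that the price really sits in these tags, here is a small standard-library sketch (not part of the original workflow, which uses Scrapy) that collects and parses every ld+json block from a saved HTML page. The names LdJsonCollector and find_ld_json are hypothetical.

```python
import json
from html.parser import HTMLParser


class LdJsonCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self.blocks = []          # raw JSON strings, in page order
        self._in_ld_json = False  # True while inside a matching <script> tag
        self._buffer = []         # text chunks of the current tag

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ld_json = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_ld_json:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_ld_json:
            self.blocks.append("".join(self._buffer))
            self._in_ld_json = False


def find_ld_json(html):
    """Return the parsed ld+json objects found in an HTML string."""
    collector = LdJsonCollector()
    collector.feed(html)
    collector.close()
    return [json.loads(block) for block in collector.blocks]
```

For example, calling find_ld_json on the text of a page saved from the browser (say, a hypothetical page.html) returns the blocks in page order; per the observation in step 3, the product data should start from the second block.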

3. Write the Python script in “crawler.py”

  • For the basic functions and structure of Scrapy, refer to the official Scrapy documentation. Here we mainly focus on the code in the parse() method.
  • Store the contents of the <script type="application/ld+json"> tags in the variable json_data and return the required elements one by one. Note that we start with y = 1 instead of y = 0 because the first <script type="application/ld+json"> element on the page does not contain any product information.
  • Some products do not store a price in their <script type="application/ld+json"> block, which raises an error. To handle that, we use try/except so that one missing field does not stop the whole process.
  • Open a command prompt in your project folder and run “scrapy crawl bandh_scrapy -o bandh_scrapy.json -t json” (bandh_scrapy is the name given to the spider in crawler.py). This runs the spider and saves the data in JSON format.
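The parse() logic from this step can be sketched as a plain function, which makes it easy to try without hitting the website. This is a minimal sketch, not the author's exact code: the function name extract_products is hypothetical, and it assumes (unverified) that each product block follows the schema.org Product layout with name, sku and offers.price fields.

```python
import json


def extract_products(ld_json_texts):
    """Turn the raw text of ld+json tags into product dicts.

    Hypothetical layout assumed: {"name": ..., "sku": ..., "offers": {"price": ...}}.
    """
    products = []
    # Start at y = 1: the first ld+json block holds no product information.
    for y in range(1, len(ld_json_texts)):
        json_data = json.loads(ld_json_texts[y])
        item = {"name": json_data.get("name"), "sku": json_data.get("sku")}
        try:
            item["price"] = json_data["offers"]["price"]
        except (KeyError, TypeError):
            # Some products have no price here; record None and keep going.
            item["price"] = None
        products.append(item)
    return products


# Inside the spider (hypothetical wiring):
#     def parse(self, response):
#         texts = response.xpath('//script[@type="application/ld+json"]/text()').getall()
#         yield from extract_products(texts)
```

Catching the exception per product, rather than around the whole loop, is what keeps one missing price from discarding the rest of the page.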

4. Import the JSON file into the table “table_bandh” in the SQLite database “web_scrapy”
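A minimal sketch of this import step, using only the json and sqlite3 modules. It assumes Scrapy's JSON feed (a single JSON array of items), that each item has name, sku and price fields, and that the database file is called web_scrapy.db; those field names, the column types and the helper names to_price, import_items and main are assumptions, not the author's schema.

```python
import json
import sqlite3


def to_price(value):
    """Convert a price like '1,299.99' to float; None if missing or unparseable."""
    if value is None:
        return None
    try:
        return float(str(value).replace(",", ""))
    except ValueError:
        return None


def import_items(conn, items):
    """Insert scraped items into table_bandh (created if absent); return row count."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS table_bandh (
               name  TEXT,
               sku   TEXT,
               price REAL
           )"""
    )
    rows = [(item.get("name"), item.get("sku"), to_price(item.get("price")))
            for item in items]
    # Parameterized inserts let sqlite3 handle quoting of product names.
    conn.executemany(
        "INSERT INTO table_bandh (name, sku, price) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)


def main(json_path="bandh_scrapy.json", db_path="web_scrapy.db"):
    """Load the Scrapy JSON feed from disk and import it into SQLite."""
    with open(json_path, encoding="utf-8") as f:
        items = json.load(f)  # the feed is one JSON array of item dicts
    conn = sqlite3.connect(db_path)
    try:
        count = import_items(conn, items)
    finally:
        conn.close()
    print(f"Imported {count} rows into table_bandh")
```

Calling main() from the project folder after the crawl fills table_bandh. Running it twice appends the same rows again, so empty the table first if you re-import.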

Thank you! Enjoy it :)
