Jupyter Notebook is a very popular way to run Python and is completely free software. Jupyter boasts “support of over 40 programming languages, including Python, R, Julia, and Scala. Interactive output with code that can produce rich, interactive output: HTML, images, videos, LaTeX, and custom MIME types. Big data integration Leverage big data tools, such as Apache Spark, from Python, R, and Scala. Explore that same data with pandas, scikit-learn, ggplot2, and TensorFlow.” Jupyter’s Official Website
Although Jupyter has such incredible capabilities we will only be going over some of the basics.

Topics covered:

  1. How to open a new notebook
  2. Popular libraries to import
  3. A Quick Web Scrape

We won’t cover the actual install of Jupyter Notebook because there as so many ways to do it depending on your needs and Operating System. If you haven’t yet installed Jupyter, you can check out this short tutorial on youtube.
How to Install Jupyter Notebook

How to open a new notebook

Now that you have Jupyter Notebook installed, let’s look at how to open a fresh notebook. Depending on how/where you are using Jupyter Notebook, this next step may be different for you. In the case that you installed Jupyter Notebooks in a Linux based system (on a Mac, WSL, or others), to open it, type the following into your terminal:
image

This should start up a new kernal and redirect you to a newly created webpage. If the notebook was started in an empty directory, it will look something like this:

image

Head over to the right side and click new, then Python 3.

image

You should now be looking at the start of your new, unnamed, empty notebook. You’re ready to go!

image

Numpy is essential if you’re going to be doing any sort of computation. In the first line of your new notebook, simply put in import numpy as np. Follow the link for a quick view at arithmetic now available with Numpy. Numpy Basics

Another essential library is Pandas. Pandas is used primarily for the manipulation of data structures. Import it with import pandas as pd. Here’s a quick tutorial on Pandas. Pandas Basics

A Quick Web Scrape

Now lets get to the good stuff - scraping the web. Even within Jupyter Notebooks there are many ways to do this. We will be using Beautiful Soup. First, a few import statements. For this quick tutorial we will need the following:

from bs4 import BeautifulSoup
import requests
import pandas as pd

Now we need to make the “Soup”. We will be doing this demonstration on a simple table found in a wiki page. Let’s look at some code.

url = "https://en.wikipedia.org/wiki/Production_car_speed_record" Wiki Page
data = requests.get(url).text
soup = BeautifulSoup(data, 'html.parser')

Now let’s take a look at the tables available to use in the soup. We are soon going to want information about the classes of each table, so let’s print it out. Here is the code along with the results.
image

We are going to pull in the first wikitable shown on the wiki page - the Record-Breaking Production Vehicles.
table = soup.find('table', class_='wikitable sortable')

Let’s create the columns for our dataframe with the same names as the columns in the table.
df = pd.DataFrame(columns=['Year', 'Make and Model', 'Top Speed', 'Engine', 'Number Built', 'Comment'])

Next we need to go through each row in the table, pull out each value and put it in the correct column.

for row in table.tbody.find_all('tr'):
    columns = row.find_all('td')

    if(columns != []):
        Year = columns[0].text.strip()
        Make_and_Model = columns[1].text.strip()
        Top_Speed = columns[2].text.strip()
        Engine = columns[3].text.strip()
        Number_Built = columns[4].text.strip()
        Comment = columns[5].text.strip()
        df = df.append({'Year': Year,  'Make and Model': Make_and_Model, 'Top Speed': Top_Speed, 'Engine': Engine, 'Number Built': Number_Built, 'Comment': Comment}, ignore_index=True)

That’s it! Congratulations! You just successfully pulled a table off the internet into the Pandas Dataframe ‘df’. Try it with another table, or start doing some computations with this DataFrame!