Lenny's Quest

Note: I originally wrote this article back in 2019. At the time I was interested in showing people how much Google actually tracks a person. Unfortunately, my drive died pretty shortly after this article went up.


Last week I was looking through all the personal data I could export with Google Takeout. One of the things I noticed was the option to export your Google Pay data. For the uninitiated, Google Pay lets you use your smartphone as a Pay-Wave credit card replacement. This makes it more convenient for you to spend your hard-earned dollars on those things you really don't need. Unable to contain my curiosity about my spending habits, I promptly downloaded the history.

Now, for some reason Google has decided the only format you are allowed to download Google Pay data in is… HTML… 'Cuz, you know, HTML is best for large data sets /s. Anyway, I opened up the HTML page and the very first thing I noticed was that each purchase had a link attached to it that pointed to Google Maps! It was at this moment I knew I wanted to make a heat-map of my purchases.


Let's get started! First things first, download your payments from Google. Go to https://takeout.google.com. Once there, click “Manage Archives” and then “Create New Archive”. For our purposes we are only interested in your Google Pay data, so make sure all options are unticked except for Google Pay. Click “Next”, then click “Create Archive”. When your archive is ready you will receive an email.

When it's ready, download your archive and extract the file located at Takeout > Google Pay > My Activity > My Activity.html inside the zip. This is the file we will be scraping the information from. Save it to an empty folder somewhere. We'll use this folder for the code as well as for the input and output files. Go ahead and look through the HTML file to get a feel for the layout.
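If you would rather not dig through the zip by hand, the extraction step can be scripted. This is only a rough sketch: the helper name `extract_activity_html` is made up for illustration, and it assumes the path inside the archive matches the one described above (Google may well change that layout).

```python
import shutil
import zipfile
from pathlib import Path

# Path inside the archive, as described above. This layout is an assumption
# and may differ for newer Takeout exports.
MEMBER = "Takeout/Google Pay/My Activity/My Activity.html"

def extract_activity_html(archive_path, dest_dir="."):
    """Copy 'My Activity.html' out of the Takeout zip into dest_dir.

    Returns the path of the copied file. Raises KeyError if the archive
    does not contain the expected member.
    """
    dest = Path(dest_dir) / Path(MEMBER).name
    with zipfile.ZipFile(archive_path) as archive:
        # Stream the member out directly so the Takeout folder structure
        # is not recreated on disk.
        with archive.open(MEMBER) as src, open(dest, "wb") as out:
            shutil.copyfileobj(src, out)
    return dest
```

Calling `extract_activity_html("takeout.zip")` (with whatever your download is actually called) should leave 'My Activity.html' in the current folder.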

The first thing I wanted to do was strip out the following items from each purchase and save them to a CSV file: amount, date, time, latitude and longitude. To begin with I inspected the HTML file with Firefox. I found that each payment entry was surrounded by a div with a class called “mdl-grid”.

From there I was able to work out the div for date and time, price, and latitude and longitude. Now, on to the Python!

For this project I'm using Python 3.6, but I feel any Python 3 version should work (don't quote me). The first thing we need to do is install Beautiful Soup. If you don't know what it is, Beautiful Soup is a super handy tool for looking through HTML files (online or offline). It makes it really easy to search for an element based on type, class or ID. If you want to use a virtual environment, go ahead and activate it now. To install Beautiful Soup do:

pip install bs4

Now make a new file called scrapePayments.py. Below is the code I used to scrape the values to a CSV file:

import bs4, sys, csv

LOCAL_CURRENCY_SYMBOL = '$'

def main():
    
    # Import html file to beautiful soup
    with open(sys.argv[1], 'r', encoding='utf8') as file:
        html = bs4.BeautifulSoup(file, "html.parser")

    # find tags that have payment details
    payment_html = html.select('.mdl-grid')
 

    # '.mdl-grid' also grabs the entire page, because the whole
    # webpage is wrapped in an 'mdl-grid' div. This just removes it.
    payment_html = payment_html[1:]

    payments = []
    # This is the part that grabs the data out of each element.
    for payment in payment_html:
        print("============================================")
        print(payment)
        try:
            payment_details = extract_purchase_details(payment)
            payments.append(payment_details)
        except Exception:
            # This is a hacky way to remove elements that don't fit properly.
            # If a payment element throws an exception then instead of handling it we just ignore it.
            # This helps remove "payments" that are things like special promotions and such
            pass

    write_to_csv(payments)

def extract_purchase_details(payment):
    # This is responsible for extracting the payment details
    date, time = (payment
                    .select('.mdl-typography--body-1')[0]
                    .get_text()[57:]
                    .split(',')
    )
    time = time.strip() # This removes the whitespace in front of time

    price = (payment.select('.mdl-typography--caption')[0]
                        .get_text()
                        .split("\u2003")[3]
                        .split(LOCAL_CURRENCY_SYMBOL)[1][:5]
                        .replace('W', '')
    )
    
    maps_url = (payment
            .select('.mdl-typography--caption')[0]
            .find('a')['href']
    )
    query_index = maps_url.find("query=")
    lat_and_lon = maps_url[query_index+6:]
    lat, lon = lat_and_lon.split(',')
    lat = lat.strip()
    lon = lon.strip()
    
    return (price, date, time, lat, lon)

def write_to_csv(payments):
    # Writes the values of payments to a csv file.
    with open('output.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        # Write the headers for the CSV file.
        writer.writerow(['amount', 'date', 'time', 'lat', 'lon'])
        # Write the array of tuples to the CSV file
        writer.writerows(payments)


if __name__ == "__main__":
    main()

To begin with, this code reads the HTML file into a Beautiful Soup object. Then from that object we select the items that relate to payment (every object that has the class 'mdl-grid'). The way we've chosen to select the items gives us an extra element we don't want, so we just pop it off the list.

After that we go through each payment and feed each one through our 'extract_purchase_details' function. This function extracts all the information from the HTML elements. The date and time, for example, are pulled from the text element with the class 'mdl-typography--body-1'. There are actually two elements with this class, so the selection returns a list with two elements, and we just take the first one. Its text ends with something like the following:

Attempted contactless payment<br>8 Jan 2019, 20:47:30 AEDT

To remove the excess we use a string slice which removes everything except the date and time. Then we split the string at the comma and save the date to the 'date' variable and the time to the 'time' variable. The rest of the values are done in a similar method. Once we have all the values the function returns the values as a tuple which we store in 'payment_details'. Finally we append this tuple to our payments list.
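To make the string handling a bit more concrete, here is a small standalone sketch of the same idea using only the standard library. The helper names are mine, the Maps URL is a made-up example (I'm assuming the coordinates sit in a `query=lat,lon` parameter, which is what the scraper relies on), and the month parsing assumes an English locale.

```python
from datetime import datetime
from urllib.parse import parse_qs, urlsplit

def split_date_time(text):
    """Split '8 Jan 2019, 20:47:30 AEDT' into its date and time parts."""
    date, time = text.split(",")
    return date.strip(), time.strip()

def to_datetime(date, time):
    """Combine the two strings into a naive datetime (timezone dropped)."""
    clock = time.split()[0]  # '20:47:30 AEDT' -> '20:47:30'
    return datetime.strptime(f"{date} {clock}", "%d %b %Y %H:%M:%S")

def lat_lon_from_maps_url(url):
    """Read 'lat,lon' from the URL's 'query' parameter as two floats."""
    query = parse_qs(urlsplit(url).query)["query"][0]
    lat, lon = query.split(",")
    return float(lat), float(lon)

date, time = split_date_time("8 Jan 2019, 20:47:30 AEDT")
print(date, "|", time)
print(to_datetime(date, time))

# Made-up URL in the assumed format, not copied from a real export.
print(lat_lon_from_maps_url(
    "https://www.google.com/maps/search/?api=1&query=-19.2,146.8"
))
```

Using `parse_qs` instead of slicing after "query=" means the lookup keeps working if another parameter ever ends up after the coordinates.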

This is all wrapped in a try-except statement. This is a little bit of a cheat to get rid of the entries that don't actually have purchases in them (things like promotions). Because they don't have the same layout, an exception is raised when we try to access elements of the purchase that don't exist. Instead of handling the exception, we're just ignoring it (just like how I handle all my real life problems!)
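If ignoring everything feels a bit too reckless, a slightly more careful variant is to only swallow the errors you expect and count how many entries got skipped. The sketch below is not what the script above does; `extract_all` is a hypothetical helper, and the rows are stand-in strings rather than real Beautiful Soup elements.

```python
def extract_all(items, extract):
    """Run `extract` on every item, skipping the ones that do not fit.

    Returns (results, skipped_count).
    """
    # Errors we expect when an entry is missing a piece we try to read:
    #   IndexError - select(...)[0] or split(...)[n] found nothing
    #   ValueError - unpacking 'date, time = ...' got the wrong count
    #   TypeError  - find('a') returned None and we subscripted it
    #   KeyError   - the link had no 'href' attribute
    expected = (IndexError, ValueError, TypeError, KeyError)
    results, skipped = [], 0
    for item in items:
        try:
            results.append(extract(item))
        except expected:
            skipped += 1
    return results, skipped

# Stand-in data: real code would pass the Beautiful Soup elements and
# extract_purchase_details instead.
rows = ["8 Jan 2019, 20:47:30", "Special promotion", "9 Jan 2019, 12:01:05"]

def split_row(row):
    date, time = row.split(",")
    return date.strip(), time.strip()

payments, skipped = extract_all(rows, split_row)
print(payments)
print("skipped:", skipped)
```

Printing the skipped count is a cheap sanity check: if it suddenly jumps, the page format has probably changed again.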

Once all values are scraped, we then use Python's CSV library to write the values to a CSV file. To run this, do the following command:

python scrapePayments.py 'My Activity.html'

Now you have all your purchases in a nice CSV file. Depending on your currency you may need to adjust line 3 where we define the local currency symbol '$':

LOCAL_CURRENCY_SYMBOL = '$'
# Might need to become
LOCAL_CURRENCY_SYMBOL = '£'
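As an aside, here is an alternative way to pull the amount out using a regular expression keyed on the currency symbol. This is a sketch rather than what scrapePayments.py does: `parse_amount` is a hypothetical helper, the sample strings are invented, and it assumes commas are only ever used as thousands separators.

```python
import re

def parse_amount(text, symbol="$"):
    """Return the first amount that follows `symbol` in `text` as a float.

    Returns None when the symbol, or a number right after it, is not found.
    """
    # re.escape is needed because '$' is special in regular expressions.
    pattern = re.escape(symbol) + r"\s*(\d[\d,]*(?:\.\d+)?)"
    match = re.search(pattern, text)
    if match is None:
        return None
    # Assumption: commas are thousands separators, e.g. '1,234.50'.
    return float(match.group(1).replace(",", ""))

print(parse_amount("Paid $12.50 with card"))
print(parse_amount("£3.20 at the cafe", symbol="£"))
print(parse_amount("IDR 18"))  # no '$' in the text, so this gives None
```

Returning None for a foreign-currency entry makes it easy to spot and skip, instead of silently treating it as dollars.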

Funny story, before I added that line to split on the dollar sign, I found out that last time I went to the casino someone charged me 18 Indonesian Rupiah*! 😲 (I live in Australia, that's like $0.0018 AUD at the current exchange rate!)

*Thanks to @yogasukmap for pointing out that they use Rupiah, not Rupees, in Indonesia!


Alright, time to heat-map! To be perfectly honest, for this part I followed a guide written by a guy called Mike Cunha over on his website. The blog post can be found –> here <–.

EDIT: Update from 2023 again. I just checked this page and the certificate has expired. Here's a link from the Wayback Machine; the formatting is missing but you can still get the info.

We don't need to follow his method exactly as he adds a boundary to his map. So for our purposes you'll need to install only the following python modules:

pip install folium pandas

Pandas is an awesome library for handling data. It is heavily used in the computer science community. Folium is a Python interface for the map building JavaScript library called Leaflet.js.

So, make a new file called 'mapgen.py' and input the following code:

import pandas as pd
import folium
import os
import sys
from folium.plugins import HeatMap

def main():
    # Read map data from CSV
    with open(sys.argv[1], 'r', encoding='utf8') as file:
        map_data = pd.read_csv(file)

    # Find highest purchase amount
    max_amount = float(map_data['amount'].max())

    # Makes a new map centered on the given location and zoom
    startingLocation = [-19.2, 146.8] # EDIT THIS WITH YOUR CITY'S GPS COORDINATES!
    hmap = folium.Map(location=startingLocation, zoom_start=7)

    # Creates a heatmap element
    hm_wide = HeatMap( list(zip(map_data.lat.values, map_data.lon.values, map_data.amount.values)),
                        min_opacity=0.2,
                        max_val=max_amount,
                        radius=17, blur=15,
                        max_zoom=1)

    # Adds the heatmap element to the map
    hmap.add_child(hm_wide)

    # Saves the map to heatmap.html
    hmap.save(os.path.join('.', 'heatmap.html'))

if __name__ == "__main__":
    main()

Ensure that you change the values of the 'startingLocation' variable on line 16. This is where the map will be focused when you open the generated webpage.
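If you don't want to look up coordinates by hand, one option is to centre the map on the average of your purchase locations. This is just a sketch using the standard library: `map_centre` is a hypothetical helper, the sample rows are invented, and a plain average is a naive choice that will land somewhere odd if your purchases are spread across several cities.

```python
import csv
import io
from statistics import mean

def map_centre(csv_file):
    """Return [mean_lat, mean_lon] for the rows of an open CSV file.

    Expects the 'lat' and 'lon' columns written by scrapePayments.py.
    """
    rows = list(csv.DictReader(csv_file))
    lats = [float(row["lat"]) for row in rows]
    lons = [float(row["lon"]) for row in rows]
    return [mean(lats), mean(lons)]

# Made-up sample data in the same layout as output.csv
sample = io.StringIO(
    "amount,date,time,lat,lon\n"
    "12.50,8 Jan 2019,20:47:30 AEDT,-19.0,146.0\n"
    "4.00,9 Jan 2019,12:01:05 AEDT,-20.0,147.0\n"
)
print(map_centre(sample))
```

In mapgen.py you could open output.csv and pass the file object to `map_centre`, then use the result in place of the hard-coded 'startingLocation'.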

Once you've finished writing the code run:

python mapgen.py output.csv

Upon completion you will have a brand new file called “heatmap.html”. Open up that file and look at all the locations you have thrown your money away!

Unsurprisingly, most of my Google Pay purchases are for my lunch. From the map, can you guess where I have lunch? 😂

A gif showing the heat map in action.


Thanks for reading! If you have any tips or thoughts you'd like to share I'd love to hear them.

— Lenny

Edit 11/01/2019: I got some great feedback from people over on Reddit, so I've implemented some of the changes they suggested. These include changing from camelCase to snake_case to be more in line with Python standards, splitting the code that extracts the payment info into its own function, and breaking each of the chained method calls I use to extract payment information onto multiple lines to make it easier to follow.

EDIT March 2021: I have updated the scrapePayments.py file as Google has changed the format of the webpage slightly. It should run properly again. However, I have noticed that I don't have any payments with GPS coordinates since the end of 2019. I am unsure if Google still records GPS coordinates when you pay, or if they just relate the payment back to their constant tracking of the person.


If you have something to say don't forget to tag me (@wyrm@wyrm.one) so I can respond!

New Blog!

I have created this blog as I have felt for quite some time it would be neat to do. I am still learning the ropes of this platform, so bear with me for any blunders I make.

With this blog I hope to be able to share insights I stumble across or cool projects I do, as well as leaving a record that I can look back on in the future and (hopefully) feel proud of what I've accomplished.

I think it would be cool to also use this platform to share some of the stories I have written. My initial thought is to make a separate user for each story and then people can subscribe to the individual stories so they can keep up with chapters as they come out.

— Lenny

