How I studied your programming work by recording your online data.

The surprising power of REST API & Web Scraping in user analytics.


Garry Tiscovschi

3 years ago | 5 min read

The same evening that I finished building the website, I sent my friend a text: “Your top language is Python and your favourite time for programming work is 6pm”. He decided not to text back; my phone started ringing instead.

Just a little static web development allowed me to seamlessly download live and accurate data from GitHub on my old school friend (or anybody else whose username I had, for that matter). The website (http://www.gitwatch.org/) then displayed that data in an easy-to-read dashboard.

How was it done? With the power of too much nosiness, and REST API. Every day, countless terabytes of fascinating data about people are (voluntarily) uploaded publicly online. You can anonymously siphon this information into your own database with just a few blocks of code.

This can be extremely useful for research, commerce, marketing and much more; or, as in my case, it can really highlight how much better your friend’s sleeping schedule is than your own.

What is REST API…?

//Skip this section if you already know about REST API

REST stands for REpresentational State Transfer and API stands for Application Programming Interface.

APIs:

APIs are like a control panel or a dinner menu for an application. APIs list operations that developers can use, along with a description of what they do. The developer doesn’t necessarily need to know how they work. APIs are considered good programming practice and, among other benefits (including security), can help keep a system organised and easy to use for developers. Finally, and extremely importantly, APIs let applications interact with each other. For example, if you wanted your program to take photos using your Android camera, you don’t need to grind out a new OS for yourself; you can simply tap into the existing Android Camera API. APIs let your apps talk it out between themselves, and they can make your program more useful both to yourself and to others. (Hoffman 2018)

Read more about it here: https://www.howtogeek.com/343877/what-is-an-api/

REST API:

When is an API RESTful? After 8hrs of sleep? No, a RESTful API is one that follows a special set of rules. These rules make APIs easier to learn, use and discover for new developers.

As mentioned above, REST stands for REpresentational State Transfer.

“It means when a RESTful API is called, the server will transfer to the client a representation of the state of the requested resource.” (Avraham 2017)

“For example, when a developer calls Instagram API to fetch a specific user (the resource), the API will return the state of that user, including their name, the number of posts that user posted on Instagram so far, how many followers they have, and more.” (Avraham 2017) This data is most commonly returned as easily readable JSON.
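
As a rough illustration of what "a representation of the state of the requested resource" can look like in practice, here is a small sketch that parses a JSON response body. The payload is invented for illustration: its field names (login, name, public_repos, followers) follow the ones GitHub returns for a user, but the values are made up, and summariseUser is a hypothetical helper, not part of any API.

```javascript
// Hypothetical response body, as a REST API might send it for a user resource.
// Field names mirror GitHub's user endpoint; the values are invented.
const responseBody = `{
  "login": "octocat",
  "name": "The Octocat",
  "public_repos": 8,
  "followers": 1000
}`;

// Turn the JSON text into a plain JavaScript object and pick out a few fields.
function summariseUser(jsonText) {
  const user = JSON.parse(jsonText);
  return `${user.name} (@${user.login}): ${user.public_repos} public repos, ${user.followers} followers`;
}

console.log(summariseUser(responseBody));
```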

Read more about REST API here: https://medium.com/extend/what-is-rest-a-simple-explanation-for-beginners-part-1-introduction-b4a072f8740f

How available are REST APIs?

Developers can choose to make parts of their application available to others to make it more useful. Obviously, functionality and data important to security are mostly unavailable, but it’s open season for publicly accessible information and functionality. If you have account permissions (e.g. a Twitter account), you get access to the same information your user account can see, but now on a massive scale.

Not every site has a REST API, and some APIs are too lightly supported to be truly useful, but big sites do have them, and there are more than 16,000 APIs out there for you to access already. If an API is not available to you, web scraping can be used instead; that alternative is discussed briefly below.

Example Application: The Website: gitwatch.org

Finally, a tool to let you find out how much your co-workers are beating you in terms of coding productivity: GitWatch (http://www.gitwatch.org/). GitWatch lets users analyse the public behaviour of other coders on GitHub and is an example of what can be done for other platforms using REST API.

As of the writing of this blog, the dashboard yields 3 outputs when you enter a GitHub username. It works for your own username or anybody else’s.

Graph 1: Shows how much code (in bytes) the user has in each language.

Graph 2: (‘Favourite Day’) Shows the percentage of commits (y-axis) by day of the week (x-axis).

Graph 3: (‘Finest Hour’) Shows the percentage of commits (y-axis) by hour of the day (x-axis).
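
The three graphs boil down to simple aggregations. The sketch below is not GitWatch's actual code. It assumes you already hold (a) one language-to-bytes object per repository, in the shape GitHub's repository languages endpoint returns, and (b) a list of commit timestamps as ISO 8601 strings. The function names and sample data are made up, and weekdays and hours are computed in UTC, which is a simplification.

```javascript
// Graph 1: add up bytes per language across several repositories.
// Each entry looks like { Python: 1200, HTML: 300 }.
function totalBytesByLanguage(perRepoLanguages) {
  const totals = {};
  for (const languages of perRepoLanguages) {
    for (const [language, bytes] of Object.entries(languages)) {
      totals[language] = (totals[language] || 0) + bytes;
    }
  }
  return totals;
}

// Graphs 2 and 3: share of commits (in percent) falling into each bucket.
// bucketOf maps a Date to a bucket index; bucketCount is 7 for days, 24 for hours.
function commitPercentages(timestamps, bucketOf, bucketCount) {
  const counts = new Array(bucketCount).fill(0);
  for (const timestamp of timestamps) {
    counts[bucketOf(new Date(timestamp))] += 1;
  }
  const total = timestamps.length;
  return counts.map((count) => (total === 0 ? 0 : (100 * count) / total));
}

// UTC is used for simplicity; a real dashboard would have to pick a time zone.
const byWeekday = (timestamps) => commitPercentages(timestamps, (d) => d.getUTCDay(), 7); // 0 = Sunday
const byHour = (timestamps) => commitPercentages(timestamps, (d) => d.getUTCHours(), 24);

// Invented sample data.
const sampleLanguages = [{ Python: 1200, HTML: 300 }, { Python: 800, JavaScript: 500 }];
const sampleCommits = [
  "2020-03-23T18:05:00Z", // Monday
  "2020-03-23T18:40:00Z", // Monday
  "2020-03-25T09:15:00Z", // Wednesday
  "2020-03-28T18:30:00Z", // Saturday
];

console.log(totalBytesByLanguage(sampleLanguages));
console.log(byWeekday(sampleCommits));
console.log(byHour(sampleCommits)[18]);
```

Passing the bucketing function in as a parameter keeps the weekday and hour graphs on one shared code path, so a different grouping (by month, say) would only need a different bucketOf.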

Ideas for similar examples:

Want to automatically follow people with similar interests to you on Twitter? Twitter REST API.

Want to download live data on hotels in the area for your travel agency? Hotel website REST API.

Is the amount of messaging and forum posting by a user correlated with the amount of code they produce? GitHub REST API.

When do super-interesting people post stories of their lunch on Instagram? Can I sort it by country? Instagram REST API.

How to build something similar yourself: using REST API to gather data:

How was it made?

1. First I scanned the documentation for using the GitHub API.

Available here: https://developer.github.com/v3/

2. Next, I tested the commands by entering them into my browser and manually looking through the example JSON outputs for the data I wanted.

In this case I put https://api.github.com/users/octocat/repos into my browser. Octocat is the example username given in the GitHub API Documentation. You can insert your own username here.

3. Then, I automated the process in JavaScript.

In the code, curly brackets let me insert my own input username and access tokens. The access tokens let GitHub know who is gathering this data. They can be obtained very quickly and easily on GitHub’s website. Accessing GitHub data without tokens may lead to limits being placed on how much information you can gather.
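
A minimal sketch of what this automation step might look like follows. It is not the original GitWatch code. It assumes a runtime with a global fetch (a modern browser or Node 18+), sends the token in an Authorization header, and uses hypothetical names (buildReposRequest, fetchRepos, YOUR_TOKEN).

```javascript
// Build the URL and headers for listing a user's public repositories.
// The token is optional; without one, GitHub applies stricter rate limits.
function buildReposRequest(username, token) {
  const url = `https://api.github.com/users/${encodeURIComponent(username)}/repos`;
  const headers = { Accept: "application/vnd.github+json" };
  if (token) {
    headers.Authorization = `Bearer ${token}`;
  }
  return { url, headers };
}

// Send the request and return the parsed JSON array of repositories.
async function fetchRepos(username, token) {
  const { url, headers } = buildReposRequest(username, token);
  const response = await fetch(url, { headers });
  if (!response.ok) {
    throw new Error(`GitHub API responded with status ${response.status}`);
  }
  return response.json();
}

// Usage (makes a real network request, so it is left commented out):
// fetchRepos("octocat", "YOUR_TOKEN").then((repos) => console.log(repos.length));
```

Keeping the URL and header construction in its own function makes it easy to check without touching the network, and keeps the token out of the URL itself.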

4. Just like that you have a list of data objects to manipulate at will. The rest of the processing is up to you!
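
For instance, once the repository list is in hand, ordinary array methods are enough to reshape it. The sketch below uses invented sample objects that carry only three of the fields GitHub returns for a repository (name, fork, pushed_at); recentOwnRepos is a hypothetical helper name.

```javascript
// Keep repositories the user created (not forks), most recently pushed first.
function recentOwnRepos(repos) {
  return repos
    .filter((repo) => !repo.fork) // filter() returns a new array, so the input is not reordered
    .sort((a, b) => new Date(b.pushed_at) - new Date(a.pushed_at))
    .map((repo) => repo.name);
}

// Invented sample data in the shape of the repository list.
const sampleRepos = [
  { name: "hello-world", fork: false, pushed_at: "2019-11-02T10:00:00Z" },
  { name: "forked-lib", fork: true, pushed_at: "2020-02-14T08:30:00Z" },
  { name: "dashboard", fork: false, pushed_at: "2020-03-20T21:45:00Z" },
];

console.log(recentOwnRepos(sampleRepos));
```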

Note on Web Scraping vs REST API:

When REST APIs aren’t an option, web (or screen) scraping is available instead. Web scraping is a technique for extracting large amounts of data directly from web pages by inspecting web elements and copying information out of them. There is a lot of discussion surrounding web scraping vs REST API, with some claiming that web scraping is more difficult and sometimes offers less functionality.
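
To give a flavour of what scraping involves, here is a deliberately naive sketch that pulls link targets out of an HTML string with a regular expression. The HTML fragment and the extractLinks name are invented, and a regular expression is only good enough for a toy like this; real pages usually call for a proper HTML parser.

```javascript
// Collect the href value of every <a> tag that uses double-quoted attributes.
function extractLinks(html) {
  const pattern = /<a\s[^>]*href="([^"]*)"/gi;
  return Array.from(html.matchAll(pattern), (match) => match[1]);
}

// Invented page fragment standing in for a downloaded web page.
const samplePage = `
  <ul>
    <li><a href="https://example.com/alice">Alice</a></li>
    <li><a class="user" href="https://example.com/bob">Bob</a></li>
  </ul>`;

console.log(extractLinks(samplePage));
```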

Outro:

The sheer amount of data that REST API and web scraping make available to you is inspiring and thought-provoking. I, a solo developer with the JavaScript ability of a coffee mug, managed in a few embarrassingly sleepless nights to accurately discover my friend’s (and any other GitHub user’s) work patterns. Imagine what a team or company of competent coders with access to much more data can do, and is already doing, to measure and study our productivity as programmers.

Thanks for reading! Have any REST API stories, ideas or hints of your own? Share them below!

If you want to discuss tech/history or get updates on GitWatch, connect with me on Twitter: @GTiscovschi

References:

Hoffman C, 2018, What Is an API, How To Geek, https://www.howtogeek.com/343877/what-is-an-api [Last accessed 28/03/2020]

Avraham S.B, 2017, What is REST — A Simple Explanation for Beginners, Part 1: Introduction, Extended, Medium, https://medium.com/extend/what-is-rest-a-simple-explanation-for-beginners-part-1-introduction-b4a072f8740f [Last accessed 28/03/2020]
