The Barrier to Entry in Data Science
A Beacon of Comfort to the Weary Aspiring Data Scientist
Getting started in data science, or any sub-field of tech in general, is enormously frustrating. The barrier to entry is often sufficient as a gatekeeper in itself, and by the barrier to entry, I mean:
- having access to a computer with sufficient speed and power on which to effectively learn
- non-existent conceptual awareness of things like the terminal, Bash, Unix, IDEs, IPython notebooks, kernels, dependencies, libraries, frameworks, packages, imports, scripts, editors, shells, virtual environments, and Docker containers, and that only scratches the surface of the arcane vocabulary of the general abstractions of tech
- the sheer scope and intimidating mystique surrounding the culture of tech, by which I most often mean arrogance and a blazing lack of traditional teaching, or empathetic pedagogy
The most common half-joke which describes the experience of learning in tech is:
“Learn how to Google.” -Anonymous, with a smug smirk
While this is descriptively true, in that it captures the feeling of learning from the perspective of someone with lots of prior tech experience, it fails to reflect how learning actually happens for a beginner.
While some folks with a prior penchant for research and Googling may be comfortable with the process of learning-through-searching, I’ve found that most often Googling is incomprehensibly harder than it seems.
Every time I hear this suggestion, ‘just Google it’, when confronting a bug or a problem, I seethe with mild existential fury at the smugness which accompanies it 95% of the time, because such a suggestion fails to acknowledge the necessity of understanding a problem in order to even Google it.
This is a very subtle concept to convey.
Why This is a Very Subtle Concept to Convey
It is the mark of a good teacher to understand the conceptual environment of the learner. In order to understand the conceptual environment of the learner, a teacher must remember or simulate what it must be like to learn, as a beginner does, in ignorance of what they don’t know.
The hard sciences have a pedagogical advantage, despite their complexity, because their objects of inquiry are already pre-defined through human beings’ tendency to see objects as separate based on their physical properties and to infer relevant abstractions between them.
- One organism is different from another because they are separate in space, surrounded by different, visible bundles of skin containing organs (Biology)
- Two species were told apart by variations in how they look (phenotype), at least until DNA was discovered, which offered up new, invisible properties of distinction through genotype (Genetics)
- Laws of motion were inferred from watching rocks fall and arrows fly and obsessing over circles and divine Aether (Aristotelian physics), until Keplerian ovals witnessed through telescopes implied gravitational force, leading to Newtonian physics in a classic scientific revolution (Physics)
- One pile containing four apples contains two fewer apples than a pile containing six apples. A child learns these abstractions, known as numbers, through observation (the senses), coupled with a natural human tendency for abstraction in language (Mathematics)
What is happening in tech?
There is a tendency to view tech and “computer science” as, well, a science. That may have something to do with the fact that the word ‘science’ is included in the phrase ‘computer science’. As a result, the protective aura of STEM embalms the environment and practitioners of tech.
But in my experience, learning a programming language is a lot more like studying comparative literature.
Why Learning a Programming Language is a Lot More Like Studying Comparative Literature
For starters, take a look at the definition of dependency hell, straight from Wikipedia:
Dependency hell is a colloquial term for the frustration of some software users who have installed software packages which have dependencies on specific versions of other software packages.
Amazing. Software has its own named problem for the complexities of a system designed by humans, not for humans.
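To make that definition a little less abstract, here is a minimal sketch in Python. The package names and version labels are hypothetical, invented purely for illustration, and this is not how any real installer resolves dependencies. It only shows the shape of the problem: two tools you want, each insisting on a different version of the same underlying package.

```python
# Hypothetical packages and version pins, invented purely for illustration.
requirements = {
    "plotting_tool": {"numlib": "1.x"},
    "modeling_tool": {"numlib": "2.x"},
}


def find_conflicts(requirements):
    """Return dependencies that different packages pin to different versions."""
    wanted = {}  # dependency name -> {version: [packages asking for it]}
    for package, deps in requirements.items():
        for dep, version in deps.items():
            wanted.setdefault(dep, {}).setdefault(version, []).append(package)
    # A conflict is any dependency requested in more than one version.
    return {dep: versions for dep, versions in wanted.items() if len(versions) > 1}


conflicts = find_conflicts(requirements)
for dep, versions in conflicts.items():
    for version, packages in versions.items():
        print(f"{dep} {version} is required by {', '.join(packages)}")
```

Both tools cannot be satisfied at once in a single environment, which is roughly the situation a beginner lands in without yet having the vocabulary to describe it.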
Additionally, let’s take a look at the phrase ‘programming language’.
It’s called a programming language. The proper metaphor of inquiry for making sense of the world of tech has been lurking in plain sight, but the High Augury of STEM has shielded it from being associated with anything so seemingly different, at least on the surface, as linguistics, language, and literature.
Things finally begin to make a lot more sense when we view the frustrations of aspiring data scientists through the lens of an ancient explorer coming across the Rosetta Stone for the first time: an explorer who can neither speak nor read Ancient Greek or Ancient Egyptian, and so, trivially and obviously, is incapable of Googling what he cannot understand, especially if the independent problem he is trying to solve is in Greek, Egyptian or, say, Python.
Many tutorials are obviously designed around teaching such fundamentals, yet perhaps only 1 in 20 doesn’t begin with a disclaimer saying you’re gonna have a bad time if you’re unfamiliar with [X language, framework, package, library, etc.].
The culture surrounding the pedagogy of tech and data science isn’t anyone’s fault, but rather the reflection of the brutally chaotic history of programming itself, on top of the fact that computers are not intuitive objects of human inquiry.
Computers are a bizarre mess of accumulated lower-level languages: compiled, transpiled, deprecated, compiled again, upgraded, lost, bugged, debugged, and then made obsolete just as things were looking good.
The ultimate point of this post is not to suggest quixotic change, because it’s not going to happen. It’s merely to offer a spot of warmth, a small lantern of welcome in the icy desert of learning.