If you’re looking for a data science job, you’ve probably noticed that the field is hyper-competitive. AI can now even generate code in many programming languages. Below, we’ll explore how AI can extract information from paragraphs to answer questions.

One day, you might be competing against AI — if AutoML isn’t that competitor already.

What is BERT-SQuAD?

BERT-SQuAD is a crossover between Google’s BERT (Bidirectional Encoder Representations from Transformers) and the Stanford Question Answering Dataset (SQuAD).

BERT is a cutting-edge natural language processing (NLP) model that can be used for tasks like question answering (which we’ll go into here), sentiment analysis, spam filtering, document clustering, and more. It’s all language!

Understanding “BERT.” Dance icon by Freepik, bulb icon by Becris on Flaticon. Graphic by author.

“Bidirectionality” refers to the fact that many words change meaning depending on their context, like “let’s hit the club” versus “an idea hit him”, so BERT considers the words on both sides of the keyword.
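To make the "both sides" idea concrete, here is a toy sketch in plain Python. It is not how BERT works internally; the helper `context_on_both_sides` and the `window` parameter are hypothetical names used only for illustration. It simply shows which neighboring words are available on each side of "hit" in the two example phrases.

```python
def context_on_both_sides(sentence, keyword, window=2):
    """Return the words left and right of the first occurrence of keyword."""
    tokens = sentence.lower().split()
    i = tokens.index(keyword)
    left = tokens[max(0, i - window):i]      # words before the keyword
    right = tokens[i + 1:i + 1 + window]     # words after the keyword
    return left, right


for sentence in ["let's hit the club", "an idea hit him"]:
    left, right = context_on_both_sides(sentence, "hit")
    print(f"{sentence!r}: left={left}, right={right}")
```

A model that reads in only one direction would see just one of those two lists when interpreting "hit"; a bidirectional model gets both.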

“Encoding” just means turning text into numbers: an input like “let’s hit the club” is split into pieces (tokens), and each piece is mapped to a number the machine can work with.
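As a rough illustration of encoding, the sketch below maps each whitespace-separated word to an integer ID. This is a simplification: BERT itself uses a learned subword (WordPiece) vocabulary rather than whole words, and the helpers `build_vocab` and `encode` are hypothetical names for this example.

```python
def build_vocab(sentences):
    """Assign an integer ID to every distinct word; 0 is reserved for unknowns."""
    vocab = {"[UNK]": 0}
    for sentence in sentences:
        for word in sentence.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(sentence, vocab):
    """Turn a sentence into a list of integer IDs."""
    return [vocab.get(word, vocab["[UNK]"]) for word in sentence.lower().split()]


vocab = build_vocab(["let's hit the club", "an idea hit him"])
print(encode("let's hit the club", vocab))  # [1, 2, 3, 4]
print(encode("an idea hit her", vocab))     # [5, 6, 2, 0] ("her" is unknown)
```

Note that "hit" gets the same ID (2) in both sentences; telling the two meanings apart is the job of the later stages, not of the encoding.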

“Representations” are the general understanding of words you get by looking at many of their encodings in a corpus of text.

“Transformers” are what you use to get from encodings to representations. This is the most complex part.

As mentioned, BERT can be trained on basically any kind of language task. SQuAD is the dataset we’re using to train it on one specific task: question answering.

SQuAD is a reading comprehension dataset, containing questions asked by crowdworkers on Wikipedia articles, where the answer to every question is a segment of text from the corresponding passage.
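To show what "the answer is a segment of text from the passage" means in practice, here is a simplified, flattened sketch of a SQuAD-style record. The real dataset nests these fields inside articles and paragraphs, and this particular record is made up for illustration, borrowing a sentence from the ridge regression example used later in this article. The key point is that the answer is stored as text plus a character offset (`answer_start`) into the context.

```python
context = (
    "In presence of correlated variables, "
    "ridge regression might be the preferred choice."
)

# Illustrative record, not an actual entry from SQuAD.
record = {
    "context": context,
    "question": "When is Ridge regression favorable over Lasso regression?",
    "answers": [
        {"text": "In presence of correlated variables", "answer_start": 0}
    ],
}


def answer_span(record):
    """Recover the answer by slicing the context at the stored offset."""
    answer = record["answers"][0]
    start = answer["answer_start"]
    end = start + len(answer["text"])
    return record["context"][start:end]


print(answer_span(record))  # In presence of correlated variables
```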

BERT-SQuAD, then, allows us to answer general questions by fishing out the answer from a body of text. It’s not cooking up answers from scratch; rather, it understands the context well enough to locate the span of text that contains the answer.
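The "fishing out" step can be sketched as follows. A BERT model fine-tuned for question answering produces, for every token in the context, one score for "the answer starts here" and one for "the answer ends here"; the predicted answer is the span whose start and end scores add up highest. The scores below are invented for illustration, and `best_span` and `max_len` are hypothetical names, so treat this as a minimal sketch of the selection logic rather than the model itself.

```python
def best_span(start_scores, end_scores, max_len=10):
    """Pick the (start, end) token pair with the highest combined score."""
    best, best_score = (0, 0), float("-inf")
    for start, start_score in enumerate(start_scores):
        # The end must not come before the start, and spans are capped at max_len.
        for end in range(start, min(start + max_len, len(end_scores))):
            score = start_score + end_scores[end]
            if score > best_score:
                best, best_score = (start, end), score
    return best


tokens = ["in", "presence", "of", "correlated", "variables", ",", "ridge",
          "regression", "might", "be", "the", "preferred", "choice"]
# Made-up scores standing in for the model's per-token outputs.
start_scores = [4.0, 0.5, 0.1, 1.0, 0.2, -1.0, 0.3, 0.1, -0.5, -0.5, -0.2, 0.0, -0.3]
end_scores = [-0.5, 0.1, -0.3, 0.4, 3.5, -1.0, 0.0, 1.2, -0.4, -0.6, -0.7, 0.1, 0.9]

start, end = best_span(start_scores, end_scores)
print(" ".join(tokens[start:end + 1]))  # in presence of correlated variables
```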

For example, here’s a context paragraph about lasso and ridge regression:

“You can quote ISLR’s authors Hastie, Tibshirani who asserted that, in presence of few variables with medium / large sized effect, use lasso regression.

In presence of many variables with small / medium sized effect, use ridge regression.
Conceptually, we can say, lasso regression (L1) does both variable selection and parameter shrinkage, whereas Ridge regression only does parameter shrinkage and end up including all the coefficients in the model. In presence of correlated variables, ridge regression might be the preferred choice.

Also, ridge regression works best in situations where the least square estimates have higher variance. Therefore, it depends on our model objective.”

Now, we could ask BERT-SQuAD:

“When is Ridge regression favorable over Lasso regression?”

And it’ll answer:

“In presence of correlated variables”

While I show around 100 words of context here, you could input far more context into BERT-SQuAD, like whole documents, and quickly retrieve answers — an intelligent Ctrl-F, if you will.
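If you want to try this kind of question answering yourself, one option is the Hugging Face `transformers` library. This is an assumption on my part and not necessarily the setup behind the interface used in this article. The sketch below wraps its question-answering pipeline in a hypothetical helper, `answer_question`; the default model name is a publicly available BERT checkpoint fine-tuned on SQuAD, and running it requires installing `transformers` plus a backend such as PyTorch and downloading the model weights.

```python
def answer_question(
    question,
    context,
    model_name="bert-large-uncased-whole-word-masking-finetuned-squad",
):
    """Extract an answer span from context using a SQuAD-fine-tuned model."""
    # Imported lazily so the function can be defined without the library installed.
    from transformers import pipeline

    qa = pipeline("question-answering", model=model_name)
    result = qa(question=question, context=context)
    # The result also includes "score", "start", and "end" for the chosen span.
    return result["answer"]
```

Calling `answer_question(question, context)` with the question and context paragraph from the example above returns the extracted span as a string. The exact answer can vary with the model you choose.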

To test the following 7 questions, I used Gradio, a library that lets developers build interfaces for their models. In this case, I used a BERT-SQuAD interface built in Google Colab.

Screenshot of BERT-SQuAD Gradio interface being used, by author.

I used the contexts from this Kaggle thread as inputs, and modified the questions for simplicity’s sake.

Q1: What will happen if you don’t rotate PCA components?

the effect of PCA will diminish

Q2: How do you reduce the dimensions of data to reduce computation time?

we can separate the numerical and categorical variables and remove the correlated variables

Q3: Why is Naive Bayes “naive”?

it assumes that all of the features in a data set are equally important and independent

Q4: Which algorithm should you use to tackle low bias and high variance?

bagging

Q5: How are kNN and kmeans clustering different?

kmeans is unsupervised in nature and kNN is supervised in nature

Q6: When is Ridge regression favorable over Lasso regression?

in presence of correlated variables

Q7: What is convex hull?

represents the outer boundaries of the two group of data points