Data Scientists Archive

Coding principles for rapid machine learning experimentation

According to François Chollet, the creator of Keras, who at his best placed 17th on Kaggle’s cumulative leaderboard, it is not about having the best ideas from the start but rather about iterating over ideas often

What to expect after completing a Data Science Certification Program?

One in three ads we see now sells ‘Certification Courses’ that promise to transform careers. There is hardly any talk about what to expect once we complete an online certification program or course, especially when the goal is to change one’s profession.

How can I solve any question on finding the probability of an event in a job interview / written test? (Part 2)

Properties of probability: i) 0 ≤ P(A) ≤ 1. ii) P(∅) = 0, P(S) = 1. iii) For any two events A and B, P(A∪B) = P(A) + P(B) − P(A∩B), where A∪B is read as ‘A

Web Scraping using Selenium

What is Web Scraping? Web scraping is a popular method for extracting data from websites. It is often done to derive insights for sentiment analysis, predicting user preferences, cross-selling products, etc. Some of the real-life examples of

Outlier Detection in High Dimensional Data using ABOD

We know what outliers are: the data points that lie outside of where most of our data lies. It is relatively straightforward to find outliers in a one-dimensional setting where we could

All the Numbers Behind the Decision Tree

In this article, we look at the various numbers we get as output after we implement a decision tree. Prerequisites: Before you start reading this blog, we expect you to have the following prior understanding

Outlier Detection in Time Series

Anomalies, a.k.a. outliers, are patterns in data that do not conform to the data points near them. Statistically, they are the data points that are deemed to belong to a different population. Timely detection of anomaly

Multi-Label Classification

Classification has been a go-to approach for various problems for many years now. However, with problems becoming more and more specific, a simple classification model can’t be the solution for all of them. Rather than having one

How to use ELMo Embedding in Bidirectional LSTM model architecture?

Embeddings from Language Models (ELMo): ELMo embedding was developed by the Allen Institute for AI. The paper “Deep contextualized word representations” was released in 2018. It is a state-of-the-art technique in the field of text processing (NLP). It is

Handling Class Imbalance – An Ensemble – Majority Voting on Minority Samples

Class imbalance is a major problem in machine learning, where the number of samples in one class is far smaller than the number in another class. This problem is extremely common in practice and can