
Posts

Featured Post

Building a News Classifier from Scratch with a PyTorch-Based Model

Building a News Classifier from Scratch with a Custom Transformer Model 🧠 Ever wondered how news apps categorize articles so accurately? This is often done using Transformers, a powerful neural network architecture that forms the backbone of modern language understanding. In this post, we'll build a news category classifier from the ground up, using our own custom Transformer. We'll explore the key components, prepare a real-world dataset, and train our model to classify news articles into one of 42 categories. 1. The Dataset: News Category Dataset Our journey starts with the News Category Dataset from Kaggle, a large collection of news headlines and short descriptions. The first step is to prepare this text for our model. We combine the headline and short_description columns into a single full_text column. We then create a numerical mapping for each unique news category. Python # Combine headline and short_description df[ 'full_text' ] = df[ 'headline...
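The two preparation steps described above (building full_text and mapping categories to integers) can be sketched as follows. The post itself works on a pandas DataFrame; this minimal sketch mirrors the same logic on plain Python dicts so it runs without extra dependencies. The toy records, the helper names build_full_text and build_label_map, and the choice to sort category names before numbering them are all hypothetical and may differ from the post's actual code.

```python
# Hypothetical toy records standing in for rows of the News Category Dataset.
records = [
    {"headline": "Stocks rally", "short_description": "Markets climb on earnings.", "category": "BUSINESS"},
    {"headline": "New vaccine trial", "short_description": "Early results look promising.", "category": "WELLNESS"},
    {"headline": "Election night", "short_description": "Polls close across the country.", "category": "POLITICS"},
    {"headline": "Quarterly results", "short_description": "Profits beat forecasts.", "category": "BUSINESS"},
]

def build_full_text(record):
    """Join headline and short_description into a single string (hypothetical helper)."""
    return f"{record['headline']} {record['short_description']}".strip()

def build_label_map(categories):
    """Map each unique category name to an integer id.

    Names are sorted first so the mapping is reproducible across runs
    (an assumption; any consistent ordering would work).
    """
    return {name: idx for idx, name in enumerate(sorted(set(categories)))}

# Step 1: add the combined text field to every record.
for r in records:
    r["full_text"] = build_full_text(r)

# Step 2: build the category <-> id mappings and encode the labels.
category_to_id = build_label_map(r["category"] for r in records)
id_to_category = {idx: name for name, idx in category_to_id.items()}
labels = [category_to_id[r["category"]] for r in records]

print(category_to_id)
print(labels)
```

Keeping the inverse mapping (id_to_category) around is handy later, when the model's predicted class indices need to be turned back into readable category names. On the real dataset you would also want to handle empty or missing short_description values before joining.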
Recent posts

Understanding LDA and QDA: A Comparative Guide

This post explores the concepts of Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA), highlighting their differences, applications, and considerations for use. Introduction In the realm of statistical classification, Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) are two foundational techniques. Both are grounded in probabilistic models and work best when the data satisfies their distributional assumptions. While they share similarities, their differences in assumptions and flexibility make them suitable for different scenarios. Linear Discriminant Analysis (LDA) LDA is a classification method that projects high-dimensional data onto a lower-dimensional space, aiming to maximize class separability. It operates under the assumption that: Each class follows a Gaussian (normal) distribution. All classes share the same covariance matrix. These assumptions lead to...
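The difference between the two assumptions can be seen in a small one-dimensional sketch. Both methods model each class as a Gaussian and pick the class with the largest log prior plus log density; the only change is whether the variance is pooled across classes (LDA) or estimated per class (QDA). This is an illustrative pure-Python sketch, not the post's code: the function names and the toy data are hypothetical, and real multivariate work would use covariance matrices (for example via scikit-learn) rather than scalar variances.

```python
import math

def fit_gaussian_classes(xs, ys, shared_variance):
    """Estimate prior, mean, and variance per class for 1-D data.

    shared_variance=True pools the within-class variance (LDA's assumption);
    shared_variance=False keeps a separate variance per class (QDA's assumption).
    """
    classes = sorted(set(ys))
    n = len(xs)
    params = {}
    pooled_ss = 0.0  # sum of squared deviations from each class mean
    for c in classes:
        pts = [x for x, y in zip(xs, ys) if y == c]
        mean = sum(pts) / len(pts)
        ss = sum((x - mean) ** 2 for x in pts)
        pooled_ss += ss
        params[c] = {"prior": len(pts) / n, "mean": mean, "var": ss / (len(pts) - 1)}
    if shared_variance:
        pooled_var = pooled_ss / (n - len(classes))  # unbiased pooled estimate
        for c in classes:
            params[c]["var"] = pooled_var
    return params

def log_score(x, p):
    """Log of prior times Gaussian density at x: the class discriminant."""
    return (-0.5 * math.log(2 * math.pi * p["var"])
            - (x - p["mean"]) ** 2 / (2 * p["var"])
            + math.log(p["prior"]))

def predict(x, params):
    """Return the class whose discriminant is largest."""
    return max(params, key=lambda c: log_score(x, params[c]))

# Hypothetical toy data: class "A" is tight around 0, class "B" is spread around 4.
xs = [-0.5, 0.0, 0.5, 1.0, 4.0, 7.0]
ys = ["A", "A", "A", "B", "B", "B"]
lda = fit_gaussian_classes(xs, ys, shared_variance=True)
qda = fit_gaussian_classes(xs, ys, shared_variance=False)

for x in (-3.0, 0.2, 6.0):
    print(x, "LDA:", predict(x, lda), "QDA:", predict(x, qda))
```

With equal priors and a shared variance, LDA's boundary sits at the midpoint of the class means (x = 2), so it labels x = -3.0 as "A". QDA instead labels it "B": class A's variance is so small that a point 3 units away is far less likely under A than under the wide class B. That curved, variance-aware boundary is exactly the extra flexibility QDA buys at the cost of estimating more parameters.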

20 Must-Know Math Puzzles for Data Science Interviews: Test Your Problem-Solving Skills

Introduction: When preparing for a data science interview, brushing up on your coding and statistical knowledge is crucial, but math puzzles also play a significant role. Many interviewers use puzzles to assess how candidates approach complex problems, test their logical reasoning, and gauge their problem-solving efficiency. These puzzles are often designed to test not only your knowledge of math but also your ability to think critically and creatively. Here, we've compiled 20 challenging yet exciting math puzzles to help you prepare for data science interviews. We'll walk through each puzzle, followed by an explanation of its solution. 1. The Missing Dollar Puzzle Puzzle: Three friends check into a hotel room that costs $30. They each contribute $10. Later, the hotel realizes there was an error and the room actually costs $25. The hotel gives $5 to the bellboy to return to the friends, but the bellboy, being dishonest, pockets $2 and gives $1 back to each friend. No...
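The numbers stated in the Missing Dollar Puzzle can be checked with a simple ledger. In its usual telling, the puzzle goes on to add the bellboy's $2 to the $27 the friends paid and asks where the 30th dollar went; assuming the post follows that framing, the sketch below shows why that sum is meaningless. The function name and dictionary keys are hypothetical, chosen only for illustration.

```python
def missing_dollar_ledger(room_cost=25, per_person=10, people=3, bellboy_keeps=2):
    """Track where the friends' money actually went (hypothetical helper)."""
    paid = per_person * people          # $30 handed over at check-in
    refund = paid - room_cost           # $5 the hotel sends back
    returned = refund - bellboy_keeps   # $3 that actually reaches the friends
    net_paid = paid - returned          # $27 the friends are out of pocket
    return {
        "paid": paid,
        "returned": returned,
        "net_paid": net_paid,
        # Correct decomposition: the $27 went to the hotel ($25) plus the bellboy ($2).
        "hotel_plus_bellboy": room_cost + bellboy_keeps,
        # Misleading sum: the bellboy's $2 is already inside the $27, so adding it again double-counts.
        "misleading_total": net_paid + bellboy_keeps,
    }

ledger = missing_dollar_ledger()
print(ledger)
```

The $30 is fully accounted for as $27 paid (split $25 hotel, $2 bellboy) plus $3 returned. The "missing" dollar only appears when the $2 is added to a total that already contains it.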

A look at probability for data science

To build a solid foundation in probability theory for data science, let's explore key concepts in a structured manner, starting from the basics and gradually moving to more advanced ideas. This overview provides the theoretical background needed to understand how probability is applied in data science, particularly in machine learning, statistical modeling, and predictive analytics. 1. Random Variables A random variable is a variable that takes on different values based on the outcomes of a random phenomenon. Random variables are of two main types: Discrete Random Variables: These take on a countable number of values. For example, the outcome of a die roll (1 through 6) is a discrete random variable. Continuous Random Variables: These take on an uncountable number of values, typically within some interval. For example, the time it takes for a customer to make a purchase in an online store can be modeled as a continuous random variable. 2. Probability Distribution T...
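The contrast between the two kinds of random variable shows up in how probabilities are computed. A discrete variable such as a fair die has a probability mass function that assigns probability to individual values; a continuous variable such as time-to-purchase assigns probability only to intervals. The sketch below assumes the purchase time follows an exponential distribution with a hypothetical rate of 0.2 per minute (mean 5 minutes); that model choice and the helper names are illustrative, not taken from the post.

```python
import math
from fractions import Fraction

# Discrete: a fair six-sided die, described by a probability mass function (PMF).
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def expected_value(pmf):
    """E[X] = sum of value * probability over all possible values."""
    return sum(value * prob for value, prob in pmf.items())

# Continuous: time to purchase modeled as Exponential(rate); the rate is a hypothetical choice.
def exponential_cdf(t, rate):
    """P(T <= t) for an exponential random variable with the given rate."""
    return 1.0 - math.exp(-rate * t) if t > 0 else 0.0

def prob_between(a, b, rate):
    """P(a <= T <= b). Any single exact value has probability 0."""
    return exponential_cdf(b, rate) - exponential_cdf(a, rate)

rate = 0.2  # hypothetical: one purchase per 5 minutes on average (mean = 1 / rate)

print("P(die = 3) =", die_pmf[3])
print("E[die] =", expected_value(die_pmf))
print("P(2 <= T <= 8) =", prob_between(2, 8, rate))
print("P(T = 2 exactly) =", prob_between(2, 2, rate))
```

Using Fraction keeps the die calculations exact, so the expected value comes out as 7/2 rather than a rounded float. For the continuous case, asking for the probability of one exact time returns 0, which is why continuous distributions are described by densities and CDFs instead of a PMF.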

Interview Dialogue: Customer Churn Prediction Case Study

Introduction: Case studies are a fundamental part of data science interviews, offering candidates a platform to showcase their problem-solving abilities, technical expertise, and business acumen. They provide a glimpse into how real-world data science problems are approached, dissected, and solved. This dialogue between an interviewer and an interviewee walks through a customer churn prediction case study in detail, showing how such problems are tackled in an interview setting. The discussion not only highlights the steps involved in solving a data science case study (understanding the problem, exploring the data, engineering features, selecting a model, and deploying it) but also demonstrates how candidates can effectively communicate their thought process and technical decisions. If you're preparing for a data science interview, this dialogue offers a blueprint for how a typical case study interview unfolds and the type of reasoni...