In this short post, I will walk you through implementing the K-means clustering algorithm from scratch in the most efficient way.

The post assumes that you already know the theory of the algorithm.

Let’s start by creating a small dataset to use:

import numpy as np

import random

import matplotlib.pyplot as plt

import pandas as pd

group1 = [(random.randint(0, 100), random.randint(0, 100)) for i in range(50)]

group2 = [(random.randint(150, 250), random.randint(250, 350)) for i in range(50)]

I created two distinct groups so it’s clear for us to see the clusters. Let’s plot them and see:

`x_data = np.concatenate([np.array(group1)[:, 0], np.array(group2)[…`
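For readers who want something runnable at this point, here is a minimal sketch of the standard K-means loop (Lloyd's algorithm) on two groups like the ones above. It is written in plain Python for readability, so it is not the efficient implementation the post goes on to build. The function name `kmeans` and its parameters (`n_iters`, `seed`, `init`) are assumptions made for illustration.

```python
import random


def kmeans(points, k, n_iters=100, seed=0, init=None):
    """Cluster 2D points into k groups; returns (centroids, labels).

    `init` is an optional list of k starting centroids (a hypothetical
    parameter); by default k data points are sampled at random.
    """
    rng = random.Random(seed)
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels = None
    for _ in range(n_iters):
        # Assignment step: give each point the index of its nearest centroid.
        new_labels = []
        for x, y in points:
            distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            new_labels.append(distances.index(min(distances)))
        if new_labels == labels:
            break  # assignments stopped changing, so the loop has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, labels


# Two groups generated the same way as in the post.
group1 = [(random.randint(0, 100), random.randint(0, 100)) for _ in range(50)]
group2 = [(random.randint(150, 250), random.randint(250, 350)) for _ in range(50)]
centroids, labels = kmeans(group1 + group2, k=2)
```

Because the starting centroids are picked at random, K-means can in general settle on different partitions from run to run; passing `init` fixes the starting point when a reproducible result is needed.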

In this five-part group project, we explain how we created a recommendation system for the website careervillage.org using data analysis and machine learning.

**Introduction**

CareerVillage is a website similar to Quora and Stack Overflow. It lets students post questions about anything related to their future careers, and volunteers with expertise in various fields answer them. Currently, the website lets professionals follow specific topics (programming, finance, engineering, Python, sports, etc.). When students post a question, they specify the topic it relates to, and once the question is posted, an email notification is sent to the professional who is…

This post is part 3 of the series about building a recommendation system for CareerVillage.org. You can find the outline of the series here.

**Quick Summary**

CareerVillage is a website similar to Quora and Stack Overflow. It lets students post questions about anything related to their future careers, and volunteers with expertise in various fields answer them. Currently, the website lets professionals follow specific topics (programming, finance, engineering, Python, sports, etc.). When students post a question, they specify the topic it relates to, and once the question is posted, an email notification is sent to the professional…

**Note**: This is part 2 of two parts on analyzing and understanding the Titanic dataset. Please find part 1 here.

In the previous post, I conducted a statistical analysis of the Titanic dataset to answer the question of whether the socioeconomic class of the passengers had an effect on their probability of survival. The statistical significance test showed that the “Pclass” variable, the class of each individual, which I used as an indicator of socioeconomic status, had a significant effect on people’s survival, with a p-value of 2.24e-147. You can read the post here.

In this post, I’m…

**Note**: This is post 1 of two posts on analyzing and understanding the Titanic dataset. Please find part 2 here.

Hypothesis testing is a very common concept in statistical inference. Before drawing a conclusion from a dataset, you have to run a hypothesis test to assess the significance of that conclusion. In this post, I’m going to use the Titanic dataset to provide an intuitive explanation of each step involved in significance testing.

We all watched the Titanic movie and we saw the part when rich people started to give money to get priority…
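As a rough sketch of the kind of significance test these two Titanic posts describe, the block below runs a chi-square test of independence on a small class-by-survival table. The excerpts do not say which test the posts actually use, so the choice of test is an assumption, and the counts are made up for illustration; they are not the real Titanic data. The helper names `chi_square_independence` and `p_value_dof2` are hypothetical.

```python
import math


def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_dof2(stat):
    """Chi-square tail probability; this closed form holds only for 2 degrees of freedom."""
    return math.exp(-stat / 2)


# Hypothetical counts (NOT the real Titanic data): rows are classes 1 to 3,
# columns are (survived, died).
toy_table = [[60, 40], [45, 55], [25, 75]]
stat, dof = chi_square_independence(toy_table)
p = p_value_dof2(stat)  # valid here because a 3x2 table has 2 degrees of freedom
```

A small p-value says the observed table would be very unlikely if class and survival were independent, which is the sense in which a variable like “Pclass” is called significant.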

Modeling fluid flow has been an area of interest for many researchers and scientists in engineering and applied physics. In the past decades, many techniques have been developed to model the kinematics or the dynamics of fluids. The common approach is to use the famous Navier-Stokes equations, which fully describe the behavior of fluids. These equations are then solved using numerical techniques such as finite differences.

Although these methods provide a very good approximation to the system, they have a high computational cost and they are generally hard to work with. In 1986, Uriel Frisch et…

In this post, I’m going to show you how to build a simple GUI App using Python’s Tkinter. By the end of this short tutorial, you should be able to create a simple Python App with buttons and data entries and use them to do simple functions.

Before you start reading, you will need to install Tkinter and open a new IDLE window on your local device. This tutorial is based on Python 3.6.

We start by importing Tkinter and, to keep things short, giving it the alias “tk”:

`import tkinter as tk`

The first step in creating…

The Ising model is another example of a fascinating discrete model that can do a great job at modeling and simulating continuous systems. Among its many uses, the Ising model is mainly known for simulating and studying ferromagnetic phase transitions, which are notable changes in the properties of specific materials.

The most common form of the model is a simple 2D grid divided into L×L cells. Each cell holds one of two values (+1 or -1). In statistical mechanics lingo, these correspond to the magnetic dipole moments of atomic “spins” that can be either spin…
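To make the grid picture concrete, here is a minimal sketch of one common way to simulate such a lattice: single-spin-flip Metropolis updates. The excerpt does not say which update rule the post uses, so the Metropolis rule, the periodic boundaries, the coupling J = 1, the zero external field, and the names `metropolis_sweep` and `magnetization` are all assumptions.

```python
import math
import random


def metropolis_sweep(spins, beta, rng):
    """Attempt L*L single-spin flips on an L x L lattice, in place.

    Assumes coupling J = 1, no external field, and periodic boundaries.
    `beta` is the inverse temperature 1 / (k_B * T).
    """
    size = len(spins)
    for _ in range(size * size):
        i, j = rng.randrange(size), rng.randrange(size)
        neighbours = (
            spins[(i + 1) % size][j] + spins[(i - 1) % size][j]
            + spins[i][(j + 1) % size] + spins[i][(j - 1) % size]
        )
        # Energy change caused by flipping spin (i, j).
        delta_e = 2 * spins[i][j] * neighbours
        # Always accept flips that lower the energy; otherwise accept
        # with probability exp(-beta * delta_e).
        if delta_e <= 0 or rng.random() < math.exp(-beta * delta_e):
            spins[i][j] *= -1


def magnetization(spins):
    """Average spin value, between -1 and +1."""
    size = len(spins)
    return sum(sum(row) for row in spins) / (size * size)


# Usage: start from a random 16 x 16 lattice and run a few sweeps.
rng = random.Random(0)
lattice = [[rng.choice((-1, 1)) for _ in range(16)] for _ in range(16)]
for _ in range(50):
    metropolis_sweep(lattice, beta=0.6, rng=rng)
m = magnetization(lattice)
```

Tracking `magnetization` while varying `beta` is the usual way to look for the ferromagnetic transition mentioned above, though a lattice this small only hints at it.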

BERT is a very interesting multilayer deep learning model that is currently considered state of the art for natural language processing. It has been pre-trained on Wikipedia and BooksCorpus, so it can do a good job at many natural language processing tasks. What makes this model special is that it provides a very rich representation of words that captures the context in which each word was used. This feature is really important in NLP applications where the context can change what a word refers to (e.g. …

Data Science and Modeling and Simulation Enthusiast | Computational Sciences and Physics Student | gaber@minerva.kgi.edu