Staying in the Loop with Python, the Queen of Data Science

My on-and-off relationship with Python began a few months before I started my Master’s degree. When I knew that I was going to turn towards IT, it was a no brainer that I had to raise my coding game. I had learned C programming during my engineering days but that was almost a decade ago. So, to go back to my roots, I took a weekend course in object-oriented programming with Java. While it was a lot of fun, it became clear to me that Java, though brilliant, was more of a mobile app development tool (no offense, Java lovers!). There was another language that reigned over the data science kingdom and for any chance of success as a data analyst, I had to woo her.

I started learning Python with the MIT OCW course (edX) on Introduction to computational programming with Python to understand the basic data structures and some beginner-level programs. While I got through the basics, I could not complete this course as, after a point, I found it to be a bit dry. And that was that. At UTD, I was already making good progress in my analytics learning trajectory thanks to my work with R programming. So there was no need to hurry things up with another language. However, as things progressed with my club Travelytics, and I came across competitions online, I couldn’t delay getting my hands dirty with Python anymore.

So, I dived right in with Kaggle Learn‘s wonderful data science track which started with 7 hours of Python, including all the basics from variables, lists, loops and functions to important libraries and elementary programs. This was followed by my internship at iCode where I worked with Python projects and also trained over 50 students in the foundations of Python and machine learning. The hands-on exercises and projects at iCode, like building a movie recommender system, were of great help in laying down the foundations of Python for data science in my brain.

Joseph Kim in one of his Python sessions at UTD

Back at UTD, it helped that my friend, Joseph Kim, who was the President of the data science club, conducted some amazing hands-on sessions for people to learn Python basics. Attending these sessions helped me, and many others, stay in the loop (pun totally intended). Then came my own Python research for my facial recognition project to solve crime tourism, at the end of which I had adapted three simple python programs that detect and recognize faces in real time. This was my most memorable time spent with Python programming, as I was able to see some tangible results generated by code written by me.

In the last few months, I have been following the extraordinary free YouTube lessons of Krishna Naik. His Machine Learning playlist is the most valuable resource I have found online that helps me practise everything from the use of impressive data science libraries like NumPy, Pandas and scikit learn to data visualization exercises with matplotlib and Seaborn. He is also an excellent coach in analytics concepts like entropy and Gini impurity, and machine learning algorithms like regression, k-means clustering, k-nearest neighbors, decision trees and ensemble methods.

We are truly fortunate to live in a world and time where so many resources are available for anyone who has an Internet connection and wants to learn. I am currently working my way through Kiril Eremenko’s well-acclaimed Udemy course on Python for data science. While all these wonderful online resources have their charm, nothing comes close to in-class training. This became evident in my object-oriented programming class with Dr. Nassim Sohaee. Her diligent classwork and challenging assignments, which I am still working on, have been excellent tools to help me understand the nuts and bolts of object-oriented design and the anatomy of Python programming. I have worked in various projects dealing with loops, functions, classes, inheritance and exception handling. In addition to all the data science exercises, this class has helped me gain more confidence in leveraging Python as a powerful programming language in the time to come.

ALSO SEE Saying “Hello, old friend” to Statistics and Analytics
Diving Deep into Business Analytics with R Programming

This is the fifth post of my #10DaysToGraduate series where I share 10 key lessons from my Master’s degree in the form of a countdown to May 8, my graduation date.

Diving Deep into Business Analytics with R Programming

When a class is named after your graduation major, and one of the most popular disciplines in the present world, you know it’s going to be pivotal in your learning path. BA with R proved to be just that. The brilliant Dr. Sourav Chatterjee made it clear right at the beginning that R programming is going to be used just as a tool (which it is) to understand and master the nuances of business analytics. Having said that, his course material left no stone unturned in taking us through all aspects of R programming needed for data science.

I had worked a bit with Java and PHP, but this was my first experience with the R programming language. I started with an introductory course on Datacamp to quickly learn the very basics of R like vectors, matrices and data frames. Then, in class, Dr. Chatterjee proved to be a dedicated and patient professor as he started with basic manipulations and sample generation in R and then quickly moving to the foundations of data analytics. We got familiar with libraries like tidyverse, forecast, gplots and toyed with data visualization using ggplot on some interesting data sets. We created several plots, graphs, charts, and heatmaps, before scaling up to larger data sets.

This was followed by some of the most important things a business analyst/data scientist learns in his career. So far, everything looked pretty straight forward to me but now was the time to push boundaries and actually dive deep into analytics. I was introduced to dimension reduction, correlation matrix and the all-important analytics task of principal component analysis (PCA). I learnt how to evaluate performance of models, create lift and decile charts, and classification with the help of a confusion matrix – all with just a few lines of code. As Dr. Chatterjee explained time and again, it was never about the code. It was about knowing when and how to use it and what to do with the result.

Dr. Sourav Chatterjee’s BA with R class

We then followed the natural analytics progression with linear and multiple regression where I learned about partitioning of data and generating predictions. This was followed by a thorough understanding of the KNN model and how and when to run it. By now, I was beginning to get a hand of problem statements and the approach to take to solve them, thanks to class assignments on real-world scenarios like employee performance and spam detection. Through the examples done in class, it was easy to grasp the concepts of R-squared value, p-value and the roles they play in model evaluation. It was in this class that I understood logistic regression, discriminant analysis, association rules for the first time and I have been working on them ever since, in every data science course or project that I have taken up.

All of this knowledge and Dr. Chatterjee’s guidelines were put to use in the final project where I worked with a group led by the talented Abhishek Pandey on London cabs data. After rigorous work on large data sets downloaded/extracted from various sources, we trained a model to predict arrival times for cabs by comparing RMSE across random forests, logistic regression, and SVMs. It was a great way to put into practice everything we had learned over four months.

And with that, I had laid a robust foundation in data analytics, and was ready to build it further in the time to come. By January 2019, I was confident to dive into analytics projects and work on complex data sets to generate prediction models using the tools taught by Dr. Saurav Chatterjee.

ALSO SEE Saying “Hello, old friend” to Statistics and Analytics

This is the second post of my #10DaysToGraduate series where I share 10 key lessons from my Master’s degree in the form of a countdown to May 8, my graduation date.

Saying “Hello, old friend” to Statistics and Analytics

There’s a reason I chose Statistics to be no. 10 and the first one in this countdown. When you want to enter the world of data science, you realize very quickly that you can do nothing without the concepts of statistics being clear in your head. The University of Texas at Dallas obviously understood this and made Statistics and Analytics a core course. So, when I started my Master’s program in Fall 2018, I enrolled for this course with Dr. Avanti Sethi in my very first semester. Dr. Sethi proved to be an excellent teacher, and I am honored to have had the pleasure of knowing and working with him during the past two years.

Photo by Luke Chesser on Unsplash

Thanks to his well-designed lectures and assignments, I was able to build a strong statistical foundation with good practice of basic concepts like measures of central tendency (mean, median, mode) and measures of statistical dispersion (variance, standard deviation, IQR). The course then went on to cover concepts like population, sampling, estimation, z-score, t-score, Normal distribution, hypothesis testing, p-value, chi-square tests, ANOVA tests and regression. Dr. Sethi, who is an Excel ninja, also conducted a separate hands-on session for students interested in learning Advanced Excel and taught us how to build macros. The problem statements in his assignments covered real-life scenarios ranging from sports team performances and automobile dealerships to Halloween sales and manufacturing plant obstacles.

Dr. Sethi’s class in Sep 2018

And just like that, right in the very first semester, Statistics and Analytics had set the ball rolling on my data science journey. I have been going back to Dr. Sethi’s assignments every few months, to make sure I don’t forget the very foundations of everything that I have learned in analytics so far. It was a memorable semester thanks to this wonderful class, and left me with a lot of confidence to move forward.

This is the first post of my #10DaysToGraduate series where I share 10 key lessons from my Master’s degree in the form of a countdown to May 8, my graduation date.

Travelytics presents BIG DATA IN TRAVEL with Dr. Rick Seaney

When we kicked off our first Travelytics event in 2018, Prof. Kevin Short at UTD was kind enough to grace us with his presence and speak on the use of data in the airline industry. And now, thanks to him, we have a travel domain stalwart visiting UTD and conducting a special lecture for Travelytics. The topic is an exciting one – BIG DATA in the TRAVEL INDUSTRY. We look forward to an exciting session with Dr. Seaney and a bunch of enthusiastic data science students.

Dr. Rick Seaney - Big Data in Travel

My First Job in The United States

So, amidst all the chaos of studies, cultural events, theatre, travel and Netflix, I landed an internship in summer 2019 at an impressive company iCode. As it happens, this is my first job in The United States and I am having quite an enriching experience. I have tried spell it out in this LinkedIn post and article. Check it out: