
Reading Apprenticeship Inspired Assignment or Lesson

Lesson on Exploratory Data Analysis
Introductory course in Data Science 
Shilpa Gupta, Ph.D. 


Purpose

The objective of the activity is to make students aware of the thinking process involved in exploratory data analysis, in addition to teaching them how to perform the analysis.

Context

This is a 200-level course for incoming graduate students in the MS AI, MSISE, and MSEM programs. The course introduces them to the end-to-end data analysis process through hands-on analysis using the R statistical programming language. This task takes place in the second lecture (after the introduction to the course). Prior exposure to statistical concepts such as mean, standard deviation, skewness, and correlation is useful but not necessary.

Criteria

As homework, the students will have to repeat the process on a dataset of their choice from R's built-in datasets.

Metacognitive Conversations

This activity will make the students aware of the metacognitive conversation they need to have in order to build an understanding of the data before performing and sharing a data analysis. The task is deliberately unstructured and attempts to mimic real-world problem solving.

Text and Materials

In class, I would share the Motor Trend Car Road Tests dataset (R code: datasets::mtcars), which comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles, for students to work through individually and in groups.
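
A first look at this dataset in R might start along the lines of the minimal sketch below. It uses only base R functions. The choice of which functions to show is an assumption made for illustration and is not prescribed by the lesson.

```r
# Load the built-in Motor Trend Car Road Tests dataset.
data(mtcars, package = "datasets")

# Size of the data: 32 automobiles (rows) and 11 variables (columns).
dim(mtcars)

# Variable names: mpg (fuel consumption) plus 10 design and performance aspects.
names(mtcars)

# Peek at the first few rows and at the type of each column.
head(mtcars)
str(mtcars)

# The help page documents each variable and its units (interactive use):
# ?mtcars
```

Reading the help page alongside the raw rows gives students the context they need to start asking questions about the data.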

  

Details

The instructor will share a dataset with the class. The students will take a few minutes (1-5 mins) to just look at the data. The students will then be split into smaller groups and asked to discuss the following questions for 10-15 mins and to record their answers in a collaborative document, for instance a Google document:

1. What are they curious to learn from this dataset? Why is that of interest to them?

2. What one decision would they want to make from the data? Why?

3. Why are they interested in making that decision? (This is to make them aware that different stakeholders can have different reasons for analyzing the same data, for instance climate data: journalist vs. reader vs. oil company executive…)


The instructor would then spend about 20 mins sharing common data summarizing and visualizing techniques, perhaps on different data snippets, and follow that with a text containing a complete worked-out example. The students will get a chance to read it individually, critique and discuss the analysis in a breakout room, and reflect on their insights as a whole class. Students would then go back to their groups, discuss which data summary and visualization techniques help them answer their initial question, and apply those techniques. This part of the activity would take around 30 mins. Once everyone is back, the activity will close with the groups sharing their findings about the content and the approach, and discussing as a broader group their reasons for choosing a particular visualization technique. The class wraps up with a reading of this text and a discussion of it in small groups.
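
As an illustration of the kind of summaries and plots a group might produce, the sketch below works through one hypothetical question on mtcars: how does fuel consumption relate to weight and to the number of cylinders? The question and the selection of techniques are assumptions for illustration. Groups would choose their own based on their initial question.

```r
# Hypothetical guiding question (not from the lesson plan):
# how does fuel consumption (mpg) relate to weight (wt) and cylinder count (cyl)?
cars <- datasets::mtcars

# Numerical summaries: minimum, quartiles, mean, maximum, and spread of mpg.
summary(cars$mpg)
sd(cars$mpg)

# Correlation between fuel consumption and weight.
mpg_wt_cor <- cor(cars$mpg, cars$wt)
mpg_wt_cor

# Group summary: mean mpg for each cylinder count.
mpg_by_cyl <- aggregate(mpg ~ cyl, data = cars, FUN = mean)
mpg_by_cyl

# Visual summaries: a distribution, a relationship, and a group comparison.
hist(cars$mpg, main = "Distribution of mpg", xlab = "Miles per gallon")
plot(cars$wt, cars$mpg, xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
boxplot(mpg ~ cyl, data = cars,
        xlab = "Number of cylinders", ylab = "Miles per gallon")
```

Each plot answers a different kind of question, which gives groups something concrete to refer to when they explain why they chose a particular visualization.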


This class activity will be followed by students performing EDA on their own individual datasets (different from the one shared in class) as homework.
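
To pick a homework dataset, students could browse the built-in datasets roughly as sketched below. The example choice of iris is hypothetical. Any built-in dataset other than the one shared in class would do.

```r
# List the datasets bundled with the base 'datasets' package.
available <- data(package = "datasets")$results
head(available[, c("Item", "Title")])

# Example pick (hypothetical choice): the iris dataset.
chosen <- datasets::iris

# Start the same first-look routine on the chosen dataset.
dim(chosen)
str(chosen)
```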