The Data Wrangling seminar was created as part of the NSF-funded Data Science for All project, whose goal is to produce seminars that introduce students from a wide range of majors (upper and lower division) to data science. We have also presented it multiple times at the community college level. In this seminar, we introduce students to data science using Python. Students load Python programs in Jupyter notebooks in Google Colaboratory (requires a Gmail account).
During the seminar, students:
- List different sources of data and data classifications
- Describe the data science system and data wrangling
- Interpret, modify, and create basic Python programs to wrangle data using pandas
The initial programs include fictitious data to explore the use of pandas for data wrangling. To experience data wrangling with real data, the seminar includes a program to wrangle data from Spotify.
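As a rough illustration of the kind of pandas wrangling described above, here is a minimal sketch. It is not taken from the seminar notebook: the table, the column names (`student`, `major`, `hours_studied`), and the cleaning steps are all invented for this example, and the actual seminar programs may approach things differently.

```python
import pandas as pd

# Fictitious data invented for this sketch (not the seminar's dataset).
raw = pd.DataFrame(
    {
        "student": ["Ana", "Ben", "Ben", "Chloe", "Dev"],
        "major": ["MIS", "AIS", "AIS", None, "MIS"],
        "hours_studied": ["3", "5", "5", "2", "n/a"],
    }
)

# Step 1: drop rows that are exact duplicates of an earlier row.
deduped = raw.drop_duplicates()

# Step 2: convert the text column to numbers (unparseable entries become NaN)
# and give missing majors an explicit placeholder label.
cleaned = deduped.assign(
    hours_studied=pd.to_numeric(deduped["hours_studied"], errors="coerce"),
    major=deduped["major"].fillna("Undeclared"),
)

# Step 3: summarize. mean() skips NaN values by default.
summary = cleaned.groupby("major")["hours_studied"].mean()
print(summary)
```

The three steps (remove duplicates, fix types and missing values, summarize) are typical of an introductory wrangling exercise. The same pattern would apply to a real dataset loaded with `pd.read_csv`.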
Although this technology is new to most of the students, we use a hands-on, interactive approach to keep them engaged. The materials are designed as a stand-alone 3-hour seminar, but they could also be broken up and used as a module in a course. The link provided here is to our project's website at San Jose State University (where the faculty involved in the project teach in MIS and AIS).
The above link will take you to the webpage for this particular seminar. There are currently 8 seminars, and all of them provide materials under the same Creative Commons license. The webpage links to some of the material, including a PDF of the slides, the datasets created for the seminar, the notebook used, and some additional materials.
We also make available additional teaching materials, all of the materials bundled in a Canvas cartridge, the PowerPoint slides in case you want to edit them, and a notebook and test materials we use for creating digital badges for participants. The additional materials and test questions will be provided to anyone with a faculty email address and webpage (it can even be a page that just lists you with your email as the instructor at an actual university or college). The instructions for requesting the additional materials will always be at this address on the Data Science for All website.
All of these materials are available to any faculty member to use or modify under the CC license. We just don't put them directly on the website, in case anyone is using the test questions. We know they may end up on the web eventually, but we try our best not to disclose the materials to students in case any faculty are using the questions on a test or quiz, and we ask you to do the same.
Development of the Data Science for All Seminar Series is funded under NSF grant #1829622.