STAT 448 Spring 2023 Syllabus

STAT 448 Advanced Data Analysis

4 Credit Hours - Major Elective

Sections 1UG/1GR

Spring 2023 - Syllabus


COVID-19 General Information

Following University policy, all students are required to engage in appropriate behavior to protect the health and safety of the community. Students are also required to follow the campus COVID-19 protocols.

If you feel ill or are unable to come to class or complete class assignments due to issues related to COVID-19, including but not limited to testing positive yourself, feeling ill, caring for a family member with COVID-19, or having unexpected child-care obligations, you should contact your instructor immediately, and you are encouraged to copy your academic advisor.

Face Coverings

Face coverings are strongly recommended in classrooms during in-person class time, but not required.

University guidance is that face coverings can be recommended but not required, based on current local COVID-19 levels and CDC guidance. This is a very fluid situation and, as has been our practice, we will adjust as conditions change. Wearing a face covering remains a personal choice. Please note that students and employees cannot be compelled to wear a face covering.

Please refer to the University of Illinois Urbana-Champaign’s COVID-19 website for further information on face coverings. Thank you for respecting all of our well-being so we can learn and interact together productively.

Course Description

Advanced Data Analysis (STAT 448) is a mature ad hoc course where data analysis is split into two specialties: interpretation and investigation. Mature ad hoc here means that students are expected to work independently, and collaboratively when required. Students will do a large amount of reading, a medium amount of coding and discussing ideas, and a small amount of formal presenting. The Instructor will be a facilitator of discussion. This course is not a traditional lecture format: most learning will come from students’ own independent reading, collaborative discussions, and coding.

Interpretation means that students will learn how to correctly interpret statistical results and visualizations, such as plots, models, and hypothesis tests. Investigation means that students will learn how to complete the Investigative Cycle, which includes asking questions, making plans, finding and cleaning data, producing results, and making conclusions which yield more questions. Datasets will be introduced and students will conduct multiple investigative cycles for all datasets throughout the semester. This means that students must have a laptop that they bring to class each day.

The expectation is that students will gain competency in critical and creative thinking, data exploration, accurate and informed analysis, and presenting analyses in various formats. Students will become proficient data analysts who understand how to explain statistical results as well as how to craft narratives with data. Subjectivity will be applied to assess students’ work. Feedback will be given to students and is expected to be incorporated into new versions of their work.

Students should be proficient and comfortable in R prior to beginning this course. Students should be familiar with data management and wrangling techniques such as filtering, arranging, mutating, combining, and summarizing data, as well as manipulating strings.

RStudio offers reproducible documentation with Markdown syntax, which will support long-term learning opportunities. Git supports students’ capacity for collaboration as well as version control of documentation and files. Any existing presentation software will be useful when students need to give presentations.


Learning Objectives

These learning objectives are important because they connect the physical know-how with the technical knowledge of the course.

  • Students will assess effectiveness, organization, and intent from a published data set.

  • Students will explore data sets of various types.

  • Students must design well-organized, clean data sets for the purpose of data analysis.

  • Students will present data management work in a reproducible document file using Markdown syntax and code chunks or cells. No local data files will be utilized.

  • Students must demonstrate critical thinking and creativity through asking questions about a given data set.

  • Students must be able to explain and summarize data wrangling code.

  • Students will share and discuss data management ideas, coding snippets, and other thoughts to aid in meaningful dialogue.

  • Students must recall important data management concepts.

  • Students will reflect on their own learning of data management principles.

  • Students will build data wrangling tools, apps, and dashboards and store all work using Git, a version control system.

  • Students will collaborate on lab assignments.

  • Students will reproduce and replicate data visualizations.

  • Students will infer from data based on the questions and investigation they planned.

  • Students must be able to explain and summarize coding, reproducible documentation, and textbooks/notes/readings.

  • Students will reproduce and replicate visualizations and coding.

  • Student groups will design their own investigations about real-world data. It will require posing problems, planning, working with data, analyzing, and concluding.

  • Students must recall important coding concepts and workflows.

  • Students will assess published visualizations for statistical correctness and visual clarity.


Course Staff

  • Instructor - Christopher Kinson (kinson2@illinois.edu)

  • Teaching Assistant (TA) for Section 1 - Zhe Chen (zhec6@illinois.edu)


Course Specifics

Course Website

The course website is https://github.com/illinois-stat448. This course is operating as an organization named illinois-stat448 within GitHub. Students should bookmark or save the link above in their browser for future use, because it provides access to all repositories and course materials, including notes, assignments, projects, and lecture videos.

Prerequisites

The prerequisites for this course are the following:

  • A laptop (not a netbook) with the most up-to-date versions of R and RStudio installed. If using a netbook or Chromebook, please set up an RStudio Cloud account.

  • STAT 400 or STAT 409

  • Operating knowledge of computers such as locating a file, creating a directory, saving a file, compressing a file, extracting a compressed file, keyboarding, and fundamental troubleshooting

  • Operating knowledge of R such as understanding various objects, mathematical and logical operators, and value types and their coercion, as well as creating user-defined functions and fundamental R troubleshooting

  • Familiarity with hypothesis testing, modeling, machine learning, visualizing data, and data management/wrangling at an introductory level

Meeting Schedule and Expectations

  • For sections 1UG/1GR, there are in-person meetings from 11:00 am to 11:50 am in Room 2233 of the Everitt Laboratory building on Mondays, Wednesdays, and Fridays.

  • All students are expected to have familiarity with hypothesis testing, modeling, machine learning, visualizing data, and data management/wrangling at an introductory level.

  • All students are expected to fully participate in class regularly.

  • All students are expected to read and review textbooks, notes, articles, videos, and other content posted on the course website as needed. Course content will be found on the Course Website via the course_content repository. Check the course_content repo often for updates and announcements, which are posted as Issues in the Issues tab of the repo. Students accessing the course_content repository remotely via Git are encouraged to clone it once and then pull from it daily.

Office Hours

All times listed in this document are in current US Central Time. Take care to adjust clocks when daylight saving time begins.

Office hours will be in-person. If a student has a specific question, but cannot attend the office hours, then that student should post their question in the Issues board. If a student wants one-on-one assistance from the course staff at an alternative time, then that student should email the course staff in order to schedule a Zoom meeting.

  • Instructor in-person office hours:

    • Wednesdays 1:00 pm - 2:00 pm in Room 166 Computer Applications Building (CAB)

Textbooks

There is no required textbook, but students may find the texts below to be helpful, especially as references. These are all free and accessible to students for further reading. The Instructor may refer to certain sections of these texts in the course content. The asterisk * means these are accessible from the University Library.

Presenting

  • *Presentation Skills: Educate, Inspire and Engage Your Audience. Weiss. http://www.library.illinois.edu.proxy2.library.illinois.edu/proxy/go.php?url=https://search-ebscohost-com.proxy2.library.illinois.edu/login.aspx?direct=true&db=nlebk&AN=1048764&site=eds-live&scope=site.

Creativity and Critical Thinking

  • *How to Be Creative: A Practical Guide for the Mathematical Sciences. Higham and Sherwood. https://epubs-siam-org.proxy2.library.illinois.edu/doi/book/10.1137/1.9781611977035
  • *An Introduction to Critical Thinking and Creativity: Think More, Think Better. Lau. https://onlinelibrary-wiley-com.proxy2.library.illinois.edu/doi/book/10.1002/9781118033449

Data Wrangling

  • *Data Wrangling with R. Boehmke. Springer Cham. http://www.library.illinois.edu/proxy/go.php?url=http://dx.doi.org/10.1007/978-3-319-45599-0
  • *R for Data Science. Wickham and Grolemund. O’Reilly Media, Inc. https://r4ds.had.co.nz/
  • *Text Mining in Practice with R. Kwartler. https://onlinelibrary-wiley-com.proxy2.library.illinois.edu/doi/book/10.1002/9781119282105

Visualizing Data

  • *Storytelling with Data. Knaflic. https://www.oreilly.com/library/view/storytelling-with-data/9781119002253/?ar
  • *R for Data Science. Wickham and Grolemund. O’Reilly Media, Inc. https://r4ds.had.co.nz/

Analyzing Data

  • The Investigative Cycle. Wild and Pfannkuch. https://www.stat.auckland.ac.nz/~wild/StatThink/images/99.Investigative.png
  • The Data Analysis Cycle. Forster and Wild. https://www.stat.auckland.ac.nz/~wild/StatThink/images/10.DataAnal2.png
  • *Telling Data Stories: Essential Dialogues for Comparative Reasoning. Pfannkuch et al. https://www.tandfonline.com/doi/abs/10.1080/10691898.2010.11889479
  • *Exploratory Data Analysis with R. Pearson. https://www-taylorfrancis-com.proxy2.library.illinois.edu/books/9781315382111
  • R for Statistical Learning. Dalpiaz. https://daviddalpiaz.github.io/r4sl/
  • *An Introduction to Categorical Data Analysis. Agresti. https://onlinelibrary-wiley-com.proxy2.library.illinois.edu/doi/book/10.1002/0470114754
  • *R for Data Science. Wickham and Grolemund. O’Reilly Media, Inc. https://r4ds.had.co.nz/
  • Applied Statistics with R. Dalpiaz. https://book.stat420.org/

Markdown and RMarkdown

  • The Markdown Guide. Cone. https://www.markdownguide.org/getting-started
  • RMarkdown Cheat Sheet. RStudio. https://rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf

Git and GitHub

  • Happy Git and GitHub for the useR. Bryan et al. https://happygitwithr.com/

Software

The course requires students to already have a fundamental and operational understanding of R. Students with no familiarity with R should be aware that this course will not cover fundamental and operational usage of R.

Calendar

Below is a calendar of topics and tentative assignment deadlines.

Week | 2023 Dates | Topics Covered
01 | 01/16 - 01/20 | Introduce the course and software, Data repositories, Interpretations, Investigations
02 | 01/23 - 01/27 | Presenting, Data preparation/wrangling, Creative thinking habits
03 | 01/30 - 02/03 | Analysis Presentations Series 01
04 | 02/06 - 02/10 | Analysis Presentations Series 01
05 | 02/13 - 02/17 | Analysis Presentations Series 01
06 | 02/20 - 02/24 | Analysis Presentations Series 01
07 | 02/27 - 03/03 | Analysis Presentations Series 01
08 | 03/06 - 03/10 | Analysis Presentations Series 01
09 | 03/13 - 03/17 | Spring Break
10 | 03/20 - 03/24 | Analysis Presentations Series 02
11 | 03/27 - 03/31 | Analysis Presentations Series 02
12 | 04/03 - 04/07 | Analysis Presentations Series 02
13 | 04/10 - 04/14 | Analysis Presentations Series 02
14 | 04/17 - 04/21 | Analysis Presentations Series 02
15 | 04/24 - 04/28 | Analysis Presentations Series 02
16 | 05/01 - 05/05 | Reading Day - May 04 (no class and no office hours), Discussion about careers, graduate school, and the future post-STAT448, Final Reflection Essay
17 | 05/08 - 05/12 | Final Reflection Essay for section 01 due at 11:59 pm Thursday May 11

Grading Breakdown

15 Written Reflections: 15 points total (1 point each)

  • RR01-16 are 1-page weekly reflective essays written in Markdown and rendered to .pdf, due on Mondays. No images are allowed in the file. No one receives credit for RR01, so 15 reflections count toward the grade. These are completion assignments.

2 Analysis Presentations: 40 points total (20 points each)

  • AP01 series begins Week 03 (February 01) and ends Week 08 (March 10). Each student will present their data analysis (roughly 8 minutes) in-person in-class only once in this series. See rubric on Course Website.
  • AP02 series begins Week 10 (March 22) and ends Week 15 (April 28). Each student will present their data analysis (roughly 8 minutes) in-person in-class only once in this series. See rubric on Course Website.

1 Final Reflection Essay: 20 points total

  • Maximum 3-page essay written in Markdown and rendered to .pdf. Line and text spacing should be standard for Markdown. No images are allowed in the file. See rubric on Course Website.

Course Total Points: 75 points

Potential for Bonus Points: For each week (Weeks 03-08 and Weeks 10-15), 10 students will be chosen at random to provide written feedback on the quality of the presentations. This feedback will be posted in the Issues tab of each data repository. Each student has two opportunities to write the feedback, so a student can earn at most 2 bonus points. Read more about data repositories in the Instructional Activities section below.
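As a rough illustration of how the point totals above combine, here is a minimal sketch, written in Python purely for illustration. The function name course_total and its arguments are hypothetical and not part of any course material. The component maximums (15, 40, and 20 points) and the 2-point bonus cap come from the grading breakdown above. Clamping each component to its maximum is an assumption.

```python
# Hypothetical helper, not an official course tool.
# Caps below are taken from the grading breakdown in this syllabus.
REFLECTIONS_CAP = 15.0    # 15 written reflections, 1 point each
PRESENTATIONS_CAP = 40.0  # 2 analysis presentations, 20 points each
ESSAY_CAP = 20.0          # 1 final reflection essay
BONUS_CAP = 2.0           # bonus points top out at 2


def course_total(reflections, presentations, essay, bonus=0.0):
    """Sum the graded components; each one is clamped to [0, its cap] (an assumption)."""
    parts = [
        (reflections, REFLECTIONS_CAP),
        (presentations, PRESENTATIONS_CAP),
        (essay, ESSAY_CAP),
        (bonus, BONUS_CAP),
    ]
    return sum(max(0.0, min(float(earned), cap)) for earned, cap in parts)
```

For example, course_total(14, 36.5, 18) gives 68.5, and a perfect score with both bonus points gives 77.0.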

Final Letter Grades

When computing final grades, students can add up their scores on the assignments. The resulting sum will determine which letter grade they earn when the course is completed. There is only one + letter grade in this course. All other letter grades are without +/-. Points are not rounded.

Lower bound | Upper bound | Letter grade
75.000 points | 77.000 points | A+
67.500 points | 74.999 points | A
60.000 points | 67.499 points | B
52.500 points | 59.999 points | C
45.000 points | 52.499 points | D
0.000 points | 44.999 points | F
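
The lookup in the table above can be sketched as a small function, shown here in Python purely for illustration. The name letter_grade is hypothetical. The lower bounds are copied from the table. Because points are not rounded, the sketch assumes a grade is decided by the highest lower bound that the total meets or exceeds.

```python
# Hypothetical helper, not an official course tool.
# (lower bound, letter) pairs copied from the table, highest first.
GRADE_FLOORS = [
    (75.0, "A+"),
    (67.5, "A"),
    (60.0, "B"),
    (52.5, "C"),
    (45.0, "D"),
    (0.0, "F"),
]


def letter_grade(points):
    """Return the letter for an unrounded point total (assumed lower-bound rule)."""
    for floor, letter in GRADE_FLOORS:
        if points >= floor:
            return letter
    raise ValueError("points must be non-negative")
```

Under this assumption, letter_grade(74.999) returns "A" and letter_grade(75) returns "A+".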

Instructional Activities

Students should read and review the course content including reference textbooks, articles, notes, and videos. If or when students get stuck, they should ask questions i) in person during class, ii) on the Issues board, or iii) during office hours. The following activities and tools will be useful for students.

Discussion

Students will spend lots of time talking about the data, their analyses, and ideas during class. The Instructor may facilitate specific discussion topics, but the goal is to have space to ask questions, hear thoughts, and share to keep the relevant discussion going.

Issues Tab

This discussion board, which exists as a tab on the course_content repository, is one of the best ways to communicate with classmates and course staff. Questions can be seen quickly and receive a rapid response. Students are encouraged to use this board, but there is no requirement to participate in the discussion board.

Do use the board to openly discuss ideas about the course such as questions about content, deadlines, notes, data, etc. If a student specifically wants the course staff to respond, then that student should use the mention @staff when posting in the board. Topics discussed here should be non-personal and non-private. If a student has a personal or private matter to discuss with the Instructor, please send an email to kinson2@illinois.edu. Additionally, the conversation in the discussion board should be respectful of people’s differences and cannot be used to speak negatively about anyone or harm anyone.

The course staff will view and respond to content on Tuesdays and Thursdays (at a minimum). The course staff can be expected to spend at least 30 minutes on Tuesdays and Thursdays monitoring the Issues.

Data Repositories

The data in this course will be hyper-local, focusing on Champaign, Urbana, University of Illinois, Champaign County, and the state of Illinois. Each dataset will have its own repository in GitHub such that all students have access to them. Students will conduct analyses on the data and submit their analysis, including data preparation and other coding files (in analysis directory) and presentation files (in presentations directory).

Written Feedback

Because subjectivity is the foundation of grading, written feedback will be given along with scores based on rubrics. If you do not like the written feedback, please consider what is written as constructive to improve your growth. Practice patience with yourself as you attempt to not take things personally.

Assignments

Written Reflections

These are 1-page weekly reflective essays written in Markdown and rendered to .pdf due on Mondays. No images are allowed in the file. No one receives credit for RR01. The reflection is about the student’s own growth and development in this course with specific details and structure exhibiting a command of the English language and grammar. These assignments are to be submitted in the student’s repository, which exists within the illinois-stat448 organization in GitHub. This repo is named as sp23_stat448_netid. See Course Website for web links.

Analysis Presentations

These are formal, in-person presentations of your data analysis during class in Weeks 03-08 and Weeks 10-15. Students have roughly 8 minutes to present (as individuals, not in groups) with an additional maximum of 2 minutes for questions. Due to this time constraint, the Instructor will always ask the first question post-presentation. Any presentation software such as Google Slides, Microsoft PowerPoint, Node.js, Beamer, Quarto, etc. is satisfactory. Any technical troubles with presentation software are the student’s responsibility. The course is located in an iFLEX classroom, which allows for ease of communication, collaboration, and displaying content on large screens. All students must bring their laptops to class each day. Each student must complete two analysis presentations by the respective presentation dates. Student analysis files and presentation files must be submitted by the student in the specific dataset’s repository in GitHub (in analysis directory and presentation directory, respectively). There will be no make-ups for any missed presentations. This policy also applies to any students who add the course late to their registration.

Final Reflective Essay

The final reflective essay is a maximum 3-page essay written in Markdown, rendered to .pdf, and submitted to the student’s repository in GitHub. This repo is named as sp23_stat448_netid. Line and text spacing should be standard for Markdown. No images are allowed in the file. The reflection is about the student’s own growth and development in this course with specific details and structure exhibiting a command of the English language and grammar. This essay is due at 11:59 pm Thursday May 11, 2023. Late submissions of the Final Reflective Essay will not be accepted or graded. There will be no make-ups for a missed Final Reflective Essay.


University Specifics

Disability Accommodations

To obtain disability-related academic adjustments and/or auxiliary aids, students with disabilities must contact the course instructor and the Disability Resources and Educational Services (DRES) as soon as possible. To contact DRES, students may visit 1207 S. Oak St., Champaign, call 333-4603, e-mail disability@illinois.edu, or go to the DRES website at http://disability.illinois.edu/

Academic Integrity

It is expected that all students abide by the campus regulations on academic integrity http://studentcode.illinois.edu/article1_part4_1-401.html. Intentional violations of academic integrity are described at http://studentcode.illinois.edu/article1_part4_1-402.html and include, but are not limited to, copying any part of another student’s assignment and allowing another student to copy any part of the student’s own assignment.

Safety Protocol

We have been asked by Public Safety to share the following information in case of weather or security emergencies: https://police.illinois.edu/emergency-preparedness/run-hide-fight/

Sexual Misconduct Policy and Reporting

The University of Illinois is committed to combating sexual misconduct. Faculty and staff members are required to report any instances of sexual misconduct to the University’s Title IX and Disability Office. In turn, an individual with the Title IX and Disability Office will provide information about rights and options, including accommodations, support services, the campus disciplinary process, and law enforcement options.

A list of the designated University employees who, as counselors, confidential advisors, and medical professionals, do not have this reporting responsibility and can maintain confidentiality, can be found at https://wecare.illinois.edu/resources/students/#confidential. Other information about resources and reporting is available at https://wecare.illinois.edu.


The Last Word

The Instructor reserves the right to make any changes considered to be academically advisable. Any changes will be announced in class and on the Course Website. It is the student’s responsibility to attend the class and keep track of the changes.