STAT 440 Spring 2023 Syllabus

STAT 440 Statistical Data Management

3/4 Credit Hours - Major Elective

Sections 1UG/1GR

Spring 2023 - Syllabus

COVID-19 General Information

Following University policy, all students are required to engage in appropriate behavior to protect the health and safety of the community. Students are also required to follow the campus COVID-19 protocols.

If you feel ill or are unable to come to class or complete class assignments due to issues related to COVID-19, including but not limited to testing positive yourself, feeling ill, caring for a family member with COVID-19, or having unexpected child-care obligations, you should contact your instructor immediately, and you are encouraged to copy your academic advisor.

Face Coverings

Face coverings are strongly recommended in classrooms during in-person class time, but not required.

University guidance is that face coverings can be recommended but not required, based on current local COVID-19 levels and CDC guidance. This is a very fluid situation, and, as has been our practice, we will adjust as conditions change. Wearing a face covering remains a personal choice. Please note that students and employees cannot be compelled to wear a face covering.

Please refer to the University of Illinois Urbana-Champaign’s COVID-19 website for further information on face coverings. Thank you for respecting all of our well-being so we can learn and interact together productively.

Course Description

Statistical Data Management (STAT 440) is a focused data wrangling course that covers various types of data storage, manipulation, cleaning, and extraction, and applies these methodologies in R. Because of this, students must bring a laptop to class each day. This course is not taught in a traditional lecture format. The expectation is that students will gain competency in exploring, organizing, designing, creating, storing, cleaning, wrangling, sharing, and using data, all of which are commonly done prior to data analysis. Critical and creative thinking and efficient coding will be encouraged. Concepts covered in this course build upon each other, so students can expect all assessments to be cumulative. Students should be proficient and comfortable in R before beginning this course. RStudio offers reproducible documentation with Markdown syntax, which supports long-term learning. Git strengthens students’ capacity for collaboration as well as version control of documentation and files.


Learning Objectives

These learning objectives are important because they connect the physical know-how with the technical knowledge of the course.

  • Students will assess the effectiveness, organization, and intent of a published data set

  • Students will explore data sets of various types

  • Students must design well-organized, clean data sets for the purpose of data analysis

  • Students will present data management work in a reproducible document file using Markdown syntax and code chunks or cells. No local data files will be utilized.

  • Students must demonstrate critical thinking and creativity by asking questions about a given data set

  • Students must be able to explain and summarize data wrangling code

  • Students will share and discuss data management ideas, coding snippets, and other thoughts to aid in meaningful dialogue

  • Students must recall important data management concepts

  • Students will reflect on their own learning of data management principles

  • Students will build data wrangling tools, apps, and dashboards and store all work using Git, a version control system.

  • Students will collaborate on lab assignments.

  • Students will reproduce and replicate data visualizations.


Course Staff

  • Instructor - Christopher Kinson (kinson2@illinois.edu)

  • Teaching Assistant (TA) for Section 1 - Jim (Yici) Yan (yiciyan2@illinois.edu)

  • Course Assistant (CA) for Section 1 - Wenqi Zeng (wenqiz4@illinois.edu)

Course Specifics

Course Website

The course website is https://github.com/illinois-stat440. This course operates as an organization named illinois-stat440 within GitHub. Students should bookmark this link in their browser for future use, because it provides access to all course repositories and materials, including notes, assignments, projects, and lecture videos.

Prerequisites

The prerequisites for this course are the following:

  • A laptop (not a netbook) with the most up-to-date versions of R and RStudio installed. If using a netbook or Chromebook, please set up an RStudio Cloud account.

  • STAT 400 or STAT 409

  • Operating knowledge of computers such as locating a file, creating a directory, saving a file, compressing a file, extracting a compressed file, keyboarding, and fundamental troubleshooting

  • Operating knowledge of R such as understanding various objects, mathematical and logical operators, and value types and their coercion, as well as creating user-defined functions and fundamental R troubleshooting

Meeting Schedule and Expectations

  • For Sections 1UG/1GR, in-person meetings are held 4:00 pm - 4:50 pm on Mondays, Wednesdays, and Fridays in Room 32 of the Psychology Building.

  • There will be asynchronous lecture videos with links posted on the Course Website (in the videos directory of the course_content repository). See the Instructional Activities section below for more details.

  • All students are expected to fully participate in class regularly.

  • All students are expected to do the following before coming to class each week: read the course notes, watch the lecture videos, and complete piloted practice assignments.

  • Course content (syllabus, notes, videos, exams, projects, weekly schedules, and the discussion board under the Issues tab) will be found on the Course Website in the course_content repository. Check the course_content repo often for updates and announcements. Students who access it remotely via git are encouraged to clone the course_content repository once and pull updates daily.

Office Hours

All times listed in this document are in current US Central Time. Take care to adjust clocks when daylight saving time begins.

Office hours will be in-person. If a student has a specific question, but cannot attend the office hours, then that student should post their question in the Issues board. If a student wants one-on-one assistance from the course staff at an alternative time, then that student should email the course staff in order to schedule a Zoom meeting.

  • Instructor in-person office hours:

    • Wednesdays 2:00 pm - 3:00 pm in Room 166 Computer Applications Building (CAB)

Textbooks

There is no required textbook, but students may find the texts below to be helpful. These are all free and accessible to students for further reading. The Instructor may refer to certain sections of these texts in the course content. The asterisk * means these are accessible from the University Library as E-books.

R

  • *Data Wrangling with R. Boehmke. Springer Cham. http://www.library.illinois.edu/proxy/go.php?url=http://dx.doi.org/10.1007/978-3-319-45599-0
  • *R for Data Science. Wickham and Grolemund. O’Reilly Media, Inc. https://learning.oreilly.com/library/view/-/9781491910382/?ar
  • Mastering Shiny. Wickham. https://mastering-shiny.org/

Markdown

  • The Markdown Guide. Cone. https://www.markdownguide.org/getting-started

RMarkdown

  • RMarkdown Cheat Sheet. RStudio. https://rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf

Git and GitHub

  • Happy Git and GitHub for the useR. Bryan et al. https://happygitwithr.com/

Software

The course requires students to already have a fundamental and operational understanding of R. Students with no familiarity with R should be aware that this course will not cover fundamental and operational usage of R.

Calendar

Below is a calendar of topics and tentative assignment deadlines.

Week | Dates (2023) | Topics Covered
01 | 01/16 - 01/20 | Introduce the course and software: Markdown, R/RStudio/RMarkdown, and Git/GitHub; Git/GitHub tips
02 | 01/23 - 01/27 | Loops and conditional execution; Apply family of functions; Vectorization
03 | 01/30 - 02/03 | What is data; Structures of data, delimiters, and file extensions; Accessing and importing data; Exporting data; Handling dates and times
04 | 02/06 - 02/10 | Assigning objects; Accessing and importing data via web scraping; Accessing and importing data via an API
05 | 02/13 - 02/17 | Arranging data; Reshaping data; Data expansion; Data reduction
06 | 02/20 - 02/24 | Regular expressions and string manipulation; Summarizing data; Combining data
07 | 02/27 - 03/03 | Midterm Exam 01 on Wednesday March 01; Focus on Midterm Exam
08 | 03/06 - 03/10 | Validating data; Cleaning data
09 | 03/13 - 03/17 | Spring Break
10 | 03/20 - 03/24 | SQL queries and sub-queries
11 | 03/27 - 03/31 | Data visualization using base R
12 | 04/03 - 04/07 | Data visualization using tidyverse
13 | 04/10 - 04/14 | Midterm Exam 02 on Wednesday April 12; Focus on Midterm Exam
14 | 04/17 - 04/21 | Shiny apps and dashboards
15 | 04/24 - 04/28 | Final Project Pre-Feedback Submission due at 11:59 pm Friday April 28; Final Project Help and Questions and Answers
16 | 05/01 - 05/05 | Reading Day - May 04 (no class and no office hours); Data workers and responsibilities; Discussion about careers, graduate school, and the future post-STAT440; Focus on Final Project and Final Exam
17 | 05/08 - 05/12 | Final Exam for Section 01 on Wednesday May 10, 1:30 pm - 4:30 pm; Final Project Post-Feedback Submission due at 11:59 pm Friday May 12

Grading Breakdown

15 Piloted Practices: 10 points total (1 point each)

  • PP01-PP16 are weekly comprehensive assignments due on Mondays, but only 10 of 15 are required to be completed. No student can earn more than 10 points for these assignments. Tokens are inadmissible on this assignment. No one receives credit for PP01.

2 Lab Assignments: 8 points total (4 points each)

  • Lab01 series begins Week 03 (February 01) and ends Week 06 (February 24). Each student will be an in-person in-class driver only once in this series. Tokens are admissible on this assignment, but must be used within one week of original deadline.
  • Lab02 series begins Week 08 (March 08) and ends Week 12 (April 07). Each student will be an in-person in-class driver only once in this series. Tokens are admissible on this assignment, but must be used within one week of original deadline.

2 Midterm Exams: 20 points total (10 points each)

  • Midterm Exam 01 is a one-day in-person in-class exam (March 01). It contains conceptual and applied problems. Tokens are admissible on this assignment, but must be used within one week of original deadline.
  • Midterm Exam 02 is a one-day in-person in-class exam (April 12). It contains conceptual and applied problems. Tokens are admissible on this assignment, but must be used within one week of original deadline.

1 Final Project Pre-Feedback Submission: 13 points total

  • Deadline for .R file pre-feedback submission into project repo is 11:59 pm Friday April 28. Grading rubric provided in course_content repo. Tokens are inadmissible on this assignment.

1 Final Project Post-Feedback Submission: 4 points total

  • Deadline for .R file post-feedback submission into project repo is 11:59 pm Friday May 12. Grading rubric provided in course_content repo. Tokens are inadmissible on this assignment.

1 Final Exam: 20 points total

  • At-home exam with a three-hour period which is in alignment with University Final Exam Schedule (see https://registrar.illinois.edu/courses-grades/final-exam-schedule-public/ for more information). Adjust your schedules accordingly. Contains conceptual and applied problems. Tokens are inadmissible on this assignment.

Course Total Points: 75 points

Potential for Bonus Points: Of the four tokens given to each student, if a student physically returns their unused token(s) at the end of the semester, then 1 point per token is added to overall grade. Read more about tokens in the Instructional Activities section below.

Final Letter Grades

When computing final grades, students can add up their scores on the assignments. The resulting sum will determine which letter grade they earn when the course is completed. There is only one + letter grade (A+) in this course. All other letter grades are without + or -. Points are not rounded.

Lower bound | Upper bound | Letter grade
75.000 points | 79.000 points | A+
67.500 points | 74.999 points | A
60.000 points | 67.499 points | B
52.500 points | 59.999 points | C
45.000 points | 52.499 points | D
0.000 points | 44.999 points | F
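The arithmetic behind the Grading Breakdown and the letter-grade table can be sketched as a short calculation. This is a hypothetical illustration in Python, not an official course tool (course work itself is done in R); the category names and the example scores are invented. It assumes the six categories sum to 75 points, that each returned unused token adds 1 bonus point (at most 4), and that the unrounded total is compared against each letter grade's exact lower bound.

```python
# Hypothetical sketch: compute a STAT 440 course total and letter grade
# from the Grading Breakdown and the letter-grade table. Category names
# are invented for illustration; they are not official course identifiers.

# Maximum points per category, as listed in the Grading Breakdown (sums to 75).
CATEGORY_MAX = {
    "piloted_practices": 10,
    "labs": 8,
    "midterms": 20,
    "final_project_pre": 13,
    "final_project_post": 4,
    "final_exam": 20,
}

# Lower bound of each letter grade, checked from highest to lowest.
CUTOFFS = [
    (75.0, "A+"),
    (67.5, "A"),
    (60.0, "B"),
    (52.5, "C"),
    (45.0, "D"),
    (0.0, "F"),
]


def total_points(scores, unused_tokens=0):
    """Sum category scores and add 1 bonus point per returned unused token."""
    for name, value in scores.items():
        if not 0 <= value <= CATEGORY_MAX[name]:
            raise ValueError(f"{name} score {value} is outside 0..{CATEGORY_MAX[name]}")
    # Each student receives exactly four tokens, so at most 4 bonus points.
    if not 0 <= unused_tokens <= 4:
        raise ValueError("unused_tokens must be between 0 and 4")
    return sum(scores.values()) + unused_tokens


def letter_grade(points):
    """Return the first letter grade whose lower bound the unrounded total meets."""
    for lower, letter in CUTOFFS:
        if points >= lower:
            return letter
    raise ValueError("points must be non-negative")


# Hypothetical scores for one student.
example = {
    "piloted_practices": 10,
    "labs": 7,
    "midterms": 17.5,
    "final_project_pre": 12,
    "final_project_post": 4,
    "final_exam": 16,
}
print(total_points(example), letter_grade(total_points(example)))  # 66.5 B
print(letter_grade(total_points(example, unused_tokens=1)))        # A
```

Because points are not rounded, a total of 67.499 still maps to a B; comparing against the exact lower bound reproduces that rule. In this invented example, returning one unused token moves the student from 66.5 (B) to 67.5 (A).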

Instructional Activities

Students should read the course notes, watch the lecture videos, and attempt the assignments. If or when students get stuck, they should ask questions (i) on the Issues board, (ii) during office hours, or (iii) via email, in that order of preference. In addition to lecture videos and office hours, the following activities and tools will be useful for students.

Course Notes

The course notes perform the duty of a textbook for this course. Yes, there is a lot of information in the notes, but it is useful to read it for the important parts and return to it for details after attempting the assignments.

Issues Tab

This discussion board, which exists as a tab on the course_content repository, is one of the best ways to communicate with classmates and course staff. Questions can be seen quickly and receive a rapid response. Students are encouraged to use this board, but there is no requirement to participate in the discussion board.

Do use the board to openly discuss ideas about the course, such as questions about content, deadlines, notes, data, etc. If a student specifically wants the course staff to respond, then the student should mention @staff when posting on the board. The topics discussed here should be non-personal and non-private. If a student has a personal or private matter to discuss with the Instructor, please send an email to kinson2@illinois.edu. Additionally, conversations on the discussion board should be respectful of people’s differences and cannot be used to speak negatively about anyone or harm anyone.

The course staff will view and respond to content on Tuesdays and Thursdays (at a minimum). The course staff can be expected to spend at least 30 minutes on Tuesdays and Thursdays monitoring the Issues.

Tokens

Tokens will be used this semester as limited second-chance opportunities to complete assignments without severe penalties. These tokens are intended to help students who could benefit from more opportunities to improve their learning. Each student will be given exactly four physical tokens. Each token may be used to reassess for a grade replacement: the student takes a new version of the assignment, and the new grade replaces the original grade. Tokens are admissible on labs and midterm exams, but inadmissible on piloted practices, final project submissions, and the final exam. Only one token may be used per assignment. For example, a student may use one token to reassess on Lab01, but may not use another token to reassess on Lab01 again. To use a token, a student must inform the Instructor and return the token during the Instructor’s office hours within 7 days (i.e., 1 week) of receiving the grade for the original assignment. At that time, a reassessment will be provided to the student. If a student returns their unused token(s) at the end of the semester, then 1 point per token is added to their overall grade as bonus points. If a student loses any of their tokens, no replacement tokens will be given, and that student forfeits the use of the lost token(s). In other words, without a physical token, students cannot benefit from reassessments or bonus points. The Instructor will distribute tokens to students in Week 03. Students without tokens have until the end of Week 04 to ask the Instructor for their tokens. After Week 04, no new tokens will be distributed.

Grade Disputes

Regarding grade requests and disputes about problems missed on various assignments, please email the Instructor with your requests and disputes within 7 days (i.e. 1 week) of your grade being returned. The dispute will be forwarded to the course staff (TAs/CAs) who will make the final decision about the dispute.

Assignments

Piloted Practices

These are guided comprehension assignments to be completed individually by each student no later than one hour before class begins on Mondays. There are 15 piloted practices this semester, but only 10 are required. The short name for these assignments is PP. They are numbered by their corresponding week; for example, PP04 is the Week 04 piloted practice and is due on Monday of Week 04.

These assignments are to be submitted in the student’s repository, which exists within the illinois-stat440 organization in GitHub. See Course Website for web links. These PPs are graded for completion, not correctness. Tokens are inadmissible on these assignments.

Lab Assignments

These are lab sessions containing 4 problems each, which the “driver” submits by the end of the class period on Mondays and Wednesdays. The course meets in an iFLEX classroom, which makes it easy to communicate, collaborate, and display content on large screens at stations. Each week (beginning in Week 03), randomly chosen students (“drivers”) will complete the lab assignment seated at specified stations, while the remaining students (“passengers”) at each station help the drivers by offering ideas and advice on how to complete the problems. Drivers must submit their labs. Passengers are not allowed to type on the driver’s laptop. All students must bring their laptops to class each day. Lab assignments are intended to push students to apply concepts covered in the course notes and lecture videos and to encourage students to work together as a team in a limited amount of time. Each student must complete two lab assignments as the driver by the respective lab due dates. A schedule will be given to students showing exactly when each student is the driver and at which station the driver should sit. Do not deviate from this schedule. Late submissions of lab assignments will not be accepted. There will be no free make-ups for any missed labs; tokens may be used to reassess when a student misses a lab. This policy also applies to students who add the course late. Tokens are admissible on these assignments. Review the Tokens description in the section above.

Midterm Exam

The midterms are in-class, in-person exams with 10 problems each. The exams are closed notes, closed book. Each midterm file must be submitted on the Course Website via the student’s repo. The midterm exams will be graded for correctness and completeness. Students can expect the midterm exams to have a structure similar to the lab assignments, with a possible mixture of completion, correct/incorrect, and open-ended questions. Late Midterm Exam submissions will not be accepted or graded. There will be no make-ups for any missed Midterm Exam. Tokens are admissible on these assignments. Review the Tokens description in the section above.

Final Exam

The final exam is a single at-home exam with 20 problems. The exam will be open notes, open book and will take up to 3 hours to complete. The final exam will be graded for correctness and completeness. Students can expect the final exam to have a structure similar to the labs and midterm exams, with a possible mixture of completion, correct/incorrect, and open-ended questions. Students should not expect solutions to be provided for the final exam. Late Final Exam submissions will not be accepted or graded. There will be no make-ups for a missed Final Exam. Tokens are inadmissible on this assignment.

Final Project

The Final Project in this course is the creation of a single Shiny app or dashboard in R. This Shiny app or dashboard is an opportunity for students to demonstrate the statistical data management concepts covered this semester and to apply them along with version control for collaboration. The final project will have two submissions. The Pre-Feedback submission is the group’s well-thought-out and successfully running Shiny app or dashboard; the deadline to submit this .R file is April 28. Students will review the feedback from the course staff and incorporate it into the second submission of the Shiny app or dashboard, called the Post-Feedback submission; the deadline to submit this .R file is May 12. Students should read the accompanying grading rubrics for both submissions. Students may work together on the final project, but only in groups of two. For groups, a new repo specific to the group will be created in GitHub to permit collaboration. Ideas and code must be your own. Late contributions will not be accepted. There will be no make-ups for a missed Final Project. Tokens are inadmissible on these assignments. If the group dynamic is not working for you, notify the Instructor immediately and the group will be split. The two students will then work as individuals, each with a new topic for their Shiny app or dashboard.

For more information about Shiny apps and how to create one, check out the following videos and docs:

  • How to start a Shiny app https://vimeo.com/rstudioinc/review/131218530/212d8a5a7a/#t=0m0s

  • Getting started with Shiny dashboards https://rstudio.github.io/shinydashboard/get_started.html

  • Effective Reactive Programming Part 1 https://rstudio.com/resources/shiny-dev-con/reactivity-pt-1-joe-cheng/

  • Effective Reactive Programming Part 2 https://rstudio.com/resources/shiny-dev-con/reactivity-pt-2/

  • Mastering Shiny (textbook). Wickham. https://mastering-shiny.org/

  • Interactive Graphics with Shiny https://resources.rstudio.com/webinars/interactive-graphics-winston

  • Understanding Shiny Modules https://resources.rstudio.com/shiny-developer-conference/shinydevcon-modules-garrettgrolemund-1080p

  • Debugging Techniques https://resources.rstudio.com/shiny-developer-conference/shinydevcon-debugging-jonathanmcpherson-1080p

  • Welcome to Shiny https://shiny.rstudio.com/tutorial/written-tutorial/lesson1/


University Specifics

Disability Accommodations

To obtain disability-related academic adjustments and/or auxiliary aids, students with disabilities must contact the course instructor and Disability Resources and Educational Services (DRES) as soon as possible. To contact DRES, students may visit 1207 S. Oak St., Champaign, call 333-4603, e-mail disability@illinois.edu, or go to the DRES website at http://disability.illinois.edu/

Academic Integrity

All students are expected to abide by the campus regulations on academic integrity: http://studentcode.illinois.edu/article1_part4_1-401.html. Intentional violations of academic integrity are described at http://studentcode.illinois.edu/article1_part4_1-402.html and include, but are not limited to, copying any part of another student’s assignment and allowing another student to copy any part of one’s own assignment.

Safety Protocol

We have been asked by Public Safety to share the following information in case of weather or security emergencies: https://police.illinois.edu/emergency-preparedness/run-hide-fight/

Sexual Misconduct Policy and Reporting

The University of Illinois is committed to combating sexual misconduct. Faculty and staff members are required to report any instances of sexual misconduct to the University’s Title IX and Disability Office. In turn, an individual with the Title IX and Disability Office will provide information about rights and options, including accommodations, support services, the campus disciplinary process, and law enforcement options.

A list of the designated University employees who, as counselors, confidential advisors, and medical professionals, do not have this reporting responsibility and can maintain confidentiality, can be found at https://wecare.illinois.edu/resources/students/#confidential. Other information about resources and reporting is available at https://wecare.illinois.edu.


The Last Word

The Instructor reserves the right to make any changes considered to be academically advisable. Any changes will be announced in class and on the Course Website. It is the student’s responsibility to attend the class and keep track of the changes.