STAT 440 Spring 2022 Information

STAT 440 Statistical Data Management

3 or 4 Credit Hours - Major Elective

Section 1UG/1GR & 2UG/2GR

Spring 2022 - Syllabus

COVID-19 General Information

Following University policy, all students are required to engage in appropriate behavior to protect the health and safety of the community. Students are also required to follow the campus COVID-19 protocols.

Students who feel ill must not come to class. In addition, students who test positive for COVID-19 or have had an exposure that requires testing and/or quarantine must not attend class. The University will provide information to the instructor, in a manner that complies with privacy laws, about students in these latter categories. These students are judged to have excused absences for the class period and should contact the instructor via email about making up the work.

Students who fail to abide by these rules will first be asked to comply; if they refuse, they will be required to leave the classroom immediately. If a student is asked to leave the classroom, the non-compliant student will be judged to have an unexcused absence and reported to the Office for Student Conflict Resolution for disciplinary action. Accumulation of non-compliance complaints against a student may result in dismissal from the University.

Face Coverings

All students, faculty, staff, and visitors are required to wear face coverings in classrooms and university spaces. This is in accordance with CDC guidance and University policy and expected in this class.

Please refer to the University of Illinois Urbana-Champaign’s COVID-19 website for further information on face coverings. Thank you for respecting all of our well-being so we can learn and interact together productively.

Building Access

In order to implement COVID-19-related guidelines and policies affecting university operations, instructional faculty members may ask students in the classroom to show their Building Access Status in the Illinois app or the Boarding Pass. Staff members may ask students in university offices to show their Building Access Status in the Illinois app or the Boarding Pass. If the Building Access Status says “Granted”, that means the individual is compliant with the university’s COVID-19 policies - either with a university-approved COVID-19 vaccine or with the on-campus COVID-19 testing program for unvaccinated students.

Students are required to show only the Building Access Screen, which shows compliance without specifying whether it was through COVID-19 vaccination or regular on-campus testing. To protect personal health information, this screen does not say if a person is vaccinated or not. Students are not required to show anyone the screen that displays their vaccination status. No university official, including faculty members, may ask students why they are not vaccinated or any other questions seeking personal health information.

Course Description

Statistical Data Management (STAT 440) is a focused data wrangling course that covers various types of data storage, manipulation, cleaning, and extraction, and applies these methodologies in R. Because the work is done in R, students must bring a laptop to class each day. The expectation is that students will gain competency in exploring, organizing, designing, creating, storing, cleaning, wrangling, sharing, and using data, all of which are commonly done prior to data analysis. Critical and creative thinking and efficient coding will be encouraged. Concepts covered in this course build upon each other, so students can expect all assessments to be cumulative. Students should be proficient and comfortable in R prior to beginning this course. R, through RMarkdown, offers reproducible documentation with Markdown syntax, which will support long-term learning opportunities. Git supports students’ capacity for collaboration as well as version control of documentation and files.

Because there are two sections of STAT 440 in Spring 2022, readers should assume that all information pertains to both sections: 1UG/GR and 2UG/GR. Where the sections differ, the specific section is noted for that item.


Learning Objectives

These learning objectives are important because they connect the physical know-how with the technical knowledge of the course.

  • Students will assess effectiveness, organization, and intent from a published data set

  • Students will explore data sets of various types

  • Students must design well-organized, clean data sets for the purpose of data analysis

  • Students will present data management work in a reproducible document file using Markdown syntax and code chunks or cells. No local data files will be utilized.

  • Students must demonstrate critical thinking and creativity through asking questions about a given data set

  • Students must be able to explain and summarize data wrangling code

  • Students will share and discuss data management ideas, coding snippets, and other thoughts to aid in meaningful dialogue

  • Students must recall important data management concepts

  • Students will reflect on their own learning of data management principles

  • Students will build data wrangling tools, apps, and dashboards and store all work using git, a version control software.

  • Students will collaborate on lab assignments.

  • Students will reproduce and replicate data visualizations.

  • Students will assess and critique published visualizations.


Course Staff

  • Instructor - Christopher Kinson (kinson2@illinois.edu)

  • Teaching Assistant (TA) - Jim Yan (yiciyan2@illinois.edu)

  • Teaching Assistant (TA) - Jaideep Mahajan (jaideep3@illinois.edu)

  • Course Assistant (CA) - TBD (@illinois.edu)

Course Specifics

Course Website

This course operates as an organization named stat440-sp22 within GitHub Enterprise. Students should bookmark or save the link below in their browser for future use, because it contains all course materials including notes, assignments, and lecture videos.

https://github-dev.cs.illinois.edu/stat440-sp22/course-content

Prerequisites

The prerequisites for this course are the following:

  • STAT 400 or STAT 409

  • Knowledge of basic computer concepts such as how to locate a file, create a folder, save a file, zip or compress a file, unzip or extract a .zip file, etc.

  • Proficiency (i.e. a fundamental and operational understanding) in R

Meeting Schedule

  • For section 1UG/GR, there are in-person meetings at 11:00 am - 11:50 am in Room 131 of the Animal Sciences Laboratory building on Mondays, Wednesdays, and Fridays.

  • For section 2UG/GR, there are in-person meetings at 04:00 pm - 04:50 pm in Room 32 of the Psychology building on Mondays, Wednesdays, and Fridays.

  • There will be asynchronous lecture videos recorded in Zoom and posted on the Course Website (within the announcements page and in a sub-directory of the stat440-sp22-course-content repository). Because of Zoom’s policies, videos are downloadable, but video links will expire in 30 days. See the Instructional Activities section below for more details.

  • All students are expected to read the course notes, watch the lecture videos, complete assignments, and fully participate in class regularly. Course content (announcements, syllabus, notes, videos, homework, exams, projects, weekly schedules, and the Issues board) will be found on the Course Website via the stat440-sp22-course-content repository. Check the Course Website often for updates and announcements. Students are encouraged to pull the stat440-sp22-course-content repository daily if accessing it remotely via git.

Office Hours

All times listed in this document are in current US Central Time. Take care to adjust clocks when daylight saving time begins or ends.

Synchronous in-person office hours will take place. Beginning in the third week, students are not required to attend class on Fridays. Instead, each section’s Friday meeting will serve as that section’s office hours. Attendance at these Friday office hours is optional and intended for students who want to ask questions and receive feedback from the Instructor. These office hours will not be recorded.

  • Instructor in-person office hours - section 1UG/GR:

    • Fridays 11:00 am - 11:50 am at Room 131 of the Animal Sciences Laboratory building (beginning in Week 03)
  • Instructor in-person office hours - section 2UG/GR:

    • Fridays 04:00 pm - 04:50 pm at Room 32 of Psychology building (beginning in Week 03)

Synchronous virtual office hours will also take place. After registering for the Instructor’s office hours, students will receive a confirmation email (from Zoom) containing information about joining the office hours session. Registration for the Instructor’s office hours is required only once per semester. Students are encouraged, but not required, to attend the office hours. After each of the Instructor’s virtual office hours, the recorded office hours video link will be shared on the Course Website via the stat440-sp22-course-content repository. Because of Zoom’s policies, videos are downloadable, but video links will expire in 30 days.

  • Instructor - Christopher Kinson (kinson2@illinois.edu) virtual office hours in Zoom:

    • Thursdays 04:00 pm - 05:00 pm (register with this link https://illinois.zoom.us/meeting/register/tZYscOirrT0qGN36oIKAvqF0z0g0cqzo1QHB)

    • Fridays 01:30 pm - 02:30 pm (register with this link https://illinois.zoom.us/meeting/register/tZYkduqvqD0tHtUP0mYmxEqokKCDsY9xuOFp)

  • TA - Jim Yan (yiciyan2@illinois.edu) virtual office hours in Zoom:

    • Mondays 07:00 pm - 09:00 pm

    • Fridays 07:00 pm - 09:00 pm

    • Zoom link (no registration required) https://illinois.zoom.us/j/88144449419?pwd=bUFmM0pJQzJYeUJDa3kzZHlNSXgyUT09

      • Meeting ID: 881 4444 9419

      • Password: 996636

  • TA - Jaideep Mahajan (jaideep3@illinois.edu) virtual office hours in Zoom:

    • Mondays 05:00 pm - 07:00 pm

    • Fridays 05:00 pm - 07:00 pm

    • Zoom link (no registration required) https://illinois.zoom.us/j/88144449419?pwd=bUFmM0pJQzJYeUJDa3kzZHlNSXgyUT09

      • Meeting ID: 881 4444 9419

      • Password: 996636

  • If a student has a specific question, but cannot attend the office hours, then they should post their question in the Issues board.

  • If a student wants one-on-one assistance from the course staff, then the student should email the course staff in order to schedule a Zoom meeting.

Textbooks

There is no required textbook, but students may find the texts below to be helpful. These are all free and accessible to students for further reading. The Instructor may refer to certain sections of these texts in the course content. The asterisk * means these are accessible from the University Library as E-books.

R

  • *Data Wrangling with R. Boehmke. Springer, Cham. http://www.library.illinois.edu/proxy/go.php?url=http://dx.doi.org/10.1007/978-3-319-45599-0
  • *R for Data Science. Wickham and Grolemund. O’Reilly Media, Inc. https://learning.oreilly.com/library/view/-/9781491910382/?ar

Markdown

  • The Markdown Guide. Cone. https://www.markdownguide.org/getting-started

RMarkdown

  • RMarkdown Cheat Sheet. RStudio. https://rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf

Git and GitHub

  • Happy Git and GitHub for the useR. Bryan et al. https://happygitwithr.com/

Software

The course requires students to already have a fundamental and operational understanding of R. Students with no familiarity in R should understand that this course will not cover the fundamental and operational usage of R.

Calendar

Below is a calendar of topics and tentative assignment deadlines.

Week | 2022 Dates | Topics Covered
01 | 01/17 - 01/21 | Introduce the course and software - Markdown, R/RStudio/RMarkdown, Git/GitHub, Git/GitHub tips
02 | 01/24 - 01/28 | What is data, Structures of data, delimiters, and file extensions, Accessing and importing data, Exporting data, Handling dates and times
03 | 01/31 - 02/04 | Assigning objects, Accessing and importing data via Web scraping, Accessing and importing data via an API
04 | 02/07 - 02/11 | Homework01 due February 07, 2022 at 11:59 pm, Arranging data, Reshaping data, Data expansion, Data reduction
05 | 02/14 - 02/18 | Loops and conditional execution, Apply family of functions, Vectorization
06 | 02/21 - 02/25 | Homework02 due February 21, 2022 at 11:59 pm, Lab01 due February 16, 2022, Regular expression and string manipulation, Summarizing data, Combining data
07 | 02/28 - 03/04 | Validating data, Cleaning data
08 | 03/07 - 03/11 | Homework03 due March 07, 2022 at 11:59 pm, Focus on Midterm Exam, Midterm Exam due March 11, 2022 at 11:59 pm
09 | 03/14 - 03/18 | Spring Break
10 | 03/21 - 03/25 | SQL for weeks 03-07 content, SQL sub-queries
11 | 03/28 - 04/01 | Lab02 due March 30, 2022, Tool-making and programming, Data visualization Part01
12 | 04/04 - 04/08 | Homework04 due April 04, 2022 at 11:59 pm, Data visualization Part02-03
13 | 04/11 - 04/15 | Shiny apps and dashboards
14 | 04/18 - 04/22 | Homework05 due April 18, 2022 at 11:59 pm, DevOps introduction
15 | 04/25 - 04/29 | Lab03 due April 27, 2022, Data usage, data jobs, data issues, data research
16 | 05/02 - 05/06 | Reading Day - May 05, 2022 (no class and no office hours), Final Project due no later than May 06, 2022 at 11:59 pm, Focus on Final Project and Final Exam
17 | 05/09 - 05/13 | Final Exam accessible on May 08, 2022 at 11:59 pm and due no later than May 10, 2022 at 11:59 pm, Grades to Be Submitted

Grading Breakdown

3 Lab Assignments: 30 points total

  • Lab01 (due February 16, 2022 at the end of the class period): 10 points
  • Lab02 (due March 30, 2022 at the end of the class period): 10 points
  • Lab03 (due April 27, 2022 at the end of the class period): 10 points

5 Homework Assignments: 100 points total

  • Homework01 (due February 07, 2022 at 11:59 pm): 20 points
  • Homework02 (due February 21, 2022 at 11:59 pm): 20 points
  • Homework03 (due March 07, 2022 at 11:59 pm): 20 points
  • Homework04 (due April 04, 2022 at 11:59 pm): 20 points
  • Homework05 (due April 18, 2022 at 11:59 pm): 20 points

1 Midterm Exam: 30 points total

  • At-home exam (accessible on March 09, 2022 at 11:59 pm and due no later than March 11, 2022 at 11:59 pm): 30 points

1 Final Project: 40 points total

  • 40 points to make a Shiny dashboard (due no later than May 06, 2022 at 11:59 pm)

1 Final Exam: 40 points total

  • 40 points (accessible on May 08, 2022 at 11:59 pm and due no later than May 10, 2022 at 11:59 pm)

Course Total Points: 240 points
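
As a quick arithmetic check on the breakdown above, the sketch below adds up the listed point values. It is an illustration only, written in Python for convenience rather than in the course's R; the dictionary name and function name are hypothetical, and the point values are copied from the list above.

```python
# Point values copied from the Grading Breakdown above.
# The names COMPONENT_POINTS and course_total are illustrative only.
COMPONENT_POINTS = {
    "Lab01": 10,
    "Lab02": 10,
    "Lab03": 10,
    "Homework01": 20,
    "Homework02": 20,
    "Homework03": 20,
    "Homework04": 20,
    "Homework05": 20,
    "Midterm Exam": 30,
    "Final Project": 40,
    "Final Exam": 40,
}


def course_total(points=COMPONENT_POINTS):
    """Sum the maximum points over all graded components."""
    return sum(points.values())


print(course_total())  # 240
```

The five homework assignments contribute 100 of those points, and the three labs contribute 30.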

Final Letter Grades

To compute a final grade, students can add up their scores on all assignments. The resulting sum determines the letter grade earned when the course is completed. There are no +/- letter grades in this course.

Lower bound | Upper bound | Letter grade
216 points | 240 points | A
192 points | 215 points | B
168 points | 191 points | C
144 points | 167 points | D
0 points | 143 points | F
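
The table above can be read as a simple lookup against the lower bounds, which fall at 90%, 80%, 70%, and 60% of the 240 course points. The sketch below is a minimal illustration in Python (not the course's R); the function name is hypothetical. Treating any total at or above 216 as an A, including totals raised by bonus points, is an assumption, as is comparing a fractional total against the lower bounds.

```python
# (lower bound in points, letter grade), highest first; from the table above.
GRADE_CUTOFFS = [(216, "A"), (192, "B"), (168, "C"), (144, "D"), (0, "F")]


def letter_grade(total_points):
    """Return the letter grade for a point total.

    Assumption: a total is matched to the highest lower bound it meets,
    so totals above 240 (for example, with bonus points) still map to A.
    """
    if total_points < 0:
        raise ValueError("total_points must be non-negative")
    for lower_bound, letter in GRADE_CUTOFFS:
        if total_points >= lower_bound:
            return letter


print(letter_grade(216))  # A
print(letter_grade(215))  # B
print(letter_grade(143))  # F
```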

Instructional Activities

Students should read the course notes, watch the lecture videos, and attempt the assignments. If students get stuck, they should ask questions i) in the Issues board, ii) in office hours, or iii) via email (in this order of preference). In addition to lecture videos and office hours, the following activities and tools will be useful for students.

Course Notes

The course notes perform the duty of a textbook for this course. Yes, there is a lot of information in the notes, but it is useful to read it for the important parts and return to it for details after attempting the assignments.

Announcements

The course announcements page is the landing page within the stat440-sp22-course-content repository. This page is a rendering of the README.md file. In this announcements page, students can find a proposed weekly schedule, links to files, information about assignment deadlines, and other important course reminders.

Issues Board

This discussion board, which exists as a tab on the stat440-sp22-course-content repository, is one of the best ways to communicate with classmates and course staff. Questions can be seen quickly and receive a rapid response. Students are encouraged to use this board, but there is no requirement to participate in the discussion board.

Students may use the board as frequently as they like to openly discuss ideas about the course, such as questions about content, deadlines, homework, notes, data, etc. If a student specifically wants the course staff to respond, then the student should use the mention @staff when posting in the board. The things discussed here should be non-personal and non-private. If a student has a personal or private matter to discuss with the Instructor, the student should send an email to kinson2@illinois.edu. Additionally, the conversation in the discussion board should be respectful of people’s differences and cannot be used to speak negatively about anyone or harm anyone.

The course staff will view and respond to content on Tuesdays and Thursdays (at a minimum). The course staff can be expected to spend at least 30 minutes on Tuesdays and Thursdays monitoring the Issues board.

Assignments

Lab Assignments

Lab assignments are in-class sessions that contain 5 problems; each is submitted by the “driver” and is due at the end of the class period. The course is located in an iFLEX classroom, which allows for ease of communication, collaboration, and displaying content on large screens at stations. Each week (beginning in Week 03), randomly chosen students (“drivers”) will complete the lab assignment, while the remaining students (“passengers”) at the stations will help the drivers by giving them ideas and advice on how to complete the problems. Students must bring their laptops to class each day. The lab assignment changes from week to week, but students should expect the difficulty to be roughly similar each week. Lab assignments are intended to push students to apply concepts covered in the course notes and lecture videos and to challenge students to work together as a team. Each student must complete three lab assignments as the driver by the respective lab due dates. Passengers are awarded bonus points for being present and assisting their station’s driver. Late submissions of lab assignments will be accepted with a 2-point deduction, so long as the assignment is submitted before the lab solution is posted. More details about the lab assignments will be discussed in the first two weeks of class.

Homework Assignments

These are assignments to be completed by each student individually. There will be 5 homework assignments. Students have roughly two weeks to complete and submit each homework. Graduate students are expected to complete all 10 problems, while undergraduate students are expected to complete the first 8 problems. Since each assignment is worth 20 points, undergraduate students automatically receive 4 points, provided the homework is submitted before the homework solutions are posted. All assigned homework must be submitted on the Course Website by its due date and time. The homework is required for all students.

“Submitted on the Course Website” means committing and pushing both .Rmd and .html files in the student’s own individual repository (or repo for short) within the stat440-sp22 organization in GitHub Enterprise.
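
Because a submission needs both the .Rmd file and its knitted .html file, a quick check before committing can catch a missing .html. The sketch below is one possible way to do that in Python (not the course's R); the function name and the file names in the demo are hypothetical. It only inspects a local folder and does not confirm that the files were committed or pushed.

```python
import tempfile
from pathlib import Path


def rmd_missing_html(repo_dir):
    """Return names of .Rmd files in repo_dir with no same-named .html file."""
    repo = Path(repo_dir)
    return sorted(
        rmd.name
        for rmd in repo.glob("*.Rmd")
        if not rmd.with_suffix(".html").exists()
    )


# Demo in a throwaway directory; the file names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "homework01.Rmd").touch()
    Path(tmp, "homework01.html").touch()
    Path(tmp, "homework02.Rmd").touch()  # not knitted yet
    demo_result = rmd_missing_html(tmp)

print(demo_result)  # ['homework02.Rmd']
```

An empty list means every .Rmd file in the folder has a matching .html file next to it.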

Students are encouraged to discuss the homework in the Issues board. However, sharing or copying any part of the homework (including solutions) is an infraction of the University’s rules on Academic Integrity. See the Academic Integrity section of this syllabus and the Student Code for more information. The homework problems will be graded for correctness and completeness.

Students can expect homework to be graded and returned within 14 days of the homework deadline. Homework solutions will be shared with students no sooner than 48 hours after the homework deadline. Late submissions of homework will be accepted with a 4-point deduction, so long as the assignment is submitted before the homework solution is posted. Once the homework solution is shared with students, new submissions for that homework assignment will no longer be graded. Instead, any assignment submitted after the homework solutions have been posted will receive an automatic grade of 10 points. There will be no make-ups for any missed homework. This policy also applies to students who add the course late.
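
The late policy above amounts to three cases, sketched below in Python (not the course's R) as an illustration only. The function name and status labels are hypothetical. The sketch assumes that graded_points already includes any automatic points for undergraduate students and that a late deduction cannot push a score below zero; the syllabus does not state either of these.

```python
LATE_DEDUCTION = 4          # points deducted for a late submission
AFTER_SOLUTIONS_GRADE = 10  # automatic grade once solutions are posted


def homework_score(graded_points, status):
    """Apply the homework late policy to a graded score out of 20.

    status is one of (hypothetical labels):
      "on_time"         - submitted by the deadline
      "late"            - after the deadline, before solutions are posted
      "after_solutions" - after solutions are posted
    """
    if status == "on_time":
        return graded_points
    if status == "late":
        # Assumption: the score is floored at zero.
        return max(graded_points - LATE_DEDUCTION, 0)
    if status == "after_solutions":
        return AFTER_SOLUTIONS_GRADE
    raise ValueError(f"unknown status: {status!r}")


print(homework_score(18, "on_time"))          # 18
print(homework_score(18, "late"))             # 14
print(homework_score(18, "after_solutions"))  # 10
```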

Midterm Exam

The midterm is one at-home exam with 15 problems. The exam is open notes and open book, and students have up to 2 days to complete it. The midterm must be submitted on the Course Website via a student’s repo. The midterm exam will be graded for correctness and completeness. Students can expect the midterm exam to have a structure similar to homework, with a mixture of completion, correct/incorrect, and open-ended questions. Late submissions of the Midterm Exam will not be accepted or graded. There will be no make-ups for a missed Midterm Exam.

Final Project

The Final Project in this course is the creation of a single Shiny app or dashboard in R. The final Shiny app or dashboard must be completed and working by 11:59 pm on Friday, May 06, 2022. Students are not allowed to work together on the final project. This Shiny app is an opportunity for students to demonstrate the statistical data management concepts - including tool-making, accessing and importing data, cleaning and validating data, and data visualization - that they have learned this semester and apply them along with version control for collaboration. The ideas and code must be the student’s own. Details about the Shiny app will be discussed later this semester. Late submissions will not be accepted. There will be no make-ups for a missed Final Project.

For more information about Shiny apps and how to create one, check out the following videos and docs:

  • How to start a Shiny app https://vimeo.com/rstudioinc/review/131218530/212d8a5a7a/#t=0m0s

  • Getting started with Shiny dashboards https://rstudio.github.io/shinydashboard/get_started.html

  • Effective Reactive Programming Part 1 https://rstudio.com/resources/shiny-dev-con/reactivity-pt-1-joe-cheng/

  • Effective Reactive Programming Part 2 https://rstudio.com/resources/shiny-dev-con/reactivity-pt-2/

  • Interactive Graphics with Shiny https://resources.rstudio.com/webinars/interactive-graphics-winston

  • Understanding Shiny Modules https://resources.rstudio.com/shiny-developer-conference/shinydevcon-modules-garrettgrolemund-1080p

  • Debugging Techniques https://resources.rstudio.com/shiny-developer-conference/shinydevcon-debugging-jonathanmcpherson-1080p

  • Welcome to Shiny https://shiny.rstudio.com/tutorial/written-tutorial/lesson1/

Final Exam

The final exam is one at-home exam with 20 problems. The exam is open notes and open book, and students have up to 2 days to complete it. The final exam will be graded for correctness and completeness. Students can expect the final exam to have a structure similar to homework and the midterm exam, with a mixture of completion, correct/incorrect, and open-ended questions. Students should not expect solutions to be provided for the final exam. Late submissions of the Final Exam will not be accepted or graded. There will be no make-ups for a missed Final Exam.


University Specifics

Disability Accommodations

To obtain disability-related academic adjustments and/or auxiliary aids, students with disabilities must contact the course instructor and the Disability Resources and Educational Services (DRES) as soon as possible. To contact DRES, students may visit 1207 S. Oak St., Champaign, call 333-4603, e-mail disability@illinois.edu, or go to the DRES website at http://disability.illinois.edu/

Academic Integrity

It is expected that all students abide by the campus regulations on academic integrity http://studentcode.illinois.edu/article1_part4_1-401.html. Intentional violations of academic integrity are described at http://studentcode.illinois.edu/article1_part4_1-402.html and include, but are not limited to, copying any part of another student’s assignment and allowing another student to copy any part of one’s own assignment.

Safety Protocol

Public Safety has asked us to share the following information in case of weather or security emergencies: https://police.illinois.edu/emergency-preparedness/run-hide-fight/

Sexual Misconduct Policy and Reporting

The University of Illinois is committed to combating sexual misconduct. Faculty and staff members are required to report any instances of sexual misconduct to the University’s Title IX and Disability Office. In turn, an individual with the Title IX and Disability Office will provide information about rights and options, including accommodations, support services, the campus disciplinary process, and law enforcement options.

A list of the designated University employees who, as counselors, confidential advisors, and medical professionals, do not have this reporting responsibility and can maintain confidentiality, can be found at https://wecare.illinois.edu/resources/students/#confidential. Other information about resources and reporting is available at https://wecare.illinois.edu.


The Last Word

The Instructor reserves the right to make any changes considered to be academically advisable. Any changes will be announced in class and on the Course Website. It is the student’s responsibility to attend the class and keep track of the changes.