STAT 440 Fall 2021 Information

STAT 440 Statistical Data Management

3 or 4 Credit Hours - Major Elective

Section 1UG/1GR (In-person) & OUG/OGR (Online)

Fall 2021 - Syllabus


COVID-19 General Information

Following University policy, all students are required to engage in appropriate behavior to protect the health and safety of the community. Students are also required to follow the campus COVID-19 protocols.

Students who feel ill must not come to class. In addition, students who test positive for COVID-19 or have had an exposure that requires testing and/or quarantine must not attend class. The University will provide information to the instructor, in a manner that complies with privacy laws, about students in these latter categories. These students are judged to have excused absences for the class period and should contact the instructor via email about making up the work.

Students who fail to abide by these rules will first be asked to comply; if they refuse, they will be required to leave the classroom immediately. If a student is asked to leave the classroom, the non-compliant student will be judged to have an unexcused absence and reported to the Office for Student Conflict Resolution for disciplinary action. Accumulation of non-compliance complaints against a student may result in dismissal from the University.

Face Coverings

All students, faculty, staff, and visitors are required to wear face coverings in classrooms and university spaces. This is in accordance with CDC guidance and University policy and is expected in this class.

Please refer to the University of Illinois Urbana-Champaign’s COVID-19 website for further information on face coverings. Thank you for respecting all of our well-being so we can learn and interact together productively.

Building Access

In order to implement COVID-19-related guidelines and policies affecting university operations, instructional faculty members may ask students in the classroom to show their Building Access Status in the Safer Illinois app or the Boarding Pass. Staff members may ask students in university offices to show their Building Access Status in the Safer Illinois app or the Boarding Pass. If the Building Access Status says “Granted”, that means the individual is compliant with the university’s COVID-19 policies - either with a university-approved COVID-19 vaccine or with the on-campus COVID-19 testing program for unvaccinated students.

Students are required to show only the Building Access Screen, which shows compliance without specifying whether it was through COVID-19 vaccination or regular on-campus testing. To protect personal health information, this screen does not say if a person is vaccinated or not. Students are not required to show anyone the screen that displays their vaccination status. No university official, including faculty members, may ask students why they are not vaccinated or any other questions seeking personal health information.

Course Description

Statistical Data Management (STAT 440) is a focused data wrangling course that covers various types of data storage, manipulation, cleaning, and extraction, and applies these methodologies in a programming language. The expectation is that students will gain competency in exploring, organizing, designing, creating, storing, cleaning, wrangling, sharing, and using data, all of which are commonly done prior to data analysis. Critical and creative thinking and efficient coding will be encouraged. Concepts covered in this course will build upon each other. Thus, students can expect all assessments to be cumulative. Students should be proficient and comfortable in one of two programming languages: R or Python. Both languages support reproducible documents written with Markdown syntax, which supports long-term learning opportunities. Git supports students’ capacity for collaboration as well as version control of documents and files.

Because there are two sections of STAT 440 in Fall 2021, readers should assume that all information applies to both sections: 1UG/GR (In-person) and OUG/GR (Online). Where there are differences, note the specific section named for that item.


Learning Objectives

These learning objectives are important because they connect the physical know-how with the technical knowledge of the course.

  • Students will assess effectiveness, organization, and intent from a published data set

  • Students will explore data sets of various types

  • Students must design well-organized, clean data sets for the purpose of data analysis

  • Students will present data management work in a reproducible document file using Markdown syntax and code chunks or cells. No local data files will be utilized.

  • Students must demonstrate critical thinking and creativity through asking questions about a given data set

  • Students must be able to explain and summarize data wrangling code

  • Students will share and discuss data management ideas, coding snippets, and other thoughts to aid in meaningful dialogue

  • Students must recall important data management concepts

  • Students will reflect on their own learning of data management principles

  • Students will build data wrangling tools and store all work using Git, a version control system.

  • Students will collaborate to create a cohesive data wrangling tool as the final project.


Course Staff

  • Instructor - Christopher Kinson (kinson2@illinois.edu)

  • Teaching Assistant (TA) - Jim Yan (yiciyan2@illinois.edu)

  • Teaching Assistant (TA) - Jaideep Mahajan (jaideep3@illinois.edu)

  • Course Assistant (CA) - TBD (@illinois.edu)

Course Specifics

Course Website

This course operates as an organization named stat440-fa21 within GitHub Enterprise. Students should bookmark or save the link below in their browser for future use.

https://github-dev.cs.illinois.edu/stat440-fa21/stat440-fa21-course-content

Prerequisites

The prerequisites for this course are the following:

  • STAT 400 or STAT 409

  • Knowledge of basic computer concepts such as how to locate a file, create a folder, save a file, zip or compress a file, unzip or extract a .zip file, etc.

  • Proficiency in either R or Python; students should pick the one language that they are most comfortable with.

Meeting Schedule

  • For section 1UG/GR, there are in-person meetings at 10:00-10:50 am in Room 32 of the Psychology building on Mondays, Wednesdays, and Fridays. All students in this section are expected to attend on these days for the first two weeks of the semester. Beginning in the third week, half of the class must attend the meeting on Mondays, while the other half must attend the meeting on Wednesdays.

  • For section OUG/GR, there are no live, in-person, or synchronous lecture sessions this semester. There will be asynchronous lecture videos recorded in Zoom and posted on the Course Website (within the announcements page and in a sub-directory of the stat440-fa21-course-content repository). Because of Zoom’s policies, videos are downloadable but video links will expire in 30 days. See the Instructional Activities section below for more details.

  • All students are expected to read the course notes, watch the lecture videos, complete assignments, and fully participate in class regularly. Course content (announcements, syllabus, notes, videos, homework, exams, projects, weekly schedules, and the Issues board) will be found on the Course Website via the stat440-fa21-course-content repository. Check the Course Website often for updates and announcements. Students are encouraged to pull the stat440-fa21-course-content repository daily if accessing it remotely via git.

Office Hours

All times listed in this document are in current US Central Time. Take care to adjust clocks when daylight saving time begins or ends.
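As a hedged illustration of the time-zone note above, the sketch below converts two deadlines listed in this syllabus from US Central Time to UTC using Python's standard zoneinfo module (Python 3.9 or later). The helper name central_to_utc is hypothetical, and mapping "US Central Time" to the IANA zone America/Chicago is an assumption.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumption: "US Central Time" corresponds to the IANA zone America/Chicago.
CENTRAL = ZoneInfo("America/Chicago")


def central_to_utc(year, month, day, hour, minute):
    """Interpret a wall-clock time as US Central Time and return it in UTC."""
    local = datetime(year, month, day, hour, minute, tzinfo=CENTRAL)
    return local.astimezone(timezone.utc)


# Homework01 deadline: September 13, 2021 at 11:59 pm (daylight saving time in effect).
hw01_utc = central_to_utc(2021, 9, 13, 23, 59)
# Homework05 deadline: November 15, 2021 at 11:59 pm (standard time in effect).
hw05_utc = central_to_utc(2021, 11, 15, 23, 59)

print(hw01_utc.isoformat())
print(hw05_utc.isoformat())
```

The two deadlines share the same wall-clock time in Central Time but differ by one hour in UTC, which is why students outside the Central zone should convert each deadline individually.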

Because of the in-person nature of section 1UG/GR, synchronous in-person office hours will take place. Beginning in the third week, students are not required to attend the meeting on Fridays at 10:00-10:50 am in Room 32 of the Psychology building. These Fridays will serve as this section’s office hours. Attendance at these Friday office hours is optional and intended for students who want to ask questions and receive feedback from the Instructor. These office hours will not be recorded, but will have a simultaneous Zoom session. It is permissible for students in section OUG/GR to attend these office hours virtually in Zoom. However, section OUG/GR students may not attend the in-person office hours on Fridays.

  • Instructor in-person office hours - section 1UG/GR:

    • Fridays 10:00 am - 10:50 am in Room 32 of the Psychology building (beginning in Week 03; for the Zoom session, register with this link https://illinois.zoom.us/meeting/register/tZ0lcu-srj8jGdf75Isuj7NfQy_wmvcsDqP6 )

Because of the online nature of section OUG/GR, synchronous virtual office hours will take place. After registering for the Instructor’s office hours, students will receive a confirmation email (from Zoom) containing information about joining the office hours session. Registration for office hours is required only once per semester. Students are encouraged to attend the office hours, but are not required to attend. After each of the Instructor’s office hours, the recorded office hours video link will be shared on the Course Website via the stat440-fa21-course-content repository. It is permissible for students in section 1UG/GR to attend these virtual office hours in Zoom. Because of Zoom’s policies, videos are downloadable but video links will expire in 30 days.

  • Instructor - Christopher Kinson (kinson2@illinois.edu) virtual office hours in Zoom - section OUG/GR:

    • Wednesdays 1:30 pm - 2:30 pm (register with this link https://illinois.zoom.us/meeting/register/tZMsdOuvrD4oG9XARZmEWEaORUtlmmCsHb_N)

    • Thursdays 10:00 am - 11:00 am (register with this link https://illinois.zoom.us/meeting/register/tZYtfuCsqD0qE9Hta2_rusMW6_fqcdYMFvvG)

  • TA - Jim Yan (yiciyan2@illinois.edu) virtual office hours in Zoom:

    • Mondays 7:00 pm - 9:00 pm

    • Fridays 7:00 pm - 9:00 pm

    • Zoom link (no registration required) https://illinois.zoom.us/j/88144449419?pwd=bUFmM0pJQzJYeUJDa3kzZHlNSXgyUT09

      • Meeting ID: 881 4444 9419

      • Password: 996636

  • TA - Jaideep Mahajan (jaideep3@illinois.edu) virtual office hours in Zoom:

    • Mondays 5:00 pm - 7:00 pm

    • Fridays 5:00 pm - 7:00 pm

    • Zoom link (no registration required) https://illinois.zoom.us/j/88144449419?pwd=bUFmM0pJQzJYeUJDa3kzZHlNSXgyUT09

      • Meeting ID: 881 4444 9419

      • Password: 996636

  • If a student has a specific question, but cannot attend the office hours, then they should post their question in the Issues board.

  • If a student wants one-on-one assistance from the course staff, then the student should email the course staff in order to schedule a Zoom meeting.

Textbooks

There is no required textbook, but students may find the texts below to be helpful. These are all free and accessible to students for further reading. The Instructor may refer to certain sections of these texts in the course content. The asterisk * means these are accessible from the University Library as E-books.

R

  • *Data Wrangling with R. Boehmke. Springer, Cham. http://www.library.illinois.edu/proxy/go.php?url=http://dx.doi.org/10.1007/978-3-319-45599-0
  • *R for Data Science. Wickham and Grolemund. O’Reilly Media, Inc. https://learning.oreilly.com/library/view/-/9781491910382/?ar

Python

  • *Data Wrangling with Python. Kazil and Jarmul. O’Reilly Media, Inc. https://learning.oreilly.com/library/view/-/9781491948804/?ar
  • *Data Wrangling with Python. Sarkar and Roychowdhury. Packt Publishing. https://learning.oreilly.com/library/view/-/9781789800111/?ar

Markdown

  • The Markdown Guide. Cone. https://www.markdownguide.org/getting-started

RMarkdown

  • RMarkdown Cheat Sheet. RStudio. https://rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf

Jupyter Lab/Notebook

  • Jupyter Notebook Cheat Sheet. DataCamp. https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Jupyter_Notebook_Cheat_Sheet.pdf

Git and GitHub

  • Happy Git and GitHub for the useR. Bryan et al. https://happygitwithr.com/

Software

The course requires students to already have a fundamental and operational understanding of one of two programming languages: R or Python. Students with no familiarity in either language should be aware that this course will not teach the fundamental and operational usage of these languages. Students should use the language that they are most comfortable with for the duration of the semester.

After completing these 3 steps for the Git and GitHub Enterprise section, the student’s own netID will appear in the stat440-fa21 course. Then, the Instructor will add the student to the “students” team for the main stat440-fa21-course-content repo so that the student can see all necessary course content and announcements. The student should receive an email from GitHub Enterprise inviting them to the “students” team.

  • Zoom video teleconferencing software with functioning Webcam and Microphone https://illinois.zoom.us/

Calendar

Below is a calendar of topics and tentative assignment deadlines.

Week | 2021 Dates | Topics Covered
01 | 08/23 - 08/27 | Introduce the course and software - Markdown, R/RStudio/RMarkdown, Python/Jupyter Lab, Git/GitHub
02 | 08/30 - 09/03 | Git/GitHub tips, What is data, Structures of data, delimiters, and file extensions, Accessing and importing data
03 | 09/06 - 09/10 | Dates, times, and character formats, Web scraping for accessing and importing data, Assigning objects
04 | 09/13 - 09/17 | Homework01 due September 13, 2021 at 11:59 pm, Lab01/Challenge01 due September 15, 2021 at 10:50 am, Arranging data, Reshaping data
05 | 09/20 - 09/24 | Filtering and selecting data, Mutating data
06 | 09/27 - 10/01 | Homework02 due September 27, 2021 at 11:59 pm, Loops and conditional execution, Apply family of functions, Vectorization
07 | 10/04 - 10/08 | Regular expression and string manipulation
08 | 10/11 - 10/15 | Homework03 due October 11, 2021 at 11:59 pm, Lab02/Challenge02 due October 13, 2021 at 10:50 am, Focus on Midterm Exam
09 | 10/18 - 10/22 | Midterm Exam due October 18, 2021 at 11:59 pm, Validating data, Cleaning data
10 | 10/25 - 10/29 | Data expansion, Data reduction, Summarizing data, Combining data
11 | 11/01 - 11/05 | Homework04 due November 01, 2021 at 11:59 pm, SQL for weeks 03-07 content
12 | 11/08 - 11/12 | SQL for weeks 09-10 content, SQL sub-queries
13 | 11/15 - 11/19 | Homework05 due November 15, 2021 at 11:59 pm, Lab03/Challenge03 due November 17, 2021 at 10:50 am, Tool-making and programming
14 | 11/22 - 11/26 | Fall Break
15 | 11/29 - 12/03 | Data usage, data jobs, data issues, data research
16 | 12/06 - 12/10 | Reading Day - December 09, 2021 (no class and no office hours), Focus on Final Project and Final Exam
17 | 12/13 - 12/17 | Final Project due no later than December 20, 2021 at 11:59 pm, Final Exam accessible on December 12 at 11:59 pm and due no later than December 14 at 11:59 pm, Grades to Be Submitted

Grading Breakdown

3 Lab Assignments: 30 points total (section 1UG/GR only)

  • Lab01 (due September 15 at 10:50 am): 10 points
  • Lab02 (due October 13 at 10:50 am): 10 points
  • Lab03 (due November 17 at 10:50 am): 10 points

3 Challenge Assignments: 30 points total (section OUG/GR only)

  • Challenge01 (due September 15 at 11:59 pm): 10 points
  • Challenge02 (due October 13 at 11:59 pm): 10 points
  • Challenge03 (due November 17 at 11:59 pm): 10 points

5 Homework Assignments: 100 points total

  • Homework01 (due September 13 at 11:59 pm): 20 points
  • Homework02 (due September 27 at 11:59 pm): 20 points
  • Homework03 (due October 11 at 11:59 pm): 20 points
  • Homework04 (due November 01 at 11:59 pm): 20 points
  • Homework05 (due November 15 at 11:59 pm): 20 points

1 Midterm Exam: 30 points total

  • At-home exam (due October 18 at 11:59 pm): 30 points

1 Final Project: 40 points total

  • 40 points to make 1 valid Shiny app contribution (due no later than December 20 at 11:59 pm)

1 Final Exam: 40 points total

  • 40 points (accessible on December 12 at 11:59 pm and due no later than December 14 at 11:59 pm)

Course Total Points: 240 points

Final Letter Grades

When computing final grades, students can add up their scores on the assignments. The resulting sum will determine which letter grade they earn when the course is completed. There are no +/- letter grades in this course.

Lower bound | Upper bound | Letter grade
216 points | 240 points | A
192 points | 215 points | B
168 points | 191 points | C
144 points | 167 points | D
0 points | 143 points | F
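The table above can be read as a simple lookup on the point total. The following is a minimal sketch, not an official grading tool; the function name letter_grade is hypothetical, and treating a fractional total as belonging to the band of the nearest lower bound below it is an assumption, since the table lists whole points only.

```python
# Lower bounds copied from the letter grade table, highest grade first.
GRADE_FLOORS = [(216, "A"), (192, "B"), (168, "C"), (144, "D"), (0, "F")]
TOTAL_POINTS = 240


def letter_grade(points):
    """Return the letter grade for a course point total between 0 and 240."""
    if not 0 <= points <= TOTAL_POINTS:
        raise ValueError("points must be between 0 and 240")
    for floor, letter in GRADE_FLOORS:
        # Assumption: a total at or above a lower bound earns that grade.
        if points >= floor:
            return letter


print(letter_grade(216))  # A
print(letter_grade(215))  # B
print(letter_grade(143))  # F
```

The lower bounds correspond to 90%, 80%, 70%, and 60% of the 240 course points.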

Instructional Activities

Students should read the course notes, watch the lecture videos, and attempt the assignments. If or when students get stuck, they should ask questions i) in the Issues board, ii) in office hours, or iii) via email (preference in this order). In addition to lecture videos and office hours, the following activities and tools will be useful for students.

Course Notes

The course notes perform the duty of a textbook for this course. Yes, there is a lot of information in the notes, but it is useful to read it for the important parts and return to it for details after attempting the assignments.

Announcements

The course announcements page is the landing page within the stat440-fa21-course-content repository. This page is a rendering of the README.md file. In this announcements page, students can find a proposed weekly schedule, links to files, information about assignment deadlines, and other important course reminders.

Issues Board

This discussion board, which exists as a tab on the stat440-fa21-course-content repository, is one of the best ways to communicate with classmates and course staff. Questions can be seen quickly and receive a rapid response. Students are encouraged to use this board, but there is no requirement to participate in the discussion board.

Do use the board to openly discuss ideas about the course as frequently as desired, such as questions about content, deadlines, homework, notes, data, etc. If a student specifically wants the course staff to respond, then the student should use the mention @stat440-staff when posting in the board. The things discussed here should be of a non-personal and non-private nature. If a student has a personal or private matter to discuss with the Instructor, please send an email to kinson2@illinois.edu. Additionally, the conversation in the discussion board should be respectful of people’s differences and cannot be used to speak negatively about anyone or harm anyone.

The course staff will view and respond to content on Tuesdays and Thursdays (at a minimum). The course staff can be expected to spend at least 30 minutes on Tuesdays and Thursdays monitoring the Issues board.

Assignments

Lab Assignments (section 1UG/GR only)

These are lab sessions that contain problems for the in-person section only and are due at the end of class. The course location is Room 32 of Psychology building, an iFLEX classroom, which allows for ease of communication, collaboration, and displaying content on large screens at stations. Each week, randomly chosen students (“drivers”) will complete the lab assignment, while the remaining students (“passengers”) at the stations will help the drivers. From week to week, the lab assignment changes, but students should expect the difficulty for each week’s lab assignment to be roughly similar. Lab assignments are intended to push students to apply concepts covered in the course notes and challenge students to work together as a team. Each student must complete three lab assignments as the driver by their respective due dates. Passengers are rewarded for being present and assisting their station’s driver. More details about the lab assignments will be discussed in the first two weeks of section 1UG/GR.

Challenge Assignments (section OUG/GR only)

These are problems for the online section only. The challenge assignments will be the same for each student and must be submitted on the Course Website by the respective deadlines. These problems are meant to cover several weeks of statistical data management concepts - thereby challenging students on what information they recall and retain. Some questions may require outside-of-the-box thinking, and others may be open-ended. More details about the challenge assignments will be discussed as the deadlines approach.

Homework Assignments

These are assignments to be completed by each student as an individual. There will be 5 homework assignments. Students have roughly two weeks to complete and submit each homework. Graduate students are expected to complete all 10 problems, while undergraduate students are expected to complete the first 8 problems. Since each assignment is worth 20 points, undergraduate students receive an automatic 4 points if the homework is submitted before the homework solutions are posted. All assigned homework must be submitted on the Course Website by its due date and time. The homework is required for all students.

“Submitted on the Course Website” means committing and pushing file(s) in the student’s own individual repository (or repo for short) within the stat440-fa21 organization in GitHub Enterprise.
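The commit-and-push workflow described above might be sketched as follows. This is an illustration only: the function names, the file name homework01.Rmd, and the commit message are hypothetical, and the sketch assumes git is installed and that the local clone of the student's repository already tracks a remote branch so that a plain git push succeeds.

```python
import subprocess


def submission_commands(files, message):
    """Build the git commands that stage, commit, and push the given files."""
    return [
        ["git", "add", *files],
        ["git", "commit", "-m", message],
        ["git", "push"],
    ]


def submit(repo_dir, files, message):
    """Run each command inside the local clone, stopping at the first failure."""
    for command in submission_commands(files, message):
        subprocess.run(command, cwd=repo_dir, check=True)


# Hypothetical file name and commit message; the commands are printed, not executed.
for command in submission_commands(["homework01.Rmd"], "Submit Homework01"):
    print(" ".join(command))
```

Calling submit with the path to a local clone would run the same three commands in order; the same steps can equally be typed directly in a terminal.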

Students are encouraged to discuss the homework in the Issues board. However, sharing or copying any part of the homework (including solutions) is an infraction of the University’s rules on Academic Integrity. See the syllabus Student Code for more information. The homework problems will be graded for correctness and completeness.

Homework solutions will be shared with students 48 hours after the homework deadline (unless there is an active DRES deadline accommodation). When the homework solution is shared with students, no new submissions for that homework assignment will be accepted. Instead, any assignment submitted after the homework solutions have been posted will receive an automatic grade of 10 points. This policy above also applies to any students who add the course late to their registration.

Late submissions of homework will be accepted, so long as the assignment is submitted before the homework solution is posted. The penalty for any late homework submission is a 4 point deduction.
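The homework scoring rules above can be summarized in a short sketch. It is one reading of the policy, not the official grading procedure: the function name homework_score is hypothetical, the sketch assumes each of the 10 problems is worth 2 points, and it assumes a score cannot drop below 0 after the late deduction.

```python
def homework_score(earned, is_undergrad, late=False, after_solutions=False):
    """Return the recorded score for one 20-point homework assignment.

    earned: points earned on the graded problems (assumed 2 points each).
    late: submitted after the deadline but before solutions are posted.
    after_solutions: submitted after the homework solutions are posted.
    """
    if after_solutions:
        # The policy text states an automatic grade of 10 points in this case.
        return 10
    # Undergraduates complete 8 problems (16 points); graduates complete 10 (20 points).
    max_earned = 16 if is_undergrad else 20
    if not 0 <= earned <= max_earned:
        raise ValueError("earned points are outside the possible range")
    # Undergraduates receive an automatic 4 points for the two problems not assigned.
    score = earned + (4 if is_undergrad else 0)
    if late:
        score -= 4  # late penalty stated in the policy
    return max(score, 0)  # assumption: scores are not negative


print(homework_score(16, is_undergrad=True))               # 20
print(homework_score(18, is_undergrad=False, late=True))   # 14
print(homework_score(12, is_undergrad=True, after_solutions=True))  # 10
```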

There will be no make-ups for any missed homework. This policy also applies to any students who add the course late to their registration.

Students can expect that homework will be graded and returned to students within 14 days of the homework deadline.

Midterm Exam

The midterm is one at-home exam. The exam will be open notes and open book, and students will have up to 7 days to complete it. The midterm must be submitted on the Course Website. The midterm exam will be graded for correctness and completeness.

Late submissions of the Midterm Exam will not be accepted or graded. There will be no make-ups for a missed Midterm Exam.

Students can expect the midterm exam to include some, if not all, open-ended questions with no clearly correct solutions. Thus, there will be no solution provided for open-ended questions.

Final Project

The Final Project in this course is the creation of a single Shiny app in R to which every student must contribute. The final Shiny app must be completed (built and tested) and working by 11:59 pm on Monday, December 20, 2021. The collaboration will take place via Git (and GitHub). Each student is expected to make 1 contribution (ideas plus code plus committing and pushing those ideas/code via a pull request) to the app.R file. The contribution is worth 40 points. This Shiny app is an opportunity for students to demonstrate the statistical data management concepts they have learned this semester and apply them along with version control for collaboration. Each student’s ideas and code must be their own. Details about the Shiny app will be discussed later this semester.

For more information about Shiny apps and how to create one, check out the following videos and docs:

  • How to start a Shiny app https://vimeo.com/rstudioinc/review/131218530/212d8a5a7a/#t=0m0s

  • Effective Reactive Programming Part 1 https://rstudio.com/resources/shiny-dev-con/reactivity-pt-1-joe-cheng/

  • Effective Reactive Programming Part 2 https://rstudio.com/resources/shiny-dev-con/reactivity-pt-2/

  • Interactive Graphics with Shiny https://resources.rstudio.com/webinars/interactive-graphics-winston

  • Understanding Shiny Modules https://resources.rstudio.com/shiny-developer-conference/shinydevcon-modules-garrettgrolemund-1080p

  • Debugging Techniques https://resources.rstudio.com/shiny-developer-conference/shinydevcon-debugging-jonathanmcpherson-1080p

  • Welcome to Shiny https://shiny.rstudio.com/tutorial/written-tutorial/lesson1/

Late contributions will not be accepted. There will be no make-ups for any missed Final Project.

Final Exam

The final exam is one at-home exam. The exam will be open notes and open book, and students will have up to 2 days to complete it. The final exam will be graded for correctness and completeness.

Late submissions of the Final Exam will not be accepted or graded. There will be no make-ups for a missed Final Exam.

Students can expect the final exam to include a structure similar to homework, but a mixture of completion, correct/incorrect, and open-ended questions. Students should not expect solutions to be provided for the final exam.


University Specifics

Disability Accommodations

To obtain disability-related academic adjustments and/or auxiliary aids, students with disabilities must contact the course instructor and the Disability Resources and Educational Services (DRES) as soon as possible. To contact DRES, students may visit 1207 S. Oak St., Champaign, call 333-4603, e-mail disability@illinois.edu, or go to the DRES website at http://disability.illinois.edu/

Academic Integrity

It is expected that all students abide by the campus regulations on academic integrity http://studentcode.illinois.edu/article1_part4_1-401.html. Violations of academic integrity are described at http://studentcode.illinois.edu/article1_part4_1-402.html and include, but are not limited to, copying any part of another student’s assignment and allowing another student to copy any part of one’s own assignment.

Safety Protocol

We have been asked by Public Safety to share the following information in case of weather or security emergencies: https://police.illinois.edu/emergency-preparedness/run-hide-fight/

Sexual Misconduct Policy and Reporting

The University of Illinois is committed to combating sexual misconduct. Faculty and staff members are required to report any instances of sexual misconduct to the University’s Title IX and Disability Office. In turn, an individual with the Title IX and Disability Office will provide information about rights and options, including accommodations, support services, the campus disciplinary process, and law enforcement options.

A list of the designated University employees who, as counselors, confidential advisors, and medical professionals, do not have this reporting responsibility and can maintain confidentiality, can be found at https://wecare.illinois.edu/resources/students/#confidential. Other information about resources and reporting is available at https://wecare.illinois.edu.


The Last Word

The Instructor reserves the right to make any changes considered to be academically advisable. Any changes will be announced in class and on the Course Website. It is the student’s responsibility to attend the class and keep track of the changes.