STAT 440 Fall 2025 Syllabus

STAT 440 Statistical Data Management

3/4 Credit Hours - Major Elective

Sections 1UG/1GR and 2UG/2GR

Fall 2025 - Syllabus


Initial Course Setup

This course uses GitHub, not Canvas, for all content and course management. To get set up in the course on GitHub, skip down to the Software section (below) and complete the three steps under Git and GitHub. If you already have Git installed on your computer (most macOS systems come with Git pre-installed), then skip that step. If you already have a GitHub account, then you can log in directly. All students must complete step three, which puts you in the course space so that you have access to materials and any updates about the course.


Course Description

Statistical Data Management (STAT 440) is a focused data wrangling course that covers various types of data storage, extraction, manipulation, cleaning, and visualization and applies these methodologies in R. Students must therefore bring a laptop to class each day. This course includes lectures, archived lecture videos, notes, readings, and assessments in both digital and paper formats. Notes are reproducible documents that include R code examples. Videos should be watched, and notes read and understood, outside of class, ideally before each class meeting. The classroom space and time will be for practice, application, and assessment of course content. The expectation is that students will gain competency in exploring, organizing, designing, creating, storing, cleaning, wrangling, sharing, visualizing, and using data, all of which are commonly done prior to data analysis. Creative and critical thinking and efficient coding will be encouraged. Concepts covered in this course build upon each other, so students can expect all assessments to be cumulative. An autograder is used to grade most assignments. Students should be comfortable using R prior to beginning this course. RStudio offers reproducible documentation with Markdown syntax, which supports long-term learning, as well as GitHub Copilot integration. GitHub Copilot is Microsoft’s AI tool that suggests code and can be used while coding in RStudio. Git supports students’ collaboration and organization as well as version control of documentation and files.


Graduate Credit

This course is open to undergraduate and graduate students. Graduate students will be expected to complete additional work in the course to justify the 4 credits. For graduate students to earn 4 credits, they must complete the following additional work and submit it into their GitHub repository by Friday of Week 17 (December 19, 2025):

  • All homeworks must be completed in R code and saved as an .R file, and in Python code and saved as an .ipynb file.

  • All training labs must be completed in R code and saved as an .R file, and in Python code and saved as an .ipynb file.

  • All test labs must be completed in R code and saved as an .R file, and in Python code and saved as an .ipynb file.


Learning Objectives

These learning objectives are important because they connect the physical know-how with the technical knowledge of the course.

  • Students will assess effectiveness, organization, and intent from a published data set

  • Students will explore data sets of various types

  • Students must design well-organized, clean data sets for the purpose of data analysis

  • Students will present data management work in a reproducible document file using Markdown syntax and R code chunks. No local data files will be utilized.

  • Students must demonstrate critical thinking and creativity through asking questions about a given data set

  • Students must be able to explain and summarize data wrangling code

  • Students will share and discuss data management ideas, coding snippets, and other thoughts to aid in meaningful dialogue

  • Students must recall important data management concepts

  • Students will reflect on their own learning of data management principles

  • Students will build data wrangling tools, apps, and dashboards and store all work using Git, a version control system.

  • Students will collaborate on lab assignments.

  • Students will reproduce and replicate data visualizations.


Instructor

  • Instructor - Christopher Kinson (kinson2@illinois.edu)


Course Specifics

Course Website

The course website is https://github.com/illinois-stat440. This course is operating as an organization named illinois-stat440 within GitHub. Students should bookmark or save the link above in their browser for future use, because it contains access points to all repositories, course materials including notes, assignments, and archived videos.


Prerequisites

The prerequisites for this course are the following:

  • A laptop (not a netbook) with the most up-to-date versions of R and RStudio installed. If using a netbook or Chromebook, please set up an RStudio Cloud account.

  • STAT 400 or STAT 409

  • Operating knowledge of computers such as locating a file, creating a directory, saving a file, compressing a file, extracting a compressed file, keyboarding, and fundamental troubleshooting

  • Operating knowledge of R such as understanding various object types, mathematical and logical operators, and value types and their coercion, as well as creating user-defined functions and fundamental R troubleshooting


Meeting Schedule and Expectations

  • For section 1UG/GR, the class meets at 12:00 pm - 12:50 pm in Room 2101 of the Everitt Laboratory on Mondays, Wednesdays, and Fridays.

  • For section 2UG/GR, the class meets at 01:00 pm - 01:50 pm in Room 2233 of the Everitt Laboratory on Mondays, Wednesdays, and Fridays.

  • There will be archived lecture videos with links posted on the Course Website (see the videos directory of the course_content repository). See the Instructional Activities section below for more details.

  • Typically, Mondays will be initial assessment days, including training labs. Wednesdays will be lecture and review days. Fridays will be assessment days, including test labs and paper exams.

  • All students are expected to fully participate in class regularly.

  • All students are expected to do the following before coming to class each week: read the course notes, practice coding the examples (code chunks) within the course notes, watch archived lecture videos, create relevant exam questions using GitHub Copilot, and answer those AI-generated questions to build knowledge and skill.

  • Course content (syllabus, notes, videos, weekly schedules, and the discussion board under the Discussions tab) will be found on the Course Website via the course_content repository. Do check the course_content repo often for updates and announcements. Students accessing it remotely via Git are encouraged to clone the course_content repository once and pull from it daily.


Office Hours

All times listed in this document are in current US Central Time. Take care to adjust clocks when daylight saving time begins or ends.

Office hours will be held remotely on Zoom. If a student has a specific question but cannot attend office hours, then that student should post the question on the discussion board (see the Discussions tab of the course_content repo). If a student wants one-on-one assistance from the Instructor at an alternative time, then that student should email the Instructor to schedule a Zoom meeting.


Textbooks

There is no required textbook, but students may find the texts below to be helpful. These are all free and accessible to students for further reading. The Instructor may refer to certain sections of these texts in the course content. The asterisk * means these are accessible from the University Library as E-books.

R

Markdown

RMarkdown

Git and GitHub


Software

The course requires students to already have a fundamental and operational understanding of R. Students with no familiarity with R should understand that this course will not cover fundamental and operational usage of R.


Calendar

Below is a calendar of topics and tentative assignment deadlines. Any deadlines apply to all sections of STAT 440 unless otherwise noted.

Week | 2025 Dates | Topics | Assignments (Due Date)
01 | 08/25 - 08/31 | Introduce the course and software, Git and GitHub tips, GitHub Copilot quick tutorial, Autograder pitfall examples | training-lab01 (Monday 08/27), test-lab01 (Friday 08/29)
02 | 09/01 - 09/07 | What is data, Structures of data, delimiters, and file extensions | training-lab02 (Monday 09/03), test-lab02 (Friday 09/05), homework01 (Friday 09/05)
03 | 09/08 - 09/14 | Accessing and importing data (including APIs), Exporting data, Handling dates and times | training-lab03 (Monday 09/10), test-lab03 (Friday 09/12)
04 | 09/15 - 09/21 | Shiny apps and dashboards | training-lab04 (Monday 09/17), test-lab04 (Friday 09/19), homework02 (Friday 09/19)
05 | 09/22 - 09/28 | Data visualization I | training-lab05 (Monday 09/24), paper-exam01 (Friday 09/26)
06 | 09/29 - 10/05 | Data visualization II | training-lab06 (Monday 10/01), test-lab05 (Friday 10/03), homework03 (Friday 10/03)
07 | 10/06 - 10/12 | Data expansion | training-lab07 (Monday 10/08), test-lab06 (Friday 10/10)
08 | 10/13 - 10/19 | Data reduction, Arranging data, Reshaping data | training-lab08 (Monday 10/15), test-lab07 (Friday 10/17), homework04 (Friday 10/17)
09 | 10/20 - 10/26 | Regular expression, String manipulation | training-lab09 (Monday 10/22), paper-exam02 (Friday 10/24)
10 | 10/27 - 11/02 | Summarizing data | training-lab10 (Monday 10/29), test-lab08 (Friday 10/31), homework05 (Friday 10/31)
11 | 11/03 - 11/09 | Combining data | training-lab11 (Monday 11/05), test-lab09 (Friday 11/07)
12 | 11/10 - 11/16 | SQL databases and queries | training-lab12 (Monday 11/12), test-lab10 (Friday 11/14), homework06 (Friday 11/14)
13 | 11/17 - 11/23 | SQL queries and sub-queries | training-lab13 (Monday 11/19), paper-exam03 (Friday 11/21)
14 | 11/24 - 11/30 | FALL BREAK |
15 | 12/01 - 12/07 | Creative thinking, critical thinking, and futurism in data science | training-lab14 (Monday 12/03), reflective-survey (Friday 12/05)
16 | 12/08 - 12/14 | Data science careers, graduate school, and the future for a data worker, Final project discussion | final-project (Friday 12/12)

Grading Breakdown

  • 1 Reflective Survey: 2 points total

  • 1 Final Project: 22 points total (12 points for video presentation and 10 points for app)

  • 3 Paper Exams: 36 points total (12 points each)

  • 6 Homeworks: 24 points total (4 points each)

  • 10 Test Labs: 40 points total (4 points each)

  • 14 Training Labs: 28 points total (2 points each)

Course Total Points: 152 points
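
As a quick check of the arithmetic above, the breakdown can be tabulated in R. This is only an illustrative sketch: the object and column names (breakdown, count, points_each, points_total, course_total) are made up for this example, and the Final Project is entered as a single 22-point item.

```r
# Point values copied from the Grading Breakdown above.
# All object and column names here are illustrative assumptions.
breakdown <- data.frame(
  assignment  = c("Reflective Survey", "Final Project", "Paper Exam",
                  "Homework", "Test Lab", "Training Lab"),
  count       = c(1, 1, 3, 6, 10, 14),
  points_each = c(2, 22, 12, 4, 4, 2)
)

# Points per category, then the course total.
breakdown$points_total <- breakdown$count * breakdown$points_each
course_total <- sum(breakdown$points_total)

print(breakdown)
print(course_total)
```

The category totals should match the bullets above, and their sum should match the stated course total.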


Final Letter Grades

When computing final grades, students can add up their scores on the assignments. The resulting sum will determine which letter grade they earn when the course is completed. There is only one + letter grade in this course. All other letter grades are without +/-. Points are not rounded.

Lower bound | Upper bound | Letter grade
152.000 points | 152.000 points | A+
136.800 points | 151.999 points | A
121.600 points | 136.799 points | B
106.400 points | 121.599 points | C
91.200 points | 106.399 points | D
0.000 points | 91.199 points | F
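
For students who want to track their standing, the table above can be expressed as a small lookup. The sketch below is an assumption-laden illustration, not an official grade calculator: the function name letter_grade is made up, and it assumes every point total at or above a lower bound earns that row's letter, with no rounding.

```r
# Illustrative sketch only: letter_grade() is a hypothetical helper, not course software.
letter_grade <- function(points) {
  stopifnot(is.numeric(points), all(points >= 0), all(points <= 152))
  # Lower bounds from the table (0%, 60%, 70%, 80%, 90%, and 100% of 152 points).
  lower_bounds <- c(0, 91.2, 106.4, 121.6, 136.8, 152)
  grades <- c("F", "D", "C", "B", "A", "A+")
  # findInterval() gives the index of the largest lower bound that is <= points.
  grades[findInterval(points, lower_bounds)]
}

letter_grade(c(152, 151.999, 136.8, 121.599, 91.2, 50))
# returns "A+" "A" "A" "C" "D" "F"
```

Because points are not rounded, 121.599 stays a C even though it is within a thousandth of a point of the B range.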

Instructional Activities

Students should read the course notes and practice the code therein, annotate the lectures, watch the archived lecture videos as supplemental materials, and attempt the assignments. If or when students get stuck, they should ask questions i) on the Discussions board, ii) in Office Hours, or iii) via email (preference in this order). The following activities and tools will be useful for students.

Course Notes

The course notes are reproducible documents that contain text, images, and code chunks. The notes are written in RMarkdown syntax and saved as .Rmd files. The notes may be rendered as .html files for easy reading and navigation. The notes are located in the course_content repository in the notes directory. Students should read the notes before coming to class each week. Students should also practice coding the examples (code chunks) within the course notes.

The course notes perform the duty of a textbook for this course. Yes, there is a lot of information in the notes, but it is useful to read it for the important parts and return to it for details after attempting the assignments.

In-person Lecture

In-person lecture will be used to review and clarify concepts from the course notes and archived lecture videos. In-person lecture will also be used to introduce new concepts that are not covered in the course notes or archived lecture videos. In-person lecture will be used to answer questions from students about the course notes, archived lecture videos, and assignments. In-person lecture will also be used to provide guidance on how to approach and complete assignments.

Archived Lecture Videos

The archived lecture videos are supplemental materials that cover the same content as the course notes. The videos are located in the course_content repository in the videos directory. Students should watch the archived lecture videos before coming to class each week. Students should also practice coding the examples (code chunks) within the course notes while watching the videos.

Discussions

This discussion board, which exists as a tab on the course_content repository, is one of the best ways to communicate with classmates and the Instructor. Questions can be seen quickly and receive a rapid response. Students are encouraged to use this board, but there is no requirement to participate in the discussion board.

Do use the board to openly discuss ideas about the course such as questions about content, deadlines, notes, data, etc. If a student specifically wants the Instructor to respond, then the student should use the mention @staff when posting on the board. The things discussed here should be of a non-personal and non-private matter. If a student has a personal or private matter to discuss with the Instructor, please send an email to kinson2@illinois.edu. Additionally, the conversation on the discussion board should be respectful of people’s differences and cannot be used to speak negatively about anyone or harm anyone.

The Instructor will view and respond to content on Mondays, Wednesdays, and Fridays (at a minimum). The Instructor can be expected to spend at least 30 minutes on those days monitoring the Discussions.


Assignments

Reflective Survey

The reflective survey is a short digital survey that students complete near the end of the semester. The survey will ask students to reflect on their learning in the course, what they liked, what they did not like, specific aspects of technology usage, and how the course might be improved. The survey is worth 2 points and is due on Friday of Week 15 (December 5, 2025).

Final Project

The final project is a shiny app that students create in R and demonstrate in a video presentation. The project is worth 22 points total (12 points for the video presentation and 10 points for the app). The project is due on Friday of Week 16 (December 12, 2025). Students must submit the .R script file in their repo and include a URL/link to the video presentation within the .R script file as a code comment using the # symbol. The video must be between 5 and 10 minutes long and show the student’s face in full view while explaining what purpose the app serves if it were to be used by people later in the data pipeline such as analysts and data scientists, how the app works, why they chose to build it, and how they personally connect with the app.

Paper Exam

The paper exam is an in-person exam that will be given on some Fridays. The exam will be closed notes, closed book. Each paper exam must be submitted on the Course Website via a student’s repo.

Homework

These are assignments that students complete individually and submit in the main branch of their GitHub repository inside of the homework directory. These 6 homeworks are graded for completion, not correctness. However, the code must follow reproducibility guidelines and must not contain executable errors or warnings.

Test Lab

These are in-person lab sessions that are due at the end of the class period on some Fridays. Test labs are intended to challenge students to apply covered concepts for the week cumulatively.

Training Lab

These are in-person lab sessions that are due at the end of the class period on Mondays. Training labs are intended to give students an opportunity to fail productively such that knowledge of content and skills may be corrected and reinforced.


Autograder

The code we write in this course must be reproducible: any computer running the same exact code must produce the same exact result as the original source. It is important that code does not contain executable errors or warnings. Submitting code with executable errors or warnings shows that a learner is not following one of the course learning objectives. Submitting error-producing code also shows that there is no regard for what reproducibility means. There is an autograder used in this course to grade assignments. The autograder is not forgiving. It scans the entire file and checks for base R executable errors and warnings, and it grades the assignment for correctness and completion. Objects created at the top of the file which are overwritten at the bottom of the file are considered incorrect by the autograder. When the autograder detects a base R executable error or warning, it stops grading the learner’s submission and assigns a grade of 0 for the assignment.
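
To get a feel for what "no executable errors or warnings" means in practice, a learner could check code with something like the sketch below. This is not the course autograder, whose implementation is not described here; runs_cleanly is a hypothetical helper that evaluates code in a fresh environment and reports FALSE as soon as base R signals an error or a warning.

```r
# Hypothetical helper, NOT the course autograder: returns TRUE only if the
# code parses and every expression evaluates without an error or a warning.
runs_cleanly <- function(code_text) {
  tryCatch({
    exprs <- parse(text = code_text)      # syntax errors are caught below too
    env <- new.env(parent = globalenv())  # keep test objects out of the workspace
    for (e in exprs) eval(e, envir = env)
    TRUE
  },
  error   = function(cond) FALSE,
  warning = function(cond) FALSE)
}

runs_cleanly("x <- 1 + 1")                   # TRUE
runs_cleanly("as.integer('a')")              # FALSE: coercion warning
runs_cleanly("y <- undefined_object + 1")    # FALSE: object not found error
```

Note that a warning is treated exactly like an error here, which mirrors the all-or-nothing behavior described above.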

To follow reproducible coding guidelines and prevent executable errors and warnings, be sure to do the following (in no particular order):

  • Always use URLs for accessing and importing data. Local file locations are not reproducible.

  • If timing permits, knit the file to html to see if any error occurs.

  • If timing permits, run your code in R (not in RStudio). Check the R console to see if any error occurs.

  • Save the file with the correct name. Your netid should replace anything saying ‘netid’.

  • Save the file in the correct location. homeworkXX assignments belong directly in the homework directory of your repo on the main branch. Any sub-directories within this directory are inappropriate.

  • Within a code chunk, explicitly write code that attaches or loads a package using either library() or the namespace operator package_name:: if you use a package to produce your result.

  • Change your RStudio Global Options’s General Tab such that:

    • Restore most recently opened project at startup is not checked.
    • Restore previously open source documents at startup is not checked.
    • Restore .RData into workspace at startup is not checked.
    • Save workspace to .RData on exit is Never.
    • Always save history (even when not saving .RData) is not checked.
  • Change your RStudio Global Options’s Code Tab such that, under the Saving section:

    • Always save R scripts before sourcing is not checked.
    • Automatically save when editor loses focus is not checked.
    • When editor is idle is Do nothing.
  • Within RStudio, restart your R session. This can be done using Session > Restart R. After clicking this, if your session still shows objects in the Environment, then click Session > Terminate R > Yes. Terminating the R session effectively does the same thing that restarting the R session should do: detach any packages and remove all objects in the global environment, giving you a new session.

  • After beginning a new session, execute and run all your code to ensure there are no executable errors or warnings. Some warnings are specific to a package which may not cause R executable errors or warnings.

  • Comment out any erroneous code using the hashtag symbol #. Doing so prevents the autograder from executing it. This is useful if you don’t know how to correct your errors or warnings before the deadline.

  • Comment out or remove any install.packages() in your code chunks.
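
A few of the guidelines above (commenting out install.packages(), calling functions with an explicit package_name:: prefix, and importing from a URL rather than a local path) might look like the following in a script. This is a minimal sketch under stated assumptions: import_data is a made-up helper name, and the URL in the final comment is a placeholder that does not point to real course data.

```r
# install.packages("readr")  # commented out so it never runs during grading

# Hypothetical helper: imports a CSV using an explicit namespace call (utils::)
# instead of relying on whatever packages happen to be attached.
import_data <- function(url) {
  utils::read.csv(url, stringsAsFactors = FALSE)
}

# Placeholder URL, left as a comment because it is not a real data source:
# scores <- import_data("https://example.com/scores.csv")
```

Because the data location is a URL, the same call behaves identically on any machine with internet access, which is the point of the "no local files" guideline.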


Grade Disputes

A grade dispute is not a plea or request to change a grade simply because a learner does not like their original grade.

A grade dispute is when a grade has been incorrectly applied to an assignment and the learner has evidence supporting the fact that the grade is incorrectly applied.

Please email the Instructor (kinson2@illinois.edu) with your disputes within 7 days (i.e. 1 week) of your grade being returned.


Late, Improper, or Irreproducible Submissions Policy

This policy applies only to the following at-home assignments: reflective survey, final project, and homework.

An at-home assignment is considered a late submission when it is submitted by a learner in the proper location after the assignment deadline.

An at-home assignment is considered an improper submission when it is submitted by a learner outside of the appropriate directory in their repo or not in their repo at all.

An at-home assignment is considered an irreproducible submission when it is submitted by a learner and the code within the file produces an executable error. Thus, there is no way to reproduce the same coding result as the original submission presumes.

It is possible for an at-home assignment to be submitted given any combination of the following troubles: late, improper, or irreproducible.

Learners have up to 2 days to properly submit an at-home assignment that was originally considered any of the following: late, improper, or irreproducible.

The latest gradable at-home assignment submission is on Sundays by 11:59 pm. Any time after this day, the assignment submission is deemed missing and a grade of 0 is earned for any such at-home assignment.


University Specifics

Disability Accommodations

To obtain disability-related academic adjustments and/or auxiliary aids, learners with disabilities must contact the course Instructor and the Disability Resources and Educational Services (DRES) as soon as possible. To contact DRES, learners may visit 1207 S. Oak St., Champaign, call 333-4603, e-mail disability@illinois.edu, or go to the DRES website.

Academic Integrity and Generative Artificial Intelligence Tools

It is expected that all learners abide by the campus regulations on academic integrity. Intentional violations of academic integrity include, but are not limited to, copying any part of another learner’s assignment and allowing another learner to copy any part of the learner’s own assignment.

Generative artificial intelligence tools can be useful in learning and studying. If learners use generative AI tools in this course, we suggest doing so outside of class as a means of studying and learning accurate information relevant to this course’s content. Learners are permitted to use generative artificial intelligence tools on graded assignments in this course. Beware that multiple learners with the same exact code solution may be in violation of academic integrity.

It is important to understand the course content and code for yourself and adapt code to be in alignment with the course content and trajectory. Using complex coding, because it is suggested by generative AI, demonstrates a lack of understanding of the actual course material and calls into question one’s own ability to be curious, critical, and skeptical. Furthermore, reliance on generative AI tools may lead to dependence on its use and a lack of individuality.

This course is concerned with the way learners think and create and their ability to adapt that creativity in various conceptual settings and environments. This course aims to challenge all learners to retain and exercise their own individual knowledge and power.

Safety Protocol

We have been asked by Public Safety to share the following information in case of weather or security emergencies. See the links:

Sexual Misconduct Policy and Reporting

The University of Illinois is committed to combating sexual misconduct. Faculty and staff members are required to report any instances of sexual misconduct to the University’s Title IX and Disability Office. In turn, an individual with the Title IX and Disability Office provides information about rights and options, including accommodations, support services, the campus disciplinary process, and law enforcement options.

This is a list of the designated University employees who, as counselors, confidential advisors, and medical professionals, do not have this reporting responsibility and can maintain confidentiality. Read more about other information about resources and reporting.


The Last Word

The Instructor reserves the right to make any changes considered to be academically advisable. Any changes will be announced in class and on the Course Website. It is the student’s responsibility to attend the class and keep track of the changes.