STAT 440 Spring 2024 Unofficial Syllabus
Written on January 2nd, 2024 by Christopher Kinson
STAT 440 Statistical Data Management
3/4 Credit Hours - Major Elective
Spring 2024 - Unofficial Syllabus
Table of Contents
- Course Description
- Learning Objectives
- Course Staff
- Course Specifics
- University Specifics
- Last Word
Statistical Data Management (STAT 440) is a focused data wrangling course that covers various types of data storage, manipulation, cleaning, and extraction and applies these methodologies in R. Because the work is done in R, students must bring a laptop to class each day. This course does not follow a traditional lecture format. Most lecture content will be distributed in video form at the beginning of the semester. Notes are reproducible documents that include examples and code. Videos should be watched and notes read and understood before (and outside of) class. The classroom space and time will be for application, practice, and assessment of the content in those videos and notes. The expectation is that students will gain competency in exploring, organizing, designing, creating, storing, cleaning, wrangling, sharing, and using data, all of which are commonly done prior to data analysis. Critical and creative thinking and efficient coding will be encouraged. Concepts covered in this course will build upon each other. Thus, students can expect all assessments to be cumulative. An autograder is used to grade assignments. Students should be proficient and comfortable in R prior to beginning this course. RStudio offers reproducible documentation with Markdown syntax, which will support long-term learning opportunities. Git supports students’ capacity for collaboration as well as version control of documentation and files.
These learning objectives are important because they connect the physical know-how with the technical knowledge of the course.
Students will assess effectiveness, organization, and intent from a published data set
Students will explore data sets of various types
Students must design well-organized, clean data sets for the purpose of data analysis
Students will present data management work in a reproducible document file using Markdown syntax and R code chunks. No local data files will be utilized.
Students must demonstrate critical thinking and creativity through asking questions about a given data set
Students must be able to explain and summarize data wrangling code
Students will share and discuss data management ideas, coding snippets, and other thoughts to aid in meaningful dialogue
Students must recall important data management concepts
Students will reflect on their own learning of data management principles
Students will build data wrangling tools, apps, and dashboards and store all work using git, a version control software.
Students will collaborate on lab assignments.
Students will reproduce and replicate data visualizations.
- Instructor - Christopher Kinson (email@example.com)
The course website is https://github.com/illinois-stat440. This course is operating as an organization named illinois-stat440 within GitHub. Students should bookmark or save the link above in their browser for future use, because it contains access points to all repositories and course materials, including notes, assignments, projects, and lecture videos.
The prerequisites for this course are the following:
A laptop (not a netbook) with the most up-to-date versions of R and RStudio installed. If using a netbook or Chromebook, please set up an RStudio Cloud account.
STAT 400 or STAT 409
Operating knowledge of computers such as locating a file, creating a directory, saving a file, compressing a file, extracting a compressed file, keyboarding, and fundamental troubleshooting
Operating knowledge of R such as understanding various objects, mathematical and logical operators, and value types and their coercion, as well as creating user-defined functions and fundamental R troubleshooting
For section 1UG/GR, the class meets at 04:00 pm - 04:50 pm in Room 32 of the Psychology Building on Mondays, Wednesdays, and Fridays.
There will be asynchronous lecture videos with links posted on the Course Website (in the videos directory of the course_content repository). See the Instructional Activities section below for more details.
All students are expected to fully participate in class regularly.
All students are expected to do the following before coming to class each week: read the course notes, watch the lecture videos, and complete assignments.
Course content (syllabus, notes, videos, exams, projects, weekly schedules, and the discussion board, available as the Discussions tab) will be found on the Course Website via the course_content repository. Do check the course_content repo often for updates and announcements. Students accessing it remotely via git are encouraged to clone the course_content repository once and pull from it daily.
Any and all times listed in this document are in current US Central Time. Take care to adjust clocks when daylight saving time begins or ends.
Office hours will be in-person. If a student has a specific question, but cannot attend the office hours, then that student should post their question in the Discussions board. If a student wants one-on-one assistance from the course staff at an alternative time, then that student should email the course staff in order to schedule a Zoom meeting.
Instructor in-person office hours:
- Thursdays 05:15 pm - 07:15 pm in Room 166 Computer Applications Building (CAB)
There is no required textbook, but students may find the texts below to be helpful. These are all free and accessible to students for further reading. The Instructor may refer to certain sections of these texts in the course content. The asterisk * means these are accessible from the University Library as E-books.
- *Data Wrangling with R. Boehmke. Springer Cham. http://www.library.illinois.edu/proxy/go.php?url=http://dx.doi.org/10.1007/978-3-319-45599-0
- *R for Data Science. Wickham and Grolemund. O’Reilly Media, Inc. https://learning.oreilly.com/library/view/-/9781491910382/?ar
- Mastering Shiny. Wickham. https://mastering-shiny.org/
- The Markdown Guide. Cone. https://www.markdownguide.org/getting-started
- RMarkdown Cheat Sheet. RStudio. https://rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf
Git and GitHub
- Happy Git and GitHub for the useR. Bryan et al. https://happygitwithr.com/
The course requires students to already have a fundamental and operational understanding of R. Students with no familiarity with R should understand that this course will not cover fundamental and operational usage of R.
- R with RStudio
- Git and GitHub
- Step 1. Download and Install Git
- Step 2. Create a GitHub account if you don’t have one already and Sign into GitHub
- Step 3. Enroll in the STAT 440 course organization and follow the steps to be set up properly in the learning management system
- Make sure your Git is updated to the latest version.
- Zoom video teleconferencing software with functioning Webcam and Microphone https://illinois.zoom.us/
Below is a calendar of topics and tentative assignment deadlines. Any deadlines apply to all sections of STAT 440 unless otherwise noted.
|Dates|Topics and deadlines|
|01/15 - 01/19|Introduce the course and software - RStudio, RMarkdown, Git, and GitHub, Git and GitHub tips|
|01/22 - 01/26|More Git and GitHub tips, Loops and conditional execution, Apply family of functions, Vectorization|
|01/29 - 02/02|What is data, Structures of data, delimiters, and file extensions, Accessing and importing data, Exporting data, Handling dates and times|
|02/05 - 02/09|Assigning objects, Arranging data, Reshaping data, Data expansion, Data reduction|
|02/12 - 02/16|Regular expression and string manipulation, Summarizing data, Combining data|
|02/19 - 02/23|Validating data, Cleaning data|
|02/26 - 03/01|SQL queries and sub-queries|
|03/04 - 03/08|Midterm Exam 01 on Wednesday March 06, Reassessment Midterm Exam 01 on Friday March 08, Focus on Midterm Exam|
|03/11 - 03/15||
|03/18 - 03/22|More SQL, Data visualization using base R|
|03/25 - 03/29|More SQL, Data visualization using tidyverse|
|04/01 - 04/05|Accessing and importing data via web scraping, Accessing and importing data via an API|
|04/08 - 04/12|Shiny apps and dashboards|
|04/15 - 04/19|Midterm Exam 02 on Wednesday April 17, Reassessment Midterm Exam 02 on Friday April 19, Focus on Midterm Exam|
|04/22 - 04/26|Final Project Pre-Feedback Submission due at 11:59 pm on Friday April 26, Final Project Help and Questions and Answers|
|04/29 - 05/03|Reading Day - May 02 (no more class or office hours), Data workers and responsibilities, Discussion about careers, graduate school, and the future post-STAT440, Focus on Final Project and Final Exam|
|05/06 - 05/10|Final Exam for section 1 UG/GR at 01:30 pm - 04:30 pm on Wednesday May 08, Final Project Post-Feedback Submission due at 11:59 pm on Friday May 10|
14 Piloted Practices: 14 points total (1 point each)
- pp01-pp16 are weekly comprehensive assignments due on Mondays (before class starts). No student receives credit for pp01 and pp02. Credit is earned for pp03-pp16.
2 Lab Assignments: 8 points total (4 points each)
- Lab01 series begins Week 04 (February 07) and ends Week 07 (March 01). Each student will be an in-person in-class driver only once in this series. Labs take place on Wednesdays and Fridays in this series.
- Lab02 series begins Week 10 (March 18) and ends Week 13 (April 12). Each student will be an in-person in-class driver only once in this series. Labs take place on Wednesdays and Fridays in this series.
2 Midterm Exams: 20 points total (10 points each)
- Midterm Exam 01 is a 50-minute (student’s section’s class start and end time) in-person exam (March 06). It contains conceptual and applied problems.
- Midterm Exam 02 is a 50-minute (student’s section’s class start and end time) in-person exam (April 17). It contains conceptual and applied problems.
1 Final Project Pre-Feedback Submission: 13 points total
- Deadline for .R file pre-feedback submission into project repo is 11:59 pm Friday April 26. Students must work in pairs for the final project. Grading rubric provided in course_content repo (and below in Assignments section).
1 Final Project Post-Feedback Submission: 4 points total
- Deadline for .R file post-feedback submission into project repo is 11:59 pm Friday May 10. Grading rubric provided in course_content repo (and below in Assignments section).
1 Final Exam: 21 points total
- Final Exam is a 3-hour in-person exam containing conceptual and applied problems. This exam schedule is in alignment with the University Final Exam Schedule (see https://registrar.illinois.edu/courses-grades/final-exam-schedule-public/ for more information). Adjust your schedules and travel plans accordingly. Any undergraduate requests for a conflict exam will be denied.
Course Total Points: 80 points
When computing final grades, students can add up their scores on the assignments. The resulting sum will determine which letter grade they earn when the course is completed. There is only one $+$ letter grade in this course. All other letter grades are without $+/-$. Points are not rounded.
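The point arithmetic above can be checked with a short R sketch. The category maximums come from this syllabus; the names `max_points` and `earned` and the student's scores are hypothetical, chosen only for illustration. No letter-grade cutoffs are assumed here.

```r
# Maximum points per assessment category, as listed in this syllabus
max_points <- c(
  piloted_practices = 14 * 1,   # 14 piloted practices, 1 point each
  labs              = 2 * 4,    # 2 lab assignments, 4 points each
  midterms          = 2 * 10,   # 2 midterm exams, 10 points each
  project_pre       = 13,       # Final Project Pre-Feedback Submission
  project_post      = 4,        # Final Project Post-Feedback Submission
  final_exam        = 21        # Final Exam
)
course_total <- sum(max_points)

# Hypothetical scores for one student (illustration only)
earned <- c(
  piloted_practices = 13,
  labs              = 7,
  midterms          = 16.5,
  project_pre       = 11,
  project_post      = 4,
  final_exam        = 17.5
)

# No category can exceed its maximum
stopifnot(all(earned <= max_points[names(earned)]))

# Points are summed as-is; they are not rounded
student_total <- sum(earned)
```

Here `course_total` matches the Course Total Points stated above, and `student_total` is the unrounded sum that determines the letter grade.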
Students should read the course notes, watch the lecture videos, and attempt the assignments. If or when students get stuck, they should ask questions i) on the Discussions board, ii) in office hours, or iii) via email (preference in this order). In addition to lecture videos and office hours, the following activities and tools will be useful for students.
The course notes perform the duty of a textbook for this course. Yes, there is a lot of information in the notes, but it is useful to read it for the important parts and return to it for details after attempting the assignments.
The discussion board, which exists as a tab on the course_content repository, is one of the best ways to communicate with classmates and course staff. Questions can be seen quickly and receive a rapid response. Students are encouraged to use this board, but there is no requirement to participate in the discussion board.
Do use the board to openly discuss ideas about the course such as questions about content, deadlines, notes, data, etc. If a student specifically wants the course staff to respond, then the student should use the mention @staff when posting in the board. The things discussed here should be of a non-personal and non-private nature. If a student has a personal or private matter to discuss with the Instructor, please send an email to firstname.lastname@example.org. Additionally, the conversation in the discussion board should be respectful of people’s differences and cannot be used to speak negatively about anyone or harm anyone.
The course staff will view and respond to content on Tuesdays and Thursdays (at a minimum). The course staff can be expected to spend at least 30 minutes on Tuesdays and Thursdays monitoring the Discussions.
Please email the Instructor with your requests and disputes within 7 days (i.e. one week) of your grade being returned for a specific assignment.
These are guided comprehension assignments to be completed by each student as an individual before the class begins on Mondays. There are 14 piloted practices for the semester. The shortname for these assignments is pp. They are numbered by their corresponding week number. For example, pp04 corresponds to week 04 piloted practice and is due on Monday of Week 04.
These assignments are to be submitted in the main branch of the student’s repository inside of the piloted-practices directory, which exists within the illinois-stat440 organization in GitHub. See Course Website for web links. These pp’s are graded for completion, not correctness. However, the code must follow reproducibility guidelines and must not contain executable errors or warnings.
These are lab sessions that contain 4 problems and are due from the “driver” at the end of the class period on Wednesdays and Fridays. The course is located in an iFLEX classroom, which allows for ease of communication, collaboration, and displaying content on large screens at stations. Each week (beginning in Week 04), randomly chosen students (“drivers”) will complete the lab assignment seated at specified stations, while the remaining students (“passengers”) at the stations will help the drivers by giving them ideas and advice on how to complete the problems. Drivers must submit their labs. Passengers are not allowed to type on drivers’ laptops. All students must bring their laptops to class each day. Lab assignments are intended to push students to apply concepts covered in the course notes and lecture videos and encourage students to work together as a team in a limited amount of time. Each student must complete two lab assignments as the driver by the respective lab due dates. A schedule will be given to students displaying exactly when each student is to be the driver and at which station number the driver should sit. Students should not deviate from this schedule. Late submissions of lab assignments will not be accepted. There will be no make-ups for any missed labs. This policy also applies to any students who add the course late to their registration.
Each midterm exam contains 10 problems and is in-class and in-person. The exam will be closed notes, closed book. Each midterm file must be submitted on the Course Website via a student’s repo. The midterm exam will be graded for correctness and completeness. Students can expect the midterm exam to include a structure similar to lab assignments with a possible mixture of completion, correct/incorrect, and open-ended questions. Late submissions on any Midterm Exam submission will not be accepted or graded. There will be no make-ups for any missed Midterm Exam submission. There is one Reassessment Midterm Exam for each original Midterm Exam to give students who struggle with the original Midterm Exam submission a second opportunity to review course content and improve their score: Reassessment Midterm Exam 1 and Reassessment Midterm Exam 2. The Reassessment Midterm Exams take place on the Friday following the original Midterm Exam. The Reassessment Midterm Exam grade may be used to replace the original Midterm Exam grade for the same exam number only. For example, a student’s Reassessment Midterm Exam 1 grade of 10 points may replace that student’s original Midterm Exam 1 grade of 0 points, but it may not replace a Midterm Exam 2 grade. All exams (original or reassessment) are in-person exams. Students who do not wish to take the reassessment may use the reassessment exam date as a study-at-home day.
The final exam is one in-person exam with 21 problems. The exam will be open notes, open resources, and take up to 3 hours to complete. The final exam will be graded for both correctness and completeness. Students can expect the final exam to include a structure similar to labs and the midterm exams with a possible mixture of completion, correct/incorrect, and open-ended questions. Students should not expect solutions to be provided for the final exam. Late submissions on any Final Exam will not be accepted or graded. There will be no make-ups for any missed Final Exam.
The Final Project in this course is the creation of a single Shiny app or dashboard in R. This Shiny app or dashboard is an opportunity for students to demonstrate the statistical data management concepts covered this semester and apply them along with version control for collaboration. The final project requires students to work in groups of 2 members only. Individual final projects will be considered only if there is an odd number of enrolled students and/or there is conflict within an existing group.
The final project will have two submissions and both submissions must be a single .R file; no other file extension will be permitted. The Pre-Feedback submission is the group’s well-thought-out and successfully running Shiny app or dashboard. The deadline to submit this .R file is April 26. Students will review the feedback from the course staff and incorporate that into the second submission of the Shiny app or dashboard, called the Post-Feedback submission. The deadline to submit this new .R file is May 10.
Students should read the accompanying grading rubrics for both submissions. A new repo specific to the student’s group will be created by the Instructor in GitHub to permit collaboration. Groups must consist of students in the same section; no cross-section collaboration. Ideas and code must be your own. Late contributions will not be accepted. There will be no make-ups for any missed Final Project. If there is group conflict, notify the Instructor immediately so that the group will be split. The two students will then work as individuals, each with a new topic (different from the original group’s topic) for the Shiny app or dashboard.
The Shiny dashboard must contain data directly accessed from an API. The app (or dashboard) must contain an actionButton (or submitButton) such that clicking this button triggers the app (or dashboard) to show results. Any change the user makes, such as selecting variables and specific options, should not cause the app (or dashboard) to show new results. Instead, the app or dashboard should only show new results when the actionButton (or submitButton) is clicked. The results must include at least 1 data visualization and at least 1 summary table or summary tibble returned. A summary table or tibble is one that computes a new statistic based on by-group processing.
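The button-gated behavior described above can be sketched roughly as follows. This is one possible approach, not the required design: it uses `eventReactive()` so that results recompute only when the button is clicked. The input ids (`value`, `group`, `go`), the helper `summarise_by_group()`, and the use of the built-in `mtcars` data are all assumptions made for illustration; an actual project must pull its data from an API instead.

```r
library(shiny)

# Hypothetical helper: mean of one column within each level of another
# (a by-group summary table).
summarise_by_group <- function(data, value_col, group_col) {
  out <- aggregate(data[[value_col]], by = list(data[[group_col]]), FUN = mean)
  names(out) <- c(group_col, paste0("mean_", value_col))
  out
}

ui <- fluidPage(
  selectInput("value", "Variable to summarize", choices = c("mpg", "hp", "wt")),
  selectInput("group", "Group by", choices = c("cyl", "gear", "am")),
  actionButton("go", "Show results"),
  plotOutput("plot"),
  tableOutput("summary")
)

server <- function(input, output, session) {
  # eventReactive() takes a dependency on input$go only, so changing the
  # select inputs alone does not refresh the results.
  results <- eventReactive(input$go, {
    list(
      value = input$value,
      group = input$group,
      table = summarise_by_group(mtcars, input$value, input$group)  # stand-in data
    )
  })

  # Data visualization: one box per group
  output$plot <- renderPlot({
    res <- results()
    boxplot(split(mtcars[[res$value]], mtcars[[res$group]]),
            xlab = res$group, ylab = res$value)
  })

  # Summary table computed by group
  output$summary <- renderTable(results()$table)
}

# Build the app object; launch it interactively with runApp(app)
app <- shinyApp(ui, server)
```

Capturing the selections inside `results()` keeps the plot and the table in sync with the values that were chosen at the moment of the last click.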
The Pre-Feedback grading rubric is (maximum total of 13 points):
|Needs Much Improvement 3
|The app (or dashboard) lacks much functionality. The buttons, selections, and options do not work properly. The app (or dashboard) completely disregards instructions.
|The app (or dashboard) contains reasonable functionality. The buttons, selections, and options do work properly. The app (or dashboard) follows the instructions.
|The app (or dashboard) incorporates much functionality and runs smoothly without hiccups. The buttons, selections, and options work precisely, quickly, and with fantastic efficiency. The app (or dashboard) completely follows the instructions and goes above and beyond to deliver impressive performance.
|The app (or dashboard), data visualization, and summary table used are least visually appealing and have minimal readability, rendering them not easily interpretable. The app (or dashboard) incorporates almost no visual design principles.
|The app (or dashboard), data visualization, and table used are commonly visually appealing and have decent readability rendering them moderately interpretable. The presentation incorporates a few visual design principles.
|The app (or dashboard), data visualization, and table used are most visually appealing and have superior readability rendering them quickly and completely interpretable. The presentation incorporates several visual design principles.
|Minimum requirements met 0.6
|No minimum requirements are met. Submission ignores all requirements detailed in instructions.
|At least one but not all minimum requirements are met. Submission ignores some requirements detailed in instructions.
|All minimum requirements are met. Submission ignores none of the requirements detailed in instructions.
The Post-Feedback grading rubric is (maximum total of 4 points):
|Feedback was incorporated to improve the Functionality of dashboard
|0 (disagree), 1 (unclear), 2 (agree)
|Feedback was incorporated to improve the Beauty of dashboard
|0 (disagree), 1 (unclear), 2 (agree)
Maximum Total Final Project is 17 points.
For more information about Shiny apps and how to create one, check out the following videos and docs:
How to start a Shiny app https://vimeo.com/rstudioinc/review/131218530/212d8a5a7a/#t=0m0s
Getting started with Shiny dashboards https://rstudio.github.io/shinydashboard/get_started.html
Effective Reactive Programming Part 1 https://rstudio.com/resources/shiny-dev-con/reactivity-pt-1-joe-cheng/
Effective Reactive Programming Part 2 https://rstudio.com/resources/shiny-dev-con/reactivity-pt-2/
Mastering Shiny (textbook). Wickham. https://mastering-shiny.org/
Interactive Graphics with Shiny https://resources.rstudio.com/webinars/interactive-graphics-winston
Understanding Shiny Modules https://resources.rstudio.com/shiny-developer-conference/shinydevcon-modules-garrettgrolemund-1080p
Debugging Techniques https://resources.rstudio.com/shiny-developer-conference/shinydevcon-debugging-jonathanmcpherson-1080p
Welcome to Shiny https://shiny.rstudio.com/tutorial/written-tutorial/lesson1/
The code we write in this course must be reproducible. It is important that code does not contain executable errors and warnings. Submitting code with executable errors and warnings shows that a student is not following one of the course learning objectives. Submitting error-producing code also shows that there is no regard for what reproducibility means. There is an autograder used in this course to grade assignments. The autograder is not forgiving. It will scan the entire file and check for base R executable errors and warnings as well as grade the assignment for correctness and completion. Objects created at the top of the file which are overwritten at the bottom of the file will be considered incorrect by the autograder. When the autograder detects a base R executable error or warning, it will stop grading the student’s submission and assign a grade of 0 for the assignment.
To follow reproducible coding guidelines and prevent executable errors and warnings, be sure to do the following (in no particular order):
Save the file with the correct name. Your netid should replace anything saying ‘netid’.
Save the file in the correct location. All pp** assignments belong directly in the piloted-practices directory of your repo on the main branch. All lab** assignments belong directly in the labs directory of your repo on the main branch. All exam** files belong in the exams directory of your repo on the main branch. Any sub-directories within these directories are inappropriate.
Explicitly write code that attaches or loads a package using either library() or environment call package_name:: if you use a package to produce your result.
- Change the General Tab of your RStudio Global Options such that:
- Restore most recently opened project at startup is not checked.
- Restore previously open source documents at startup is not checked.
- Restore .RData into workspace at startup is not checked.
- Save workspace to .RData on exit is Never.
- Always save history (even when not saving .RData) is not checked.
- Change the Code Tab of your RStudio Global Options such that, under the Saving section:
- Always save R scripts before sourcing is not checked.
- Automatically save when editor loses focus is not checked.
- When editor is idle is Do nothing.
Restart your R session. This can be done in RStudio using Session > Restart R. After clicking this, if your session still shows objects in the Environment, then click Session > Terminate R > Yes. Terminating the R session effectively does the same thing that restarting the R session should do: detach any packages and remove all objects in the global environment, giving you a new session.
After beginning a new session, execute and run all your code to ensure there are no executable errors or warnings. Some warnings are specific to a package which may not cause R executable errors or warnings.
Comment out any erratic code using the hashtag symbol #. Doing so will prevent the autograder from executing it. This is useful if you don’t know how to correct your errors or warnings before the deadline.
Comment out or remove any install.packages() in your code chunks.
- Do not work with local files or any files that exist only on your computer. Data objects and data files must be accessed with URLs (provided by the Instructor).
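A few of the guidelines above can be illustrated with a minimal R sketch. The helper `is_remote()` and the URL are hypothetical and not part of the course tooling; the `tools` package is used only because it ships with R.

```r
# Attach a package explicitly at the top of the file ...
library(tools)
ext_attached <- file_ext("scores.csv")

# ... or call the function through its namespace without attaching it.
ext_namespaced <- tools::file_ext("scores.csv")

# install.packages("dplyr")  # commented out so that it is never executed

# Hypothetical helper: TRUE when a data source is a web URL rather than
# a path on the local machine.
is_remote <- function(path) grepl("^https?://", path)

# Placeholder URL for illustration only; real URLs come from the Instructor.
data_url <- "https://example.com/stat440/scores.csv"
stopifnot(is_remote(data_url))

# scores <- read.csv(data_url)  # uncomment once data_url is a real course URL
```

Running a file like this in a fresh R session, from top to bottom, is a quick way to confirm that it does not depend on objects or packages left over from earlier work.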
To obtain disability-related academic adjustments and/or auxiliary aids, students with disabilities must contact the course instructor and the Disability Resources and Educational Services (DRES) as soon as possible. To contact DRES, students may visit 1207 S. Oak St., Champaign, call 333-4603, e-mail email@example.com or go to the DRES website at http://disability.illinois.edu/
It is expected that all students abide by the campus regulations on academic integrity http://studentcode.illinois.edu/article1_part4_1-401.html. Intentional violations of academic integrity are described at http://studentcode.illinois.edu/article1_part4_1-402.html and include, but are not limited to, copying any part of another student’s assignment and allowing another student to copy any part of the student’s own assignment.
Generative artificial intelligence tools are relatively new to the general public and can be useful in learning and studying. If students use generative AI tools in this course, do so outside of class as a means of studying and learning accurate information relevant to this course’s content. Students should not use generative artificial intelligence tools as a means to perform (on graded assignments and submissions) in this course. Doing so is an intentional violation of academic integrity.
We have been asked by Public Safety to share the following information in case of weather or security emergencies: https://police.illinois.edu/emergency-preparedness/run-hide-fight/
The University of Illinois is committed to combating sexual misconduct. Faculty and staff members are required to report any instances of sexual misconduct to the University’s Title IX and Disability Office. In turn, an individual with the Title IX and Disability Office will provide information about rights and options, including accommodations, support services, the campus disciplinary process, and law enforcement options.
A list of the designated University employees who, as counselors, confidential advisors, and medical professionals, do not have this reporting responsibility and can maintain confidentiality, can be found at https://wecare.illinois.edu/resources/students/#confidential. Other information about resources and reporting is available at https://wecare.illinois.edu.
The Instructor reserves the right to make any changes considered to be academically advisable. Any changes will be announced in class and on the Course Website. It is the student’s responsibility to attend the class and keep track of the changes.