

Final Project

All deadlines are by midnight at the end of the listed day, so "Wed Feb 11th" means by 11:59pm that night.

| Week   | Date        | Milestone                        |
|--------|-------------|----------------------------------|
| 6      | Wed Feb 11  | Initial meeting with instructors |
| 8      | Wed Feb 25  | Project proposal due             |
| Finals | Tues Mar 17 | Final project due                |

The final project is your opportunity to apply the statistical modeling skills you've learned throughout the course to a real dataset. Your deliverable is a publication-quality Abstract, Methods, & Results document (the kind you'd submit to a journal as part of a full manuscript), rendered as a reproducible .qmd document with all supporting materials (e.g. code, data) in a GitHub repository. Your instructors will grade you based on the rubric below, which isn't designed to trick you but to help you focus on these core skills:

  1. Research Question
  • Formulate a clear, answerable research question that requires statistical analysis given an existing dataset
  • What inference(s) are you hoping to make? What assumptions might those require? How will you validate this?
  • We do not want you to collect new data or design a new experiment. Instead you should focus on acquiring, cleaning, and organizing an existing dataset to make it amenable to your research question(s).
  2. Data Analysis (see the sketch below)
  • Exploratory Data Analysis: summary statistics, visualizations, data quality assessment
  • Statistical Modeling: your hypotheses articulated as statistical model(s), with diagnostic evaluation where appropriate
  • Inference/Prediction: approaches relevant to your question, e.g. parameter inference or cross-validated prediction performance
  • Interpretation: what do the results mean? How do they relate to your assumptions? What future directions?
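
To make this concrete, here's a minimal sketch of how these pieces might fit together in code. The file path, column names, and the use of statsmodels for model fitting are all hypothetical placeholders, not requirements; substitute your own dataset and whatever modeling tools suit your question.

```python
import polars as pl
import seaborn as sns
import statsmodels.formula.api as smf

# --- Exploratory Data Analysis ---
# (placeholder path and column names; use your own dataset)
df = pl.read_csv("data/processed/study.csv")
print(df.describe())     # summary statistics for every column
print(df.null_count())   # quick data-quality check for missingness

# Visualize the relationship you plan to model
sns.lmplot(data=df.to_pandas(), x="hours_studied", y="test_score", hue="condition")

# --- Statistical Modeling ---
# Hypothesis expressed as a model; statsmodels' formula API expects pandas
model = smf.ols("test_score ~ hours_studied * condition", data=df.to_pandas()).fit()
print(model.summary())   # parameter estimates, CIs, p-values

# --- Diagnostics ---
# Inspect residuals before trusting any inference from the model
sns.residplot(x=model.fittedvalues, y=model.resid, lowess=True)
```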

1. Pick Solo or Group

You may work solo or in a group. In either case you'll use a project template that we'll provide in a few weeks, configured like our labs and HWs (Python environment, Quarto, Notebooks). Groups submit a single project, but you must:

  • Include a CRediT author statement describing each member’s contributions
  • Demonstrate evidence of collaboration via GitHub — each person must contribute at least 1 commit/PR to the project repository

2. Pick a Dataset

Important

In all cases, you must choose a new question to answer, not something you or someone else has previously analyzed. In other words, no reproducibility/replication-focused projects. Think about extending prior work instead.

Regardless of solo or group, you must choose one of the following options for your dataset:

  1. Existing dataset from your 201A or first-year project
  2. Existing dataset in your lab
  3. New open dataset
  4. New simulated dataset

For option 3 we’ve put together a large list of open datasets that you can browse based on your research interests. By no means should you feel limited to using a dataset on this list! These are just to help you get started.

For option 4 you should discuss with your instructors what you’re thinking and why you want to pursue this option.
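
If you're leaning toward option 4, part of the appeal is that you control the data-generating process, so you can check whether your analysis recovers the parameters you built in. Below is a minimal sketch of what that might look like; every name and parameter value here is made up for illustration.

```python
import numpy as np
import polars as pl

rng = np.random.default_rng(seed=201)  # fixed seed so the simulation is reproducible

# Made-up data-generating process: response time (ms) increases with trial
# difficulty, with subject-level variation in baseline speed
n_subjects, n_trials = 30, 100
difficulty = rng.uniform(0, 1, size=(n_subjects, n_trials))
subject_baseline = rng.normal(500, 50, size=(n_subjects, 1))
rt = subject_baseline + 200 * difficulty + rng.normal(0, 40, size=(n_subjects, n_trials))

# Flatten into a tidy dataframe and save it alongside the generating script
sim = pl.DataFrame({
    "subject": np.repeat(np.arange(n_subjects), n_trials),
    "difficulty": difficulty.ravel(),
    "rt": rt.ravel(),
})
sim.write_csv("data/simulated_rt.csv")
```

Because the true slope (here, 200 ms per unit difficulty) is known, you can evaluate how well your modeling approach recovers it under different noise levels or sample sizes.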

3. Work & Submit

Both your project proposal and your final submission will be in the form of a GitHub repository, just like our labs and HWs. We'll provide a template soon that you'll use to submit your initial project proposal. Then you'll update the same repository to commit and push your work. When you're done, your final commit should include:

  1. Quarto document
  • A .qmd file and rendered PDF that includes:
  • title, authors, and relevant metadata
  • Abstract
  • Methods
  • Results with relevant figures & tables
  • References
  2. Data files and analysis scripts (see the sketch after this list)
  • You should update your project README.md with informative descriptions of all relevant analysis & data files. Feel free to perform your core work using Marimo notebooks if you find those easier. Just make sure they are documented and your final submission is still a Quarto .qmd file.
  • You should make sure to separate raw data files from any preprocessed/cleaned data by making new files
  • If datasets are too large for GitHub (>5 MB), provide a download link and retrieval instructions in the README so someone can still reproduce your work
  3. Reproducible Python environment
  • While we'll configure the template with all the Python packages we've been using throughout the course, feel free to use additional tools if you want. Just make sure to use uv add so they're tracked automatically in your pyproject.toml file and a collaborator/reviewer gets them when they git clone and run uv sync.
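
To make the raw-vs-cleaned separation in item 2 concrete, here's a minimal preprocessing sketch. The folder layout (data/raw/, data/processed/), file names, and cleaning steps are assumptions for illustration; adapt them to your project and document whatever you actually do in your README.

```python
from pathlib import Path
import polars as pl

# Hypothetical layout: raw files are never edited in place; every cleaning
# step writes a *new* file under data/processed/
raw_path = Path("data/raw/survey_responses.csv")
processed_dir = Path("data/processed")
processed_dir.mkdir(parents=True, exist_ok=True)

cleaned = (
    pl.read_csv(raw_path)
    .drop_nulls(subset=["age", "score"])             # document every exclusion
    .filter(pl.col("age") >= 18)                     # e.g., drop underage participants
    .with_columns(pl.col("score").cast(pl.Float64))  # enforce expected types
)
cleaned.write_csv(processed_dir / "survey_responses_clean.csv")
```

Keeping this as a standalone, documented script means a reviewer can regenerate your cleaned files from the raw data with a single command.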

Collaboration & GenAI Policy

Whether working solo or in a group, you are welcome to discuss ideas, troubleshoot code, and exchange feedback with classmates outside your group, but all submitted writing and analysis must be your own (or your group's). If you use GenAI tools, make sure to review the course GenAI policy (i.e. use it as a tool, not a cheat-code) and include a transcript with your submission.

Rubric

General Formatting & Prose (10%)

  • Excellent (9-10): Publication-quality writing: clear, concise, and error-free. APA-style reporting of statistics (e.g., F(1, 48) = 12.3, p < .001, d = 0.45). Figures and tables are properly labeled, referenced in text, and enhance the narrative. Complete, properly formatted references.
  • Good (7-8): Generally clear writing with minor errors. Statistics mostly reported in APA style with occasional inconsistencies. Figures/tables present with minor labeling issues. References present but may have formatting gaps.
  • Adequate (5-6): Understandable but unpolished writing. Inconsistent statistical reporting format. Figures/tables incomplete or poorly integrated. Several grammatical errors or missing references.
  • Needs Improvement (0-4): Unclear or confusing writing. Statistics poorly reported or missing key values. Missing or unprofessional figures/tables. Pervasive grammatical issues.

Methods Section Clarity & Completeness (30%)

  • Excellent (27-30): Data: complete description of the dataset (source, how it was collected, sample characteristics, sample size), with any exclusions or missing-data handling explained. Variables: all variables clearly operationalized (what was measured, how, units/scales). Analytical Approach: statistical models justified with clear rationale; specifies assumption checks, alpha levels, and software/packages used. Sufficient detail for an independent researcher to reproduce the analysis.
  • Good (21-26): All major components present with some details missing. Variables adequately described. Analytical choices mostly justified. Minor gaps that wouldn't prevent replication.
  • Adequate (15-20): Key information present but lacks depth. Some variables poorly described. Limited justification for analytical choices. Replication would be challenging.
  • Needs Improvement (0-14): Major components missing or inadequate. Insufficient detail for understanding the analysis. No justification for analytical decisions. Replication not possible.

Results Interpretation, Assumptions, & Limitations (30%)

  • Excellent (27-30): Assumptions: explicitly tests all relevant assumptions (normality, homogeneity, independence, linearity); appropriate diagnostic plots; states whether assumptions are met and describes remedial actions if not. Results: all analyses clearly reported with complete statistics (test statistics, df, p-values, effect sizes, CIs); figures and tables enhance understanding. Interpretation: accurate; avoids over-interpreting null results; appropriately cautious with causal language. Limitations: thoughtful discussion of methodological and statistical limitations, including threats to validity.
  • Good (21-26): Most assumptions checked appropriately. Results clearly reported with minor omissions. Interpretation generally sound with appropriate caution. Limitations discussed but may lack depth or miss key issues.
  • Adequate (15-20): Basic assumption checking but incomplete. Results reported but missing key statistics (e.g., no effect sizes or CIs). Some understanding shown but may over-interpret. Limitations mentioned but superficial.
  • Needs Improvement (0-14): Assumptions not checked or inadequately addressed. Results poorly reported or incomplete. Misinterpretation of findings. Limitations absent or lacking understanding.

Code Quality & Reproducibility (15%)

  • Excellent (14-15): Code runs without errors and reproduces all results. Well-organized with clear headers and comments. Follows best practices (relative paths, no hardcoding). Includes package versions and random seed where applicable. Data cleaning transparent and justified. Efficient and readable code.
  • Good (11-13): Code runs with minimal adjustments. Generally well-organized with adequate comments. Reproduces main results. Minor issues with paths or organization.
  • Adequate (8-10): Code runs but requires troubleshooting. Limited organization or comments. Reproduces some but not all results. Hardcoded paths or unclear workflow.
  • Needs Improvement (0-7): Code doesn't run or has major errors. Poorly organized and difficult to follow. Cannot reproduce reported results. Missing key analysis steps.

Innovation & Integration (15%)

  • Excellent (14-15): Goes beyond basic analyses with appropriate techniques (e.g., model comparison via AIC/BIC or cross-validation, bootstrap inference, mixed-effects modeling for nested data). Analytical choices justified by citing course materials or methodological literature. Demonstrates thoughtful engagement with the "why" behind modeling decisions.
  • Good (11-13): Incorporates some techniques beyond the basics. Analytical choices generally justified. Shows familiarity with course best practices.
  • Adequate (8-10): Primarily basic analyses without much justification. Limited connection to course concepts or methodological reasoning.
  • Needs Improvement (0-7): Only rudimentary analyses. No justification for modeling choices. Does not engage with course material.