
How to Clean Data in Google Sheets (Step-by-Step for Data Science Teachers)

Step-by-Step for YouCubed & HS Data Science Teachers

Cleaning data is one of the most underrated, and most important, skills in any high school data science course. Whether your district is rolling out the YouCubed HS Data Science curriculum*, IDS*, or a state-developed pathway, students will need to learn how to:

  • identify messy or inconsistent data
  • standardize capitalization
  • remove units and stray characters
  • fix typos and inconsistencies
  • prepare data for sorting, filtering, and visualizations

This post walks through the entire process using Google Sheets, and includes:

  • a free student handout
  • a copy/paste dataset (including intentionally messy entries)
  • a downloadable CSV/XLSX
  • an optional paid teacher bundle with keys, lesson plans, screenshots, and the Common Errors reference sheet

Below is the full video lesson that accompanies this activity.



📥 Copy/Paste Dataset (Messy Version)

Teachers and students can paste this directly into Google Sheets.
It includes:

  • inconsistent spacing
  • inconsistent capitalization
  • missing values
  • a blank record
  • inconsistent labels and abbreviations (“Note Book,” “Under hand,” “L” for Left)
  • units included in numeric fields
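To make the warm-up concrete, here is a minimal Python sketch (Python rather than Sheets, since the series later moves to Colab) that flags several of these problems in a single cell. The function name `audit_cell` and the "labels should be in Proper Case" rule are illustrative assumptions, not part of the lesson materials. Label mix-ups such as “L” vs. “Left” still need a human eye.

```python
import re


def audit_cell(value: str, numeric: bool = False) -> list[str]:
    """Hypothetical helper: list the kinds of mess found in one cell.

    Assumes every cell arrives as text, as it does right after a CSV paste.
    """
    if value.strip() == "":
        return ["missing"]  # empty or spaces-only cell
    issues = []
    if value != value.strip():
        issues.append("extra spaces")  # leading or trailing spaces
    cleaned = value.strip()
    if numeric:
        # A clean numeric field holds only digits, with an optional decimal part.
        if not re.fullmatch(r"\d+(\.\d+)?", cleaned):
            issues.append("non-numeric characters")  # e.g. a unit such as "cm"
    elif cleaned != cleaned.title():
        issues.append("not Proper Case")  # e.g. "notebook", "NoteBook"
    return issues


for cell, is_numeric in [("182 cm", True), ("Printer ", False), ("notebook", False), (" ", False)]:
    print(repr(cell), audit_cell(cell, numeric=is_numeric))
```

Students can compare what the function flags with what they highlighted by hand, then discuss what it misses.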

Messy Dataset Table

| Student Name | Paper Type | Distance | Hand Used | Throwing style |
| Ava | NoteBook | 182 cm | Right | overhand |
| Jayden | notebook | 160 | Left | Underhand |
| Amir | Printer | 198cm | right | Overhand |
| Maya | Note Book | 172 | Right | Under hand |
| Ethan | Printer |  | Left | Overhand |
| Sophia | notebook | 155 | Right | Underhand |
| Noah | Notebook | 188 | Right | Overhand |
| Isabella | Printer | 203 cm | Right | Underhand |
|  | printer | 186 centimeters | L | Overhand |
| Mateo | NoteBook | 176 | Left | Overhand |
| Kai | Printer | 190cm | Right | Overhand |

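Copy the CSV and Paste It into Google Sheets

Click cell A1 and paste. If every row lands in column A, select that column and use Data > Split text to columns, with the separator set to comma.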

Ava,Notebook,182 cm,Right,overhand
Jayden,notebook,160,Left,Underhand
Amir,Printer,198cm,right,Overhand

Maya,Note Book,172,Right,Under hand
Ethan,Printer,,Left,Overhand
Sophia,notebook,155,Right,Underhand
Noah,Notebook,188,Right,Overhand
Isabella,"Printer ",203 cm,Right,Underhand
" ",printer,"186 centimeters",L,Overhand
Mateo,NoteBook,176,Left,Overhand
Kai,Printer,190cm,Right,Overhand
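If you want to check what the paste produces before class, the sketch below reads the same rows with Python's standard `csv` module. Because the block has no header row, the column names are borrowed from the table above; `load_rows` and `COLUMNS` are illustrative names, not part of the downloadable files. Notice that the blank line becomes an empty record and the quoted spaces in `"Printer "` and `" "` survive, which is exactly the hidden mess students are asked to find.

```python
import csv
import io

# The rows from the copy/paste block above, unchanged (it has no header row).
MESSY_CSV = """Ava,Notebook,182 cm,Right,overhand
Jayden,notebook,160,Left,Underhand
Amir,Printer,198cm,right,Overhand

Maya,Note Book,172,Right,Under hand
Ethan,Printer,,Left,Overhand
Sophia,notebook,155,Right,Underhand
Noah,Notebook,188,Right,Overhand
Isabella,"Printer ",203 cm,Right,Underhand
" ",printer,"186 centimeters",L,Overhand
Mateo,NoteBook,176,Left,Overhand
Kai,Printer,190cm,Right,Overhand
"""

# Column names assumed from the Messy Dataset Table.
COLUMNS = ["Student Name", "Paper Type", "Distance", "Hand Used", "Throwing style"]


def load_rows(text: str) -> list[dict[str, str]]:
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        # csv.reader yields [] for a blank line; keep it so the blank record stays visible.
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


for index, row in enumerate(load_rows(MESSY_CSV)):
    print(index, row)
```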

📁 XLSX Version

⬇️ Download the Sample Student Dataset


🎯 What This Lesson Covers

Students will learn how to:

  • remove units (cm, ft, etc.)
  • correct inconsistent capitalization
  • remove hidden spaces
  • apply LOWER(), UPPER(), and PROPER()
  • safely use Find & Replace
  • convert formula results to static values (Paste special > Values only)
  • delete blank or corrupted rows
  • prepare data for sorting, filtering, and pivot tables
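For teachers who will later carry this lesson into Colab, here is a rough Python translation of the main steps. It is a sketch under stated assumptions: `trim` and `proper` only approximate Sheets TRIM() and PROPER(), the lookup tables stand in for a series of Find & Replace passes, and mapping “L” to “Left” is a cleaning decision the class should make and document, not a rule. Rows blank in every column are dropped, while a row missing only a name or a distance is kept with an empty value so students can decide what to do with it.

```python
import re
from typing import Optional


def trim(value: str) -> str:
    # Approximates Sheets TRIM(): strip both ends and collapse repeated spaces.
    return " ".join(value.split())


def proper(value: str) -> str:
    # Approximates Sheets PROPER() for simple labels: "under hand" -> "Under Hand".
    return trim(value).title()


def strip_units(value: str) -> Optional[float]:
    # Keep the leading number from "182 cm", "198cm", "186 centimeters"; blank -> None.
    # Assumes every distance is in centimeters, as in this dataset.
    match = re.match(r"\s*(\d+(?:\.\d+)?)", value)
    return float(match.group(1)) if match else None


# Hypothetical Find & Replace tables, keyed by the label lowercased with spaces removed.
PAPER = {"notebook": "Notebook", "printer": "Printer"}
HAND = {"right": "Right", "left": "Left", "l": "Left"}
STYLE = {"overhand": "Overhand", "underhand": "Underhand"}


def standardize(value: str, table: dict[str, str]) -> str:
    key = value.replace(" ", "").lower()  # similar to comparing LOWER() results
    # Unknown labels fall back to Proper Case so they stay visible for review.
    return table.get(key, proper(value))


def clean_row(row: list[str]) -> Optional[dict[str, object]]:
    if not any(field.strip() for field in row):
        return None  # blank record: delete the row
    name, paper, distance, hand, style = row
    return {
        "Student Name": trim(name) or None,  # a spaces-only name becomes missing
        "Paper Type": standardize(paper, PAPER),
        "Distance (cm)": strip_units(distance),
        "Hand Used": standardize(hand, HAND),
        "Throwing style": standardize(style, STYLE),
    }


print(clean_row([" ", "printer", "186 centimeters", "L", "Overhand"]))
```

Keeping each decision in a small table mirrors the "showing your work" goal: anyone can read exactly which spellings were merged.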

This directly supports multiple early units in high school data science:

  • YouCubed HS Data Science Units 1 & 2*
  • IDS (UCLA) early modules*
  • Maryland Data Science Pathway introductory competencies

Free Student Handout (Warm-Up)

Use this before teaching the data cleaning steps: students highlight what looks “messy,” which primes them for the lesson.

Download FREE Student Handout


Common Data Cleaning Fixes Reference Sheet

A PDF quick-reference handout for students covering whitespace removal, capitalization fixes, Find & Replace, and blank cells.

Get the Reference Sheet

Full Teacher Bundle (Lesson Plan + Keys + Reference Sheet)

The premium download includes:

  • 10-page teacher guide
  • Full answer key
  • Student-ready handout (editable)
  • Common Errors Reference Sheet
  • Discussion questions & extensions
  • STAR framework activity format
  • Classroom tips & differentiation options

Get the FULL Teacher Bundle


Why Data Cleaning First?

Before students can analyze anything, whether in Sheets, CODAP, Tableau, or Python, they need to learn that:

  • data arrives messy
  • tools do not fix errors for them
  • small inconsistencies create big downstream problems
  • transparency + documentation = real data science
  • cleaning is part of “showing your work”
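One quick way to demonstrate the "big downstream problems" point: count the Paper Type column from the messy dataset before and after a simple normalization. The sketch uses Python's `collections.Counter`, which treats every distinct spelling as its own group, just as any summary that groups by exact text would. The `squash` helper is illustrative, and the list is the Paper Type column from the copy/paste CSV with the blank row left out.

```python
from collections import Counter

# Paper Type values from the copy/paste CSV (blank row omitted).
raw = ["Notebook", "notebook", "Printer", "Note Book", "Printer", "notebook",
       "Notebook", "Printer ", "printer", "NoteBook", "Printer"]


def squash(label: str) -> str:
    # Hypothetical normalizer: ignore spaces and capitalization when grouping.
    return label.replace(" ", "").lower()


before = Counter(raw)
after = Counter(squash(label) for label in raw)

print(len(before))  # 7
print(dict(after))  # {'notebook': 6, 'printer': 5}
```

Two real categories show up as seven until the labels are cleaned, so a bar chart or summary table built on the raw column would be split the same way.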

This mirrors industry practice and prepares them for later work in:

  • regressions
  • visualizations
  • modeling
  • simulations
  • coding in Python/Colab

🔗 More Resources Coming Soon

The full Google Sheets series includes:

  1. Sorting & Filtering
  2. Descriptive Statistics
  3. Pivot Tables & Charts
  4. Simulations with RANDBETWEEN() + COUNTIF()
  5. Intro to CODAP
  6. Intro to Google Colab & Python

All will follow the same Solvefinity structure:
Learn it. Teach it. Bring it to life.

*This activity and tutorial are not endorsed by, affiliated with, or sponsored by YouCubed, Stanford University, or the Introduction to Data Science (IDS) curriculum. All references are for educational purposes only.