I’ve been working with data for around 20 years, and these are the easiest ways to improve data storage.
These tips apply to spreadsheets, CSVs, and simple tables.
Even if you’re just entering a few numbers into a spreadsheet, you never know how the data might be useful in the future. If you’ve stored data poorly in the past, it’s not a problem—we can wrangle messy data. But taking a few minutes to follow these simple guidelines up front can save hours later.
Quick checklist
- ✓ One data type per column
- ✓ Explicit missing values
- ✓ Consistent labels and naming
- ✓ README documentation nearby
1. Use consistent data types and formats for each column
Each column should contain one kind of data, stored consistently.
- Numbers only in numeric columns (no letters or comments).
- Dates in a single format, such as
YYYY-MM-DD. - Booleans as
true/falseor0/1—don’t mix and match. - Put comments in a separate notes column, not in the data itself.
2. Represent missing data explicitly
Use blank cells or NA to represent missing values.
Don’t encode missing data with values like -999,
which can be mistaken for real data.
3. Use consistent category labels
Pick one set of labels and stick with it. For example, use low,
medium, high
instead of mixing
Low, low,
med, Medium,
H
and high.
Inconsistent labels can create duplicate categories.
4. Column names: avoid spaces and special characters; include units
Clear names make analysis and automation much easier.
distance_kminstead ofDistance (km)room_temperature_cinstead oftempinternal_vibration_vbsinstead ofVibration
Example: Proper Data Storage
Less useful
| date | height | bear on site |
|---|---|---|
| 2024-02-28 | 1.8 | no |
| Feb 29, 2024 | 1.5 | TRUE |
| 1/3/2024 | 2ish | 0 |
More useful
| date_sampled | height_m | height_m_notes | bear_detected |
|---|---|---|---|
| 2024-02-28 | 1.8 | 0 | |
| 2024-02-29 | 1.5 | 1 | |
| 2024-03-01 | 2 | uncertain | 0 |
5. Include a README file with your data
In each folder where you store data, include a short README file that describes what each file contains. Write it so that someone else could understand it. Two minutes spent writing this now can save you hours of confusion next year.
6. Avoid using visual formatting as data
Don’t rely on color, bold text, or cell shading to store meaning (for example, “blue rows encode rainy days”).
Instead, store it explicitly in a column, such as
weather = "rain". (Don't be afraid to have many columns.)
7. Don’t use spaces or special characters in file names
Including the creation date in the filename is often very helpful.
For example: measurements_20240229.csv.
Good data storage doesn’t require advanced tools or expertise, just a few consistent habits. These practices make your data easier to understand, share, and use in the future. Clean data means fewer errors, fewer surprises, and faster analysis.
If you want to turn your data into actionable insight, learn how with QSC.