General assignment information
- Find a dataset.
- It must have:
- At least one numeric column
- Between one thousand and one million rows
- If it's larger than that, you can filter it down.
- Don't spend too long on this step.
- If there's more than one numeric column, pick one.
- Create a new notebook.
- Using pandas:
- Read in the data.
- Compute:
- The mean
- The median
- The mode
- Do a groupby() with an aggregation.
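The pandas steps above might look roughly like the sketch below. The inline CSV text, the column names `city` and `temperature`, and the variable names are all made-up stand-ins for illustration; with a real dataset you would pass your own file path to `pd.read_csv()` and pick your own columns.

```python
import io

import pandas as pd

# Hypothetical stand-in data so the example runs on its own.
# With a real dataset, this would be pd.read_csv() on your file.
csv_text = """city,temperature
Boston,50
Boston,54
Chicago,45
Chicago,45
Denver,60
"""

# Read in the data.
df = pd.read_csv(io.StringIO(csv_text))

# Pick one numeric column to summarize (assumed name).
col = "temperature"

mean_value = df[col].mean()
median_value = df[col].median()
# mode() returns a Series, since more than one value can tie for most common.
mode_values = df[col].mode()

# Group by a categorical column and aggregate the numeric one.
mean_by_city = df.groupby("city")[col].mean()

print(mean_value, median_value, list(mode_values))
print(mean_by_city)
```

Note that `mode()` gives back a Series rather than a single number, so you may need to look at all of its values if your column has ties.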
- Read The Joys (and Woes) of the Craft of Software Engineering
- Note that not everything in there applies to data analysis.
- Filtering/indexing DataFrames
- Learn about functions
- Coding Style Guides - Please skim these; I don't expect you to understand and follow everything in them. The most important guidelines to pay attention to are indentation and keeping each statement on its own line.
- Guide to commenting your code
- Quartz Guide to Bad Data
- Learn about data dictionaries
- Glance through pandas' comparison with other tools, for any tools you are already familiar with
- More on indexing:
- How to Select Rows from Pandas DataFrame
- Selecting Subsets of Data in Pandas: Part 1 and Part 2
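As a small taste of what the indexing readings cover, here is a minimal sketch of a few common ways to select rows. The DataFrame contents and the names `warm`, `warm_cities`, `first_two`, and `mild` are invented for illustration only; the readings go into much more detail.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame(
    {
        "city": ["Boston", "Chicago", "Denver", "Austin"],
        "temperature": [50, 45, 60, 75],
    }
)

# Boolean mask: keep only the rows where the condition is True.
warm = df[df["temperature"] > 48]

# .loc selects by label; here, a row condition plus a column name.
warm_cities = df.loc[df["temperature"] > 48, "city"]

# .iloc selects by integer position; here, the first two rows.
first_two = df.iloc[:2]

# Combine conditions with & (and) or | (or); wrap each in parentheses.
mild = df[(df["temperature"] > 48) & (df["temperature"] < 70)]

print(warm)
print(list(warm_cities))
print(first_two)
print(mild)
```

The same boolean-mask approach is one way to filter a dataset down if it has too many rows.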
Reminder about the between-class participation requirement.