Tweaks to exercises.

justinbois · Jun 13, 2024 · d5ca083
1 parent b67032a
commit d5ca083

Showing 117 changed files with 255 additions and 187 deletions.
diff --git a/2024/_sources/exercises/exercise_3/exercise_3.2.ipynb.txt b/2024/_sources/exercises/exercise_3/exercise_3.2.ipynb.txt
@@ -1 +1 @@
-{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# Exercise 3.2: Split-Apply-Combine of the frog data set\n", "\n", "<hr>"]}, {"cell_type": "markdown", "metadata": {}, "source": ["We will continue working with the frog tongue adhesion data set.\n", "\n", "\n", "You'll now practice your split-apply-combine skills. First load in the data set. Then, \n", "\n", "**a)** Compute standard deviation of the impact forces for each frog.\n", "\n", "**b)** Compute the coefficient of variation of the impact forces *and* adhesive forces for each frog.\n", "\n", "**c)** Compute a data frame that has the mean, median, standard deviation, and coefficient of variation of the impact forces and adhesive forces for each frog.\n", "\n", "**d)** Now tidy this data frame. It might help to read [the documentation about melting](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.melt.html)."]}, {"cell_type": "markdown", "metadata": {}, "source": ["<br />"]}], "metadata": {"anaconda-cloud": {}, "kernelspec": {"display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.3"}}, "nbformat": 4, "nbformat_minor": 4}
+{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# Exercise 3.2: Split-Apply-Combine of the frog data set\n", "\n", "<hr>"]}, {"cell_type": "markdown", "metadata": {}, "source": ["We will continue working with the frog tongue adhesion data set.\n", "\n", "\n", "You'll now practice your split-apply-combine skills. First load in the data set. Then, \n", "\n", "**a)** Compute standard deviation of the impact forces for each frog.\n", "\n", "**b)** Compute the coefficient of variation of the impact forces *and* adhesive forces for each frog.\n", "\n", "**c)** Compute a data frame that has the mean, median, standard deviation, and coefficient of variation of the impact forces and adhesive forces for each frog.\n", "\n", "**d)** Now tidy this data frame. It might help to read [the documentation about melting](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.melt.html)."]}, {"cell_type": "markdown", "metadata": {}, "source": ["<br />"]}], "metadata": {"anaconda-cloud": {}, "kernelspec": {"display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.9"}}, "nbformat": 4, "nbformat_minor": 4}
diff --git a/2024/_sources/exercises/exercise_3/exercise_3.3.ipynb.txt b/2024/_sources/exercises/exercise_3/exercise_3.3.ipynb.txt
@@ -1 +1 @@
-{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# Exercise 3.3: Adding data to a data frame\n", "\n", "<hr>"]}, {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": ["import pandas as pd"]}, {"cell_type": "markdown", "metadata": {}, "source": ["<hr>\n", "\n", "We continue working with the frog tongue data. Recall that the header comments in the data file contained information about the frogs."]}, {"cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["# These data are from the paper,\n", "#   Kleinteich and Gorb, Sci. Rep., 4, 5225, 2014.\n", "# It was featured in the New York Times.\n", "#    http://www.nytimes.com/2014/08/25/science/a-frog-thats-a-living-breathing-pac-man.html\n", "#\n", "# The authors included the data in their supplemental information.\n", "#\n", "# Importantly, the ID refers to the identifites of the frogs they tested.\n", "#   I:   adult, 63 mm snout-vent-length (SVL) and 63.1 g body weight,\n", "#        Ceratophrys cranwelli crossed with Ceratophrys cornuta\n", "#   II:  adult, 70 mm SVL and 72.7 g body weight,\n", "#        Ceratophrys cranwelli crossed with Ceratophrys cornuta\n", "#   III: juvenile, 28 mm SVL and 12.7 g body weight, Ceratophrys cranwelli\n", "#   IV:  juvenile, 31 mm SVL and 12.7 g body weight, Ceratophrys cranwelli\n", "date,ID,trial number,impact force (mN),impact time (ms),impact force / body weight,adhesive force (mN),time frog pulls on target (ms),adhesive force / body weight,adhesive impulse (N-s),total contact area (mm2),contact area without mucus (mm2),contact area with mucus / contact area without mucus,contact pressure (Pa),adhesive strength (Pa)\n", "2013_02_26,I,3,1205,46,1.95,-785,884,1.27,-0.290,387,70,0.82,3117,-2030\n", "2013_02_26,I,4,2527,44,4.08,-983,248,1.59,-0.181,101,94,0.07,24923,-9695\n", "2013_03_01,I,1,1745,34,2.82,-850,211,1.37,-0.157,83,79,0.05,21020,-10239\n", 
"2013_03_01,I,2,1556,41,2.51,-455,1025,0.74,-0.170,330,158,0.52,4718,-1381\n", "2013_03_01,I,3,493,36,0.80,-974,499,1.57,-0.423,245,216,0.12,2012,-3975\n"]}], "source": ["!head -20 data/frog_tongue_adhesion.csv"]}, {"cell_type": "markdown", "metadata": {}, "source": ["So, each frog has associated with it an age (adult or juvenile), snout-vent-length (SVL), body weight, and species (either cross or *cranwelli*). For a tidy data frame, we should have a column for each of these values. Your task is to load in the data, and then add these columns to the data frame. For convenience, here is a data frame with data about each frog."]}, {"cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": ["df_frog = pd.DataFrame(\n", "    data={\n", "        \"ID\": [\"I\", \"II\", \"III\", \"IV\"],\n", "        \"age\": [\"adult\", \"adult\", \"juvenile\", \"juvenile\"],\n", "        \"SVL (mm)\": [63, 70, 28, 31],\n", "        \"weight (g)\": [63.1, 72.7, 12.7, 12.7],\n", "        \"species\": [\"cross\", \"cross\", \"cranwelli\", \"cranwelli\"],\n", "    }\n", ")"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Note: There are lots of ways to solve this problem. This is a good exercise in searching through the [Pandas documentation](https://pandas.pydata.org/pandas-docs/stable/) and other online resources, such as [Stack Overflow](https://stackoverflow.com/questions). 
Remember, much of your programming efforts are spent searching through documentation and the internet.\n", "\n", "Finally, as a fun challenge, see if you can highlight the strike with the highest impact force for each frog in the data frame."]}, {"cell_type": "markdown", "metadata": {}, "source": ["<br />"]}], "metadata": {"anaconda-cloud": {}, "kernelspec": {"display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.3"}}, "nbformat": 4, "nbformat_minor": 4}
+{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# Exercise 3.3: Adding data to a data frame\n", "\n", "<hr>"]}, {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": ["import pandas as pd"]}, {"cell_type": "markdown", "metadata": {}, "source": ["<hr>\n", "\n", "We continue working with the frog tongue data. Recall that the header comments in the data file contained information about the frogs."]}, {"cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["# These data are from the paper,\n", "#   Kleinteich and Gorb, Sci. Rep., 4, 5225, 2014.\n", "# It was featured in the New York Times.\n", "#    http://www.nytimes.com/2014/08/25/science/a-frog-thats-a-living-breathing-pac-man.html\n", "#\n", "# The authors included the data in their supplemental information.\n", "#\n", "# Importantly, the ID refers to the identifites of the frogs they tested.\n", "#   I:   adult, 63 mm snout-vent-length (SVL) and 63.1 g body weight,\n", "#        Ceratophrys cranwelli crossed with Ceratophrys cornuta\n", "#   II:  adult, 70 mm SVL and 72.7 g body weight,\n", "#        Ceratophrys cranwelli crossed with Ceratophrys cornuta\n", "#   III: juvenile, 28 mm SVL and 12.7 g body weight, Ceratophrys cranwelli\n", "#   IV:  juvenile, 31 mm SVL and 12.7 g body weight, Ceratophrys cranwelli\n", "date,ID,trial number,impact force (mN),impact time (ms),impact force / body weight,adhesive force (mN),time frog pulls on target (ms),adhesive force / body weight,adhesive impulse (N-s),total contact area (mm2),contact area without mucus (mm2),contact area with mucus / contact area without mucus,contact pressure (Pa),adhesive strength (Pa)\n", "2013_02_26,I,3,1205,46,1.95,-785,884,1.27,-0.290,387,70,0.82,3117,-2030\n", "2013_02_26,I,4,2527,44,4.08,-983,248,1.59,-0.181,101,94,0.07,24923,-9695\n", "2013_03_01,I,1,1745,34,2.82,-850,211,1.37,-0.157,83,79,0.05,21020,-10239\n", 
"2013_03_01,I,2,1556,41,2.51,-455,1025,0.74,-0.170,330,158,0.52,4718,-1381\n", "2013_03_01,I,3,493,36,0.80,-974,499,1.57,-0.423,245,216,0.12,2012,-3975\n"]}], "source": ["!head -20 data/frog_tongue_adhesion.csv"]}, {"cell_type": "markdown", "metadata": {}, "source": ["So, each frog has associated with it an age (adult or juvenile), snout-vent-length (SVL), body weight, and species (either cross or *cranwelli*). For a tidy data frame, we should have a column for each of these values. Your task is to load in the data, and then add these columns to the data frame. For convenience, here is a data frame with data about each frog."]}, {"cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": ["df_frog = pd.DataFrame(\n", "    data={\n", "        \"ID\": [\"I\", \"II\", \"III\", \"IV\"],\n", "        \"age\": [\"adult\", \"adult\", \"juvenile\", \"juvenile\"],\n", "        \"SVL (mm)\": [63, 70, 28, 31],\n", "        \"weight (g)\": [63.1, 72.7, 12.7, 12.7],\n", "        \"species\": [\"cross\", \"cross\", \"cranwelli\", \"cranwelli\"],\n", "    }\n", ")"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Note: There are lots of ways to solve this problem. This is a good exercise in searching through the [Pandas documentation](https://pandas.pydata.org/pandas-docs/stable/) and other online resources, such as [Stack Overflow](https://stackoverflow.com/questions). 
Remember, much of your programming efforts are spent searching through documentation and the internet.\n", "\n", "Finally, as a fun challenge, see if you can highlight the strike with the highest impact force for each frog in the data frame."]}, {"cell_type": "markdown", "metadata": {}, "source": ["<br />"]}], "metadata": {"anaconda-cloud": {}, "kernelspec": {"display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.9"}}, "nbformat": 4, "nbformat_minor": 4}
diff --git a/2024/_sources/exercises/exercise_3/exercise_3.4.ipynb.txt b/2024/_sources/exercises/exercise_3/exercise_3.4.ipynb.txt
@@ -1 +1 @@
-{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# Exercise 3.4: Axes with logarithmic scale and error bars\n", "\n", "<hr>"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Sometimes you need to plot your data with a logarithmic scale. As an example, let's consider the classic genetic switch engineered by Jim Collins and coworkers ([Gardner, et al., *Nature*, **403**, 339, 2000](https://doi.org/10.1038/35002131)). This genetic switch was incorporated into *E. coli* and is inducible by adjusting the concentration of the lactose analog IPTG. The readout is the fluorescence intensity of GFP.\n", "\n", "The data set has the IPTG concentrations and GFP fluorescence intensity. The data are in the file `~/git/data/collins_switch.csv`. Let's look at it."]}, {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["# Data digitized from Fig. 5a of Gardner, et al., *Nature*, **403**, 339, 2000. The last column gives the standard error of the mean normalized GFP intensity.\n", "[IPTG] (mM),normalized GFP expression (a.u.),sem\n", "0.001000,0.004090,0.003475\n", "0.010000,0.010225,0.002268\n", "0.020000,0.022495,0.004781\n", "0.030000,0.034765,0.003000\n", "0.040000,0.067485,0.006604\n", "0.040000,0.668712,0.087862\n", "0.060000,0.740286,0.045853\n", "0.100000,0.840491,0.058986\n", "0.300000,0.936605,0.026931\n", "0.600000,0.961145,0.093553\n", "1.000000,0.940695,0.037624\n", "3.000000,0.852761,0.059035\n", "6.000000,0.910020,0.051052\n", "10.000000,0.893661,0.042773\n"]}], "source": ["!cat data/collins_switch.csv"]}, {"cell_type": "markdown", "metadata": {}, "source": ["It has two rows of non-data. Then, Column 1 is the IPTG concentration, column 2 is the normalized GFP expression level, and the last column is the standard error of the mean normalized GFP intensity. This gives the error bars, which we will plot momentarily. 
For now, we will just plot IPTG versus normalized GFP intensity.\n", "\n", "In looking at the data set, note that there are two entries for [IPTG] = 0.04 mM. At this concentration, the switch happens, and there are two populations of cells, one with high expression of GFP and one with low. The two data points represent these two populations of cells.\n", "\n", "**a)** Now, let's make a plot of IPTG versus GFP.\n", "\n", "1. Load in the data set using Pandas. Make sure you use the `comment` kwarg of pd.read_csv() properly.\n", "2. Make a plot of normalized GFP intensity (y-axis) versus IPTG concentration (x-axis).\n", "\n", "**b)** Now that you have done that, there are some problems with the plot. It is really hard to see the data points with low concentrations of IPTG. In fact, looking at the data set, the concentration of IPTG varies over four orders of magnitude. When you have data like this, it is wise to plot them on a logarithmic scale. You can specify the x-axis as logarithmic when you instantiate a figure with `bokeh.plotting.figure()` by using the `x_axis_type='log'` kwarg. (The obvious analogous kwarg applied for the y-axis.) For this data set, it is definitely best to have the x-axis on a logarithmic scale. Remake the plot you just did with the x-axis logarithmically scaled.\n", "\n", "**c)** The data set also contains the standard error of the mean, or SEM. The SEM is often displayed on plots as error bars. Now construct the plot with error bars.\n", "\n", "1. Add columns `error_low` and `error_high` to the data frame containing the Collins data. These will set the bottoms and tops of the error bars. You should base the values in these columns on the standard error of the mean (`sem`). Assuming a Gaussian model, the 95% confidence interval is \u00b11.96 times the s.e.m.\n", "2. Make a plot with the measured expression levels and the error bars. 
*Hint*: Check out the [Bokeh docs](https://bokeh.pydata.org/en/latest/docs/user_guide/plotting.html) and think about what kind of glyph works best for error bars."]}, {"cell_type": "markdown", "metadata": {}, "source": ["<br />"]}], "metadata": {"anaconda-cloud": {}, "kernelspec": {"display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.3"}}, "nbformat": 4, "nbformat_minor": 4}
+{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# Exercise 3.4: Axes with logarithmic scale and error bars\n", "\n", "<hr>"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Sometimes you need to plot your data with a logarithmic scale. As an example, let's consider the classic genetic switch engineered by Jim Collins and coworkers ([Gardner, et al., *Nature*, **403**, 339, 2000](https://doi.org/10.1038/35002131)). This genetic switch was incorporated into *E. coli* and is inducible by adjusting the concentration of the lactose analog IPTG. The readout is the fluorescence intensity of GFP.\n", "\n", "The data set has the IPTG concentrations and GFP fluorescence intensity. The data are in the file `~/git/data/collins_switch.csv`. Let's look at it."]}, {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["# Data digitized from Fig. 5a of Gardner, et al., *Nature*, **403**, 339, 2000. The last column gives the standard error of the mean normalized GFP intensity.\n", "[IPTG] (mM),normalized GFP expression (a.u.),sem\n", "0.001000,0.004090,0.003475\n", "0.010000,0.010225,0.002268\n", "0.020000,0.022495,0.004781\n", "0.030000,0.034765,0.003000\n", "0.040000,0.067485,0.006604\n", "0.040000,0.668712,0.087862\n", "0.060000,0.740286,0.045853\n", "0.100000,0.840491,0.058986\n", "0.300000,0.936605,0.026931\n", "0.600000,0.961145,0.093553\n", "1.000000,0.940695,0.037624\n", "3.000000,0.852761,0.059035\n", "6.000000,0.910020,0.051052\n", "10.000000,0.893661,0.042773\n"]}], "source": ["!cat data/collins_switch.csv"]}, {"cell_type": "markdown", "metadata": {}, "source": ["It has two rows of non-data. Then, Column 1 is the IPTG concentration, column 2 is the normalized GFP expression level, and the last column is the standard error of the mean normalized GFP intensity. This gives the error bars, which we will plot momentarily. 
For now, we will just plot IPTG versus normalized GFP intensity.\n", "\n", "In looking at the data set, note that there are two entries for [IPTG] = 0.04 mM. At this concentration, the switch happens, and there are two populations of cells, one with high expression of GFP and one with low. The two data points represent these two populations of cells.\n", "\n", "**a)** Now, let's make a plot of IPTG versus GFP.\n", "\n", "1. Load in the data set using Pandas. Make sure you use the `comment` kwarg of pd.read_csv() properly.\n", "2. Make a plot of normalized GFP intensity (y-axis) versus IPTG concentration (x-axis).\n", "\n", "**b)** Now that you have done that, there are some problems with the plot. It is really hard to see the data points with low concentrations of IPTG. In fact, looking at the data set, the concentration of IPTG varies over four orders of magnitude. When you have data like this, it is wise to plot them on a logarithmic scale. You can specify the x-axis as logarithmic when you instantiate a figure with `bokeh.plotting.figure()` by using the `x_axis_type='log'` kwarg. (The obvious analogous kwarg applied for the y-axis.) For this data set, it is definitely best to have the x-axis on a logarithmic scale. Remake the plot you just did with the x-axis logarithmically scaled.\n", "\n", "**c)** The data set also contains the standard error of the mean, or SEM. The SEM is often displayed on plots as error bars. Now construct the plot with error bars.\n", "\n", "1. Add columns `error_low` and `error_high` to the data frame containing the Collins data. These will set the bottoms and tops of the error bars. You should base the values in these columns on the standard error of the mean (`sem`). Assuming a Gaussian model, the 95% confidence interval is \u00b11.96 times the s.e.m.\n", "2. Make a plot with the measured expression levels and the error bars. 
*Hint*: Check out the [Bokeh docs](https://bokeh.pydata.org/en/latest/docs/user_guide/plotting.html) and think about what kind of glyph works best for error bars."]}, {"cell_type": "markdown", "metadata": {}, "source": ["<br />"]}], "metadata": {"anaconda-cloud": {}, "kernelspec": {"display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.9"}}, "nbformat": 4, "nbformat_minor": 4}
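Part c) of Exercise 3.4 in the diff above asks for `error_low` and `error_high` columns built from `sem`, using the stated 95% interval of ±1.96 times the s.e.m. The following is a minimal sketch of that arithmetic, not the course's solution. It uses Python's standard-library `csv` module rather than pandas so that it runs without dependencies, whereas the exercise itself asks for pandas. The names `CSV_TEXT`, `add_error_bounds`, `z`, and the dictionary keys `iptg` and `gfp` are hypothetical, and only the first three data rows printed in the notebook are included.

```python
import csv

# First rows of data/collins_switch.csv as printed in the notebook (truncated).
CSV_TEXT = """\
# Data digitized from Fig. 5a of Gardner, et al., *Nature*, **403**, 339, 2000.
[IPTG] (mM),normalized GFP expression (a.u.),sem
0.001000,0.004090,0.003475
0.010000,0.010225,0.002268
0.020000,0.022495,0.004781
"""


def add_error_bounds(text, z=1.96):
    """Return one dict per data row with lower and upper error-bar bounds.

    z=1.96 is the multiplier quoted in the exercise for a 95% interval
    under a Gaussian model.
    """
    # Skip whole-line comments, roughly what comment="#" does in pd.read_csv().
    data_lines = [ln for ln in text.splitlines() if not ln.startswith("#")]

    rows = []
    for row in csv.DictReader(data_lines):
        gfp = float(row["normalized GFP expression (a.u.)"])
        sem = float(row["sem"])
        rows.append(
            {
                "iptg": float(row["[IPTG] (mM)"]),
                "gfp": gfp,
                "error_low": gfp - z * sem,   # bottom of the error bar
                "error_high": gfp + z * sem,  # top of the error bar
            }
        )
    return rows


rows = add_error_bounds(CSV_TEXT)
for r in rows:
    print(r["iptg"], r["error_low"], r["error_high"])
```

Note that for the lowest concentration the lower bound comes out negative, which is harmless here because only the x-axis is meant to be logarithmic.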