Functions and Visualizations
Welcome to lab 4! This week, we'll learn about functions and the table method apply from Section 8.1
(https://www.inferentialthinking.com/chapters/08/1/applying-a-function-to-a-column.html). We'll also learn
about visualization from Chapter 7 (https://www.inferentialthinking.com/chapters/07/visualization.html).
First, set up the tests and imports by running the cell below.
In [1]: import numpy as np
from datascience import *
# These lines set up graphing capabilities.
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
import warnings
warnings.simplefilter('ignore', FutureWarning)
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets
from client.api.notebook import Notebook
ok = Notebook('lab04.ok')
_ = ok.auth(inline=True)
=====================================================================
Assignment: Lab 4
OK, version v1.13.9
=====================================================================
Successfully logged in as
1. Functions and CEO Incomes
Let's start with a real data analysis task. We'll look at the 2015 compensation of CEOs at the 100 largest
companies in California. The data were compiled for a Los Angeles Times analysis
(http://spreadsheets.latimes.com/california-ceo-compensation/) and ultimately came from filings
(https://www.sec.gov/answers/proxyhtf.htm) that the SEC requires of all publicly traded companies. Two
companies have two CEOs, so there are 102 CEOs in the dataset.
We've copied the data in raw form from the LA Times page into a file called raw_compensation.csv. (The
page notes that all dollar amounts are in millions of dollars.)
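Because every amount is reported in millions, a "Total Pay" entry of $53.25 means 53.25 million dollars. A minimal sketch of that conversion, assuming the amount has already been turned into a plain number (the helper name millions_to_dollars is hypothetical, not part of the lab):

```python
def millions_to_dollars(amount_in_millions):
    """Convert an amount expressed in millions of dollars into dollars."""
    return amount_in_millions * 1_000_000

# Mark V. Hurd's total pay of $53.25 (millions), from the first row of the table
print(millions_to_dollars(53.25))
# 53250000.0
```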
In [2]: raw_compensation = Table.read_table('raw_compensation.csv')
raw_compensation
Out[2]:
Rank | Name | Company (Headquarters) | Total Pay | % Change | Cash Pay | Equity Pay | Other Pay | Ratio of CEO pay to average industry worker pay
1 | Mark V. Hurd* | Oracle (Redwood City) | $53.25 | (No previous year) | $0.95 | $52.27 | $0.02 | 362
2 | Safra A. Catz* | Oracle (Redwood City) | $53.24 | (No previous year) | $0.95 | $52.27 | $0.02 | 362
3 | Robert A. Iger | Walt Disney (Burbank) | $44.91 | -3% | $24.89 | $17.28 | $2.74 | 477
4 | Marissa A. Mayer | Yahoo! (Sunnyvale) | $35.98 | -15% | $1.00 | $34.43 | $0.55 | 342
5 | Marc Benioff | salesforce.com (San Francisco) | $33.36 | -16% | $4.65 | $27.26 | $1.45 | 338
6 | John H. Hammergren | McKesson (San Francisco) | $24.84 | -4% | $12.10 | $12.37 | $0.37 | 222
7 | John S. Watson | Chevron (San Ramon) | $22.04 | -15% | $4.31 | $14.68 | $3.05 | 183
8 | Jeffrey Weiner | LinkedIn (Mountain View) | $19.86 | 27% | $2.47 | $17.26 | $0.13 | 182
9 | John T. Chambers** | Cisco Systems (San Jose) | $19.62 | 19% | $5.10 | $14.51 | $0.01 | 170
10 | John G. Stumpf | Wells Fargo (San Francisco) | $19.32 | -10% | $6.80 | $12.50 | $0.02 | 256
... (92 rows omitted)
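The "% Change" column mixes percentages such as -15% with the text (No previous year), so not every entry has a numeric value. A minimal sketch of a per-entry parser, assuming the raw file stores these cells as text exactly as displayed above; the helper name parse_percent_change is hypothetical, and returning None for missing values is one possible design choice, not the lab's:

```python
def parse_percent_change(text):
    """Turn a '% Change' entry such as '-15%' into the float -15.0.

    Entries like '(No previous year)' carry no numeric value,
    so they become None instead of raising an error.
    """
    text = text.strip()
    if not text.endswith("%"):
        return None
    return float(text[:-1])  # drop the trailing '%' and parse the number

# The first six '% Change' entries from the table above
changes = ["(No previous year)", "(No previous year)", "-3%", "-15%", "-16%", "27%"]
print([parse_percent_change(c) for c in changes])
# [None, None, -3.0, -15.0, -16.0, 27.0]
```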
Question 1. We want to compute the average of the CEOs' pay. Try running the cell below.
In [3]: np.average(raw_compensation.column("Total Pay"))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-3-39b2b017c72a> in <module>()
----> 1 np.average(raw_compensation.column("Total Pay"))
//anaconda/lib/python3.5/site-packages/numpy/lib/function_base.py in average(a, axis, weights, returned)
1108
1109 if weights is None:
-> 1110 avg = a.mean(axis)
1111 scl = avg.dtype.type(a.size/avg.size)
1112 else:
//anaconda/lib/python3.5/site-packages/numpy/core/_methods.py in _mean(a, axis, dtype, out, keepdims)
68 is_float16_result = True
69
---> 70 ret = umr_sum(arr, axis, dtype, out, keepdims)
71 if isinstance(ret, mu.ndarray):
72 ret = um.true_divide(
TypeError: cannot perform reduce with flexible type
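NumPy cannot average the column because its entries are text such as "$53.25", not numbers, and summing is not defined for text (the exact wording of NumPy's error varies between versions). A minimal sketch that reproduces the same kind of failure in plain Python, using the first three "Total Pay" values copied from the table as an illustrative list rather than the full column:

```python
# The first three "Total Pay" entries, stored as text exactly as the table shows them
total_pay_text = ["$53.25", "$53.24", "$44.91"]

try:
    # sum() starts from the integer 0, and adding a str to an int is not defined
    sum(total_pay_text)
except TypeError as err:
    print("TypeError:", err)
```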
You should see an error. Let's examine why this error occurred by looking at the values in the "Total Pay"
column. Use the type function and set total_pay_type to the type of the first value in the "Total Pay"
column.
In [4]: total_pay_type = type(raw_compensation.column("Total Pay").item(0)) # SOLUTION
total_pay_type
Out[4]: str
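Since each entry is a str, one way forward is a function that converts a single pay string into a number, applied to every element of the column. The sketch below assumes every entry is a dollar sign followed by a plain decimal number with no commas; the helper name dollar_string_to_float is hypothetical, and a list comprehension stands in for the table method apply so the example runs with only the standard library:

```python
def dollar_string_to_float(pay_string):
    """Convert a pay string such as '$53.25' into the float 53.25 (millions of dollars)."""
    return float(pay_string.strip("$"))  # remove the '$', then parse the number

# The first five "Total Pay" entries from the table
total_pay_text = ["$53.25", "$53.24", "$44.91", "$35.98", "$33.36"]

# Apply the function to every element, as Table.apply does for a whole column
total_pay = [dollar_string_to_float(s) for s in total_pay_text]
print(total_pay)
# [53.25, 53.24, 44.91, 35.98, 33.36]

# With numbers instead of strings, an average is well defined
print(round(sum(total_pay) / len(total_pay), 2))
# 44.15
```

In the notebook, the same function could be passed together with the column label to raw_compensation.apply, and the resulting array averaged with np.average.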
In [5]: _ = ok.grade('q1_1')
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests
---------------------------------------------------------------------
Test summary
Passed: 3
Failed: 0
[ooooooooook] 100.0% passed