1- My First Diagram Using Python

We start learning Python for the analysis of geological data by performing two basic, everyday operations: importing a dataset with pandas and visualizing selected features in a binary diagram. Let's start with importing a dataset using the pandas library. But what is pandas? Pandas is a Python library (i.e., a tool) designed to help us work with structured data. In practice, it provides several ready-to-use commands for working with data. For example, we can easily use pandas to import a dataset stored in a text file or an Excel worksheet with a single line of code. To see how, look at the following two examples:

In [15]:
import pandas as pd

# Example 1: read a comma-delimited text file
myDataset1 = pd.read_csv('Smith_glass_post_NYT_data.csv')

# Example 2: read one worksheet of an Excel file
myDataset2 = pd.read_excel('Smith_glass_post_NYT_data.xlsx', sheet_name='Supp_traces')

In the first example, we define a pandas DataFrame (i.e., myDataset1) by reading a comma-delimited text file. As reported in the official documentation of the pandas library, a DataFrame "is a 2-dimensional labeled data structure with columns of potentially different types". What does this mean? We can think of a DataFrame as a fully editable, powerful table:

In [16]:
from IPython.display import display

display(myDataset1)
Analysis no. Strat. Pos. Eruption controlcode Sample Epoch Crater size Date of analysis Si/bulk cps SiO2* (EMP) ... Ho Er Tm Yb Lu Hf Ta Pb Th U
0 1915 63 Astroni 7 1 79 3b 20 100210am 20.21 59.27 ... 1.11 3.26 0.47 2.80 0.43 7.84 2.96 60.93 35.02 9.20
1 1916 63 Astroni 7 1 79 3b 20 100210am 11.92 59.27 ... 1.08 2.27 0.46 3.14 0.46 7.33 3.52 59.89 34.46 10.46
2 1917 63 Astroni 7 1 79 3b 20 100210am 17.06 59.27 ... 1.25 3.69 0.61 3.51 0.63 8.43 3.05 49.87 29.22 8.73
3 1918 63 Astroni 7 1 79 3b 20 100210am 24.52 59.27 ... 1.24 3.72 0.46 3.04 0.44 8.95 3.08 59.59 30.71 9.79
4 1919 63 Astroni 7 1 79 3b 20 100210am 14.35 59.27 ... 1.08 2.68 0.46 2.79 0.41 7.24 2.67 60.70 32.13 9.01
5 1920 63 Astroni 7 1 79 3b 20 100210am 17.69 59.27 ... 1.11 3.25 0.48 2.78 0.56 7.33 3.37 59.37 34.26 9.97
6 1921 63 Astroni 7 1 79 3b 20 100210am 12.83 59.27 ... 0.83 2.95 0.42 2.96 0.46 8.26 3.44 61.26 36.34 10.41
7 1922 63 Astroni 7 1 79 3b 20 100210am 15.10 59.27 ... 1.14 3.12 0.35 2.95 0.50 8.39 3.53 61.60 36.82 10.50
8 1923 63 Astroni 7 1 79 3b 20 100210am 18.62 59.27 ... 1.01 2.89 0.65 3.02 0.54 8.79 3.65 65.15 37.71 10.51
9 1924 63 Astroni 7 1 79 3b 20 100210am 19.54 59.27 ... 1.26 3.05 0.59 3.04 0.47 8.22 3.45 61.56 37.74 10.90
10 1925 63 Astroni 7 1 79 3b 20 100210am 17.83 59.27 ... 1.04 2.81 0.56 2.85 0.42 7.43 2.89 54.99 35.40 9.97
11 1926 63 Astroni 7 1 79 3b 20 100210am 16.15 59.27 ... 0.95 2.98 0.51 2.98 0.43 7.03 3.21 132.35 31.80 9.76
12 1927 63 Astroni 7 1 79 3b 20 100210am 17.47 59.27 ... 1.10 2.93 0.45 3.29 0.38 7.86 3.60 NaN 36.63 11.09
13 1928 63 Astroni 7 1 79 3b 20 100210am 16.62 59.27 ... 1.04 2.84 0.41 2.89 0.32 7.61 3.35 69.64 34.07 10.09
14 1929 62 Astroni 6 1 78-2 3b 20 100210pm 16.34 59.11 ... 1.10 2.45 0.36 2.25 0.41 6.65 2.90 98.34 29.45 9.23
15 1930 62 Astroni 6 1 78-2 3b 20 100210pm 20.12 58.86 ... 1.07 2.60 0.51 2.93 0.39 6.17 3.02 55.54 30.16 8.70
16 1931 62 Astroni 6 1 78-2 3b 20 100210pm 16.69 58.86 ... 0.94 2.02 0.45 2.29 0.46 7.23 2.74 78.82 30.43 8.84
17 1932 62 Astroni 6 1 78-2 3b 20 100210pm 17.65 58.86 ... 1.04 2.87 0.36 2.82 0.32 6.61 2.83 85.46 28.21 8.80
18 1933 62 Astroni 6 1 78-2 3b 20 100210pm 18.76 58.86 ... 0.93 2.53 0.37 2.32 0.52 6.36 2.64 74.03 30.46 8.70
19 1934 62 Astroni 6 1 78-2 3b 20 100210pm 11.45 58.86 ... 0.95 2.26 0.33 2.39 0.30 6.28 2.41 77.03 26.94 9.10
20 1935 62 Astroni 6 1 78-2 3b 20 100210pm 8.79 58.86 ... 1.01 2.51 0.33 3.46 0.37 6.07 2.44 46.64 28.30 7.93
21 1936 62 Astroni 6 1 78-2 3b 20 100210pm 19.46 58.86 ... 0.94 2.52 0.48 2.40 0.42 5.55 2.49 83.78 28.55 8.31
22 1937 62 Astroni 6 1 78-2 3b 20 100210pm 22.24 58.86 ... 0.86 2.44 0.44 2.39 0.41 6.23 2.61 61.07 28.77 9.00
23 1938 62 Astroni 6 1 78-2 3b 20 100210pm 18.05 58.86 ... 1.12 3.22 0.54 2.89 0.40 7.21 3.22 63.39 33.92 9.31
24 1939 62 Astroni 6 1 78-2 3b 20 100210pm 26.26 58.86 ... 0.95 2.61 0.47 2.74 0.43 6.72 3.02 61.63 31.29 8.88
25 1940 62 Astroni 6 1 78-2 3b 20 100210pm 13.08 58.86 ... 0.98 2.73 0.31 2.38 0.42 6.06 2.93 56.91 30.09 9.01
26 1941 62 Astroni 6 1 78-2 3b 20 100210pm 19.75 58.86 ... 0.80 2.27 0.41 2.67 0.41 5.52 2.97 74.38 28.06 8.60
27 1942 62 Astroni 6 1 78-2 3b 20 100210pm 18.15 58.86 ... 1.08 2.91 0.48 2.88 0.32 6.90 3.05 56.54 30.89 9.24
28 1943 62 Astroni 6 1 78-1 3b 20 100210pm 12.34 59.11 ... 0.98 2.46 0.37 3.14 0.51 8.05 3.11 92.82 36.91 10.67
29 1944 62 Astroni 6 1 78-1 3b 20 100210pm 11.94 59.11 ... 1.25 2.50 0.41 3.38 0.41 8.40 2.69 93.41 34.75 8.69
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
340 2255 14 PP 9 6 1 20 030810am 12.98 58.06 ... 1.19 2.54 0.36 2.22 0.44 6.00 2.56 56.11 27.75 8.29
341 2256 14 PP 9 6 1 20 030810am 13.08 58.06 ... 0.98 2.75 0.28 2.76 0.43 5.99 2.63 49.06 28.34 7.92
342 2257 14 PP 9 6 1 20 030810am 7.35 58.06 ... 1.17 3.29 0.40 3.87 0.33 5.97 2.32 49.21 29.22 14.12
343 2258 14 PP 9 6 1 20 030810am 7.16 58.06 ... 1.72 3.30 0.37 3.31 0.57 8.45 3.42 66.51 36.59 9.79
344 2259 14 PP 9 6 1 20 030810am 5.24 58.06 ... 1.19 3.19 0.44 2.77 0.42 7.14 2.30 58.58 27.93 7.12
345 2260 14 PP 9 6 1 20 030810am 2.87 58.06 ... 0.95 2.83 1.26 2.51 NaN 2.77 1.83 46.38 22.05 5.40
346 2261 14 PP 9 6 1 20 030810am 17.78 58.06 ... 1.01 2.61 0.37 2.84 0.40 6.17 2.34 63.50 27.73 8.37
347 2262 14 PP 9 6 1 20 030810am 10.65 58.06 ... 0.88 3.16 0.46 2.59 0.32 7.23 2.36 99.44 28.52 8.56
348 2263 12 Soccavo 1 8 7 1 20 040810pm 6.77 58.70 ... 1.33 1.54 0.21 2.75 0.45 5.14 2.42 68.03 26.49 8.18
349 2264 12 Soccavo 1 8 7 1 20 040810pm 15.47 58.70 ... 1.39 4.57 0.55 2.67 0.41 5.80 2.45 61.38 27.07 9.81
350 2265 12 Soccavo 1 8 7 1 20 040810pm 12.23 58.70 ... 1.14 2.73 0.37 2.26 0.33 6.50 2.46 54.63 27.77 8.50
351 2266 12 Soccavo 1 8 7 1 20 040810pm 10.28 58.70 ... 0.97 2.42 0.23 2.73 0.01 5.32 2.24 69.05 27.91 8.15
352 2267 12 Soccavo 1 8 7 1 20 040810pm 5.69 58.70 ... 0.97 1.57 0.41 1.68 0.54 6.60 1.46 50.24 26.99 6.78
353 2268 12 Soccavo 1 8 7 1 20 040810pm 13.85 58.70 ... 1.17 1.93 0.48 2.00 0.49 6.59 2.38 54.48 26.90 7.59
354 2269 12 Soccavo 1 8 7 1 20 040810pm 19.06 58.70 ... 1.14 3.13 0.52 2.95 0.36 6.82 2.59 60.31 29.80 8.63
355 2270 12 Soccavo 1 8 7 1 20 040810pm 12.21 58.70 ... 1.35 4.69 0.87 2.81 0.50 7.71 3.28 57.06 34.09 10.20
356 2271 12 Soccavo 1 8 7 1 20 040810pm 11.31 58.70 ... 1.03 2.29 0.54 2.71 0.37 7.42 2.42 54.54 28.98 7.89
357 2272 12 Soccavo 1 8 7 1 20 040810pm 8.97 58.70 ... 1.34 3.09 0.24 4.01 0.20 8.12 2.93 65.70 29.77 8.46
358 2273 12 Soccavo 1 8 7 1 20 040810pm 7.99 58.70 ... 0.90 2.69 0.69 2.73 0.35 4.83 3.26 67.31 25.67 7.41
359 2274 12 Soccavo 1 8 7 1 20 040810pm 16.21 58.70 ... 0.78 3.09 0.68 2.68 0.34 7.36 2.31 66.75 29.03 8.12
360 2275 12 Soccavo 1 8 7 1 20 040810pm 8.87 58.70 ... 1.13 2.82 0.15 2.80 0.54 5.77 2.71 59.85 27.07 8.94
361 2276 12 Soccavo 1 8 7 1 20 040810pm 5.26 58.70 ... 0.60 3.15 0.27 2.11 0.02 7.89 1.69 54.05 25.72 8.04
362 2277 12 Soccavo 1 8 7 1 20 040810pm 19.61 58.70 ... 1.43 3.68 0.41 3.30 0.58 6.48 2.43 52.10 30.40 8.19
363 2278 12 Soccavo 1 8 7 1 20 040810pm 17.17 58.70 ... 1.41 2.81 0.39 3.35 0.38 6.59 2.93 52.37 29.47 9.18
364 2279 12 Soccavo 1 8 7 1 20 040810pm 9.45 58.70 ... 1.08 2.58 0.63 2.38 0.66 6.91 2.35 50.99 42.19 8.09
365 2280 12 Soccavo 1 8 7 1 20 040810pm 12.05 58.70 ... 1.17 1.83 0.73 2.07 0.44 5.57 2.85 57.78 29.26 9.66
366 2281 12 Soccavo 1 8 7 1 20 040810pm 9.96 58.70 ... 1.07 2.36 0.66 2.79 0.30 7.75 2.51 43.79 25.33 7.09
367 2282 12 Soccavo 1 8 7 1 20 040810pm 29.81 58.70 ... 0.99 3.30 0.44 2.84 0.41 7.36 2.81 59.68 32.59 9.77
368 2283 12 Soccavo 1 8 7 1 20 040810pm 10.62 58.70 ... 0.95 3.04 0.14 2.04 0.30 6.45 2.05 53.78 26.46 8.30
369 2284 12 Soccavo 1 8 7 1 20 040810pm 10.67 58.70 ... 0.84 2.60 0.52 2.57 0.41 6.10 2.72 55.02 27.73 8.76

370 rows × 37 columns
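
Besides displaying the whole table, a few attributes and methods give a quick overview of a DataFrame. The following is a minimal sketch; the tiny CSV text and its values are invented for illustration (they are not the Smith et al. data), and it is read from memory through io.StringIO so that the example runs without any file on disk.

```python
import io
import pandas as pd

# Hypothetical miniature dataset, invented for illustration only
csv_text = """Sample,Epoch,Zr,Th,Pb
A1,3b,580.0,35.0,60.9
A2,3b,560.5,34.5,
B1,1,310.2,27.8,56.1
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)             # (number of rows, number of columns)
print(df.columns.tolist())  # column labels
print(df.dtypes)            # one data type per column
print(df.head(2))           # first two rows
print(df.isna().sum())      # number of missing values (NaN) per column
```

The empty Pb field in the second row is read as NaN, the same marker that appears in the Pb and Lu columns of the table above.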

The second example is similar to the first, but it reads an Excel file. Because an Excel file can contain several worksheets, the command points to a specific one: Supp_traces. The imported dataset contains trace element concentrations of volcanic products (i.e., tephras) published by Smith et al. (2011). It will be used as a representative example of a scientific dataset. In detail, it consists of major (Supp_majors) and trace element (Supp_traces) analyses of tephra samples belonging to the recent activity (last 15 ky) of the Campi Flegrei caldera.

To start plotting, we can use the matplotlib library. As reported in the official documentation, it is "a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms." In detail, it "can generate plots, histograms, power spectra, bar charts, errorcharts, scatterplots, etc., with just a few lines of code". Let's start plotting:

In [6]:
import matplotlib.pyplot as plt

x = myDataset1.Zr
y = myDataset1.Th

plt.scatter(x, y)
plt.show()
<Figure size 640x480 with 1 Axes>
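
The cell above selects columns with attribute access (myDataset1.Zr). As a hedged aside, this shortcut only works when the column label is a valid Python identifier; labels such as 'Analysis no.' need bracket access. The sketch below uses a small DataFrame with invented values to show both forms.

```python
import pandas as pd

# Hypothetical values, invented for illustration only
df = pd.DataFrame({
    "Analysis no.": [1915, 1916, 1917],
    "Zr": [580.0, 560.5, 310.2],
    "Th": [35.0, 34.5, 27.8],
})

zr_attr = df.Zr                # attribute access: valid identifiers only
zr_item = df["Zr"]             # bracket access: works for any column label
analysis = df["Analysis no."]  # a label with a space and a dot needs brackets

print(zr_attr.equals(zr_item))  # both forms return the same column
print(analysis.tolist())
```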

Now we can start adding features to the diagram, such as a title and axis labels:

In [5]:
plt.figure() 
plt.scatter(x, y)
plt.title("My First Diagram")
plt.xlabel("Zr [ppm]")
plt.ylabel("Th [ppm]")
plt.show()
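
The same diagram can also be built with matplotlib's object-oriented interface, where the title and labels are set on an Axes object instead of through plt functions. This is only a sketch with invented values; the set_xlabel and set_ylabel methods are the ones used later in this section when several subplots are drawn.

```python
import matplotlib.pyplot as plt

# Hypothetical values, invented for illustration only
zr = [310.2, 455.0, 580.0]
th = [27.8, 30.1, 35.0]

fig, ax = plt.subplots()   # one figure containing a single Axes
ax.scatter(zr, th)
ax.set_title("My First Diagram")
ax.set_xlabel("Zr [ppm]")
ax.set_ylabel("Th [ppm]")
```

As in the cells above, a final plt.show() displays the figure.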

To improve our knowledge of Python for the visualization of scientific data, we now show how to filter, or slice, our dataset. As an example, we can plot the analyses characterized by Zr contents greater than and less than 450 ppm in blue and red, respectively, and add a legend.

In [36]:
# Define two sub-datasets for Zr > 450 and Zr < 450, respectively
mySubDataset1 = myDataset1[myDataset1.Zr > 450]
mySubDataset2 = myDataset1[myDataset1.Zr < 450]

# Generate a new figure
plt.figure()
# Scatter diagram of Zr vs. Th for Zr > 450, in blue,
# with the legend caption "Zr > 450 [ppm]"
x1 = mySubDataset1.Zr
y1 = mySubDataset1.Th
plt.scatter(x1, y1, color='blue', label="Zr > 450 [ppm]")
# Scatter diagram of Zr vs. Th for Zr < 450, in red,
# with the legend caption "Zr < 450 [ppm]"
x2 = mySubDataset2.Zr
y2 = mySubDataset2.Th
plt.scatter(x2, y2, color='red', label="Zr < 450 [ppm]")

plt.title("My Second Diagram")
plt.xlabel("Zr [ppm]")
plt.ylabel("Th [ppm]")
# generate the legend
plt.legend()
plt.show()
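
The filtering used above relies on a boolean mask: the comparison returns one True or False per row, and indexing the DataFrame with it keeps only the True rows. The sketch below, on invented values, also shows a detail worth keeping in mind: rows where Zr is exactly 450, or missing, belong to neither of the two subsets.

```python
import numpy as np
import pandas as pd

# Hypothetical values, invented for illustration only
df = pd.DataFrame({"Zr": [300.0, 450.0, 600.0, np.nan],
                   "Th": [27.0, 30.0, 36.0, 31.0]})

mask = df["Zr"] > 450   # boolean Series: one True/False per row
print(mask.tolist())    # [False, False, True, False]

above = df[mask]            # rows where the mask is True
below = df[df["Zr"] < 450]  # rows with Zr strictly below 450

# The row with Zr == 450 and the row with a missing Zr are in neither subset
print(len(df), len(above), len(below))  # 4 1 1

# ~mask inverts the mask: every row not above 450, including 450 and NaN
not_above = df[~mask]
print(len(not_above))   # 3
```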

Now we are going to learn how to filter our dataset using the values reported in the column 'Epoch' (i.e., 1, 2, 3, and 3b), which subdivide the eruptions studied by Smith et al. (2011) into four different periods. We will start by plotting the different epochs with different colors and labels:

In [7]:
plt.figure()

myData1 = myDataset1[(myDataset1.Epoch.astype(str) == '1')]
plt.scatter(myData1.Zr, myData1.Th, label='Epoch 1')

myData2 = myDataset1[(myDataset1.Epoch.astype(str) == '2')]
plt.scatter(myData2.Zr, myData2.Th, label='Epoch 2')

myData3 = myDataset1[(myDataset1.Epoch.astype(str) == '3')]
plt.scatter(myData3.Zr, myData3.Th, label='Epoch 3')

myData4 = myDataset1[(myDataset1.Epoch.astype(str) == '3b')]
plt.scatter(myData4.Zr, myData4.Th, label='Epoch 3b')
    
plt.title("My Third Diagram")   
plt.xlabel("Zr [ppm]")
plt.ylabel("Th [ppm]")
plt.legend()

plt.show()

Readers who are already familiar with the Python programming language could suggest a way to compress the code reported above, making it more concise:

In [39]:
epochs = ['1','2','3','3b']

plt.figure()
for epoch in epochs:
    myData = myDataset1[(myDataset1.Epoch.astype(str) == epoch)]
    plt.scatter(myData.Zr, myData.Th, label="Epoch " + epoch)

plt.title("My Third Diagram again")
plt.xlabel("Zr [ppm]")
plt.ylabel("Th [ppm]")
plt.legend()

plt.show()
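
A variation on the loop above, offered only as a sketch, lets pandas discover the epochs instead of listing them by hand. DataFrame.groupby yields one (key, sub-DataFrame) pair per distinct value. The data below are invented for illustration, and the Epoch values are converted to strings as in the cells above.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical values, invented for illustration only
df = pd.DataFrame({
    "Epoch": ["1", "1", "2", "3", "3b", "3b"],
    "Zr": [310.0, 330.0, 420.0, 480.0, 580.0, 560.0],
    "Th": [27.0, 28.0, 30.0, 32.0, 35.0, 34.0],
})

fig, ax = plt.subplots()
# One iteration per distinct Epoch value, with keys in sorted order
for epoch, group in df.groupby(df["Epoch"].astype(str)):
    ax.scatter(group["Zr"], group["Th"], label="Epoch " + epoch)

ax.set_xlabel("Zr [ppm]")
ax.set_ylabel("Th [ppm]")
ax.legend()
```

The explicit list of epochs remains the better choice when only some groups should be plotted or when a specific order is required.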

In Python, the for loop is used to repeat a block of code. You should learn how to use it together with the other compound statements. We will describe the compound statements later in the book (Chapter XX). However, please note that you can successfully complete many tasks without a deep knowledge of the syntax and “core semantics” of the Python language.
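
As a minimal illustration of the for loop in isolation, the sketch below repeats an indented block once per item of a list, and then walks through two lists in parallel with the built-in zip. The lists are invented for illustration.

```python
elements = ["Zr", "Th", "Pb"]

# The indented block runs once for each item in the list
for element in elements:
    print(element + " [ppm]")

# zip pairs the items of two sequences, position by position
colors = ["blue", "red", "green"]
pairs = []
for element, color in zip(elements, colors):
    pairs.append((element, color))

print(pairs)  # [('Zr', 'blue'), ('Th', 'red'), ('Pb', 'green')]
```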

Finally, we will plot the different epochs in different subplots, also setting the same limits for the x and y axes:

In [42]:
# plt.subplots creates the figure itself, so no separate plt.figure() call is needed
f, axarr = plt.subplots(2, 2)

axarr[0, 0].scatter(myData1.Zr, myData1.Th, label='Epoch 1')
axarr[0, 0].set_xlabel("Zr [ppm]")
axarr[0, 0].set_ylabel("Th [ppm]")
axarr[0, 0].set_xlim([100, 1000])
axarr[0, 0].set_ylim([0, 100])
axarr[0, 0].legend()

axarr[1, 0].scatter(myData2.Zr, myData2.Th, label='Epoch 2')
axarr[1, 0].set_xlabel("Zr [ppm]")
axarr[1, 0].set_ylabel("Th [ppm]")
axarr[1, 0].set_xlim([100, 1000])
axarr[1, 0].set_ylim([0, 100])
axarr[1, 0].legend()

axarr[0, 1].scatter(myData3.Zr, myData3.Th, label='Epoch 3')
axarr[0, 1].set_xlabel("Zr [ppm]")
axarr[0, 1].set_ylabel("Th [ppm]")
axarr[0, 1].set_xlim([100, 1000])
axarr[0, 1].set_ylim([0, 100])
axarr[0, 1].legend()

axarr[1, 1].scatter(myData4.Zr, myData4.Th, label='Epoch 3b')
axarr[1, 1].set_xlabel("Zr [ppm]")
axarr[1, 1].set_ylabel("Th [ppm]")
axarr[1, 1].set_xlim([100, 1000])
axarr[1, 1].set_ylim([0, 100])
axarr[1, 1].legend()

plt.tight_layout()
plt.show()
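
The four nearly identical blocks above can be condensed with a loop over the array of Axes. The following is a sketch on invented data, not a replacement for the cell above: it assumes the sharex and sharey options of plt.subplots, which link the axis limits of all panels so that they need to be set only once. Note that axes.flat visits the panels row by row, so the panel order differs from the cell above.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical values, invented for illustration only
df = pd.DataFrame({
    "Epoch": ["1", "1", "2", "3", "3b", "3b"],
    "Zr": [310.0, 330.0, 420.0, 480.0, 580.0, 560.0],
    "Th": [27.0, 28.0, 30.0, 32.0, 35.0, 34.0],
})
epochs = ["1", "2", "3", "3b"]

# sharex/sharey link the axis limits of all four panels
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)

for ax, epoch in zip(axes.flat, epochs):
    subset = df[df["Epoch"].astype(str) == epoch]
    ax.scatter(subset["Zr"], subset["Th"], label="Epoch " + epoch)
    ax.set_xlabel("Zr [ppm]")
    ax.set_ylabel("Th [ppm]")
    ax.legend()

# With shared axes, setting the limits on one panel applies to all of them
axes[0, 0].set_xlim(100, 1000)
axes[0, 0].set_ylim(0, 100)
fig.tight_layout()
```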

More examples and details are provided by the official documentation of the matplotlib library.

Python References

  • pandas library - pandas is an open source library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language
  • pandas.DataFrame - two-dimensional tabular data structure with labeled axes
  • pandas.read_csv - read a CSV file into a DataFrame
  • pandas.read_excel - read an Excel file into a DataFrame
  • Jupyter Notebook - the Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text.
  • IPython Notebook - IPython is an environment for interactive and exploratory computing.
  • IPython.display - public API for display tools in IPython
  • matplotlib library - matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms.
  • matplotlib.pyplot - provides a MATLAB-like plotting framework.
  • matplotlib.pyplot.figure - creates a new figure.
  • matplotlib.pyplot.scatter - a scatter plot of y vs x with varying marker size and/or color.
  • matplotlib.pyplot.title - set a title of the current axes.
  • matplotlib.pyplot.xlabel - set the x-axis label of the current axes.
  • matplotlib.pyplot.ylabel - set the y-axis label of the current axes.
  • matplotlib.pyplot.show - display a figure.
  • matplotlib.pyplot.legend - place a legend on the current axes.
  • matplotlib.pyplot.subplots - create a figure and a set of subplots
  • matplotlib.pyplot.tight_layout - automatically adjust subplot parameters to give specified padding.
  • Compound statements - compound statements contain (groups of) other statements; they affect or control the execution of those other statements in some way. In general, compound statements span multiple lines, although in simple incarnations a whole compound statement may be contained in one line.

Bibliographic References and Further Readings

References

Further Readings