NetCDF Tutorial


This page provides some lessons in reading, processing and writing NetCDF files, using the Python module Scientific. Python's matplotlib, and its toolkit basemap, are used for plotting.
last modified: 04:17 PM CDT, Fri 14 Sep 2007


Why NetCDF?

Suppose your research has produced a time series of data, which you want to make accessible to other people. Imagine your time series is CO2 concentration in the atmosphere, which could be plotted as:

One obvious way to distribute the data would be as an ASCII file, with time in the first column and concentration in the second column. But what format should the time be written in? Should the ASCII file be in Unix or DOS format? See "Unix and DOS issues" or the Wikipedia entry "Newline". Should the time code be JUN1995 or 061995, or 199506, or something else? Do you expect people to be able to plot the data immediately, or do you expect the user to first calculate a number, truly linear in time, from the date stamp that you give? What if the instrument was "down" during some months: what symbol should you use to denote "missing data"? Are you going to put "comments" in the file describing what the data is? Or will there be a separate "readme" file that describes the data?
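To make the date-stamp question concrete, here is a minimal sketch of the conversion a reader of such an ASCII file would have to write. The function name months_since and the reference year 1950 are arbitrary choices made for this illustration; they do not come from any data file. Note that the reader must be told which layout was used: nothing in the stamp itself says so.

```python
from datetime import datetime

# The three candidate layouts for the same month, with the strptime
# pattern a reader would need to know for each one.
FORMATS = {
    "JUN1995": "%b%Y",
    "061995": "%m%Y",
    "199506": "%Y%m",
}

def months_since(stamp, fmt, ref_year=1950):
    """Convert a month stamp to a number that is linear in time:
    whole months elapsed since January of ref_year."""
    date = datetime.strptime(stamp, fmt)
    return (date.year - ref_year) * 12 + (date.month - 1)

for stamp, fmt in FORMATS.items():
    print(stamp, "->", months_since(stamp, fmt))
```

All three stamps describe the same month, so all three lines print the same number. The "%b" pattern depends on the locale for month names, which is one more thing that can go wrong with ASCII date stamps.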

Suppose, rather than 50 years of monthly averages of CO2 concentration at a point, you need to share many years of global analysis of winds. What would be the purpose of sending ASCII data if no human could be expected to scan more than a tiny portion of the data in a text editor or text viewer? In deference to the computing resources of the potential users of your data, the data should be stored as compactly as possible, using the minimum number of bits needed to represent the precision of each number. Should the data be stored in the standard binary format that can be read into a C or Fortran program? Would the standard 32-bit floating point number be a waste of space, since the data has only 3 significant digits? Should the binary data be in big-endian or little-endian format? Enter netCDF to solve these dilemmas. What follows is my own attempt at a NetCDF tutorial, using Python.
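The byte-order and precision questions can be seen in a few lines with Python's standard struct module. This is only a sketch of the dilemma, not of how netCDF stores data internally; the value 3.14 and the scale of 0.01 are made-up numbers for the illustration.

```python
import struct

value = 3.14  # a made-up measurement with 3 significant digits

big = struct.pack(">f", value)     # 32-bit float, big-endian
little = struct.pack("<f", value)  # same number, little-endian
print(big.hex(), little.hex())     # the same four bytes, in reverse order

# Reading with the wrong byte order raises no error; it just returns
# a different number.
wrong = struct.unpack("<f", big)[0]
print(wrong)

# With only 3 significant digits, a scaled 16-bit integer holds the
# same information in half the space (scale is an assumed precision).
scale = 0.01
packed = struct.pack(">h", round(value / scale))
unpacked = struct.unpack(">h", packed)[0] * scale
print(len(big), len(packed), unpacked)
```

A reader of raw binary data would need to be told the byte order, the number type, and the scale separately. A self-describing file format carries that information along with the data.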


A simple NetCDF file, and a simple plot

Grab and untar nctask.tar into your Gentry home directory. It contains all the files you need for this tutorial and for your task.

In your nctask directory, you will find a small file keeling.cdf. (If you are curious about where it came from, visit IRI/LDEO Climate Data Library. Go to "Keeling" and then to "Mauna_Loa", "Data Downloads", "Files". Grab the NetCDF file.) Apparently, netCDF files can use either .nc or .cdf as an extension. This is confusing because there is another data format called CDF that is incompatible with netCDF. Gee, I wonder what extension CDF uses to designate its files?

If your computer has netCDF installed, as Gentry does, it will have a few tame pieces of software that allow C and Fortran programs to read and write netCDF files, as well as the short utility programs ncdump and ncgen.

To see a summary of what is in a netcdf file:

ncdump -h keeling.cdf
keeling.cdf is such a small file (and not really a worthy candidate for all the fuss of storing it as a NetCDF file) that you can also dump all the data to your screen:
ncdump keeling.cdf
(cat keeling.cdf is probably NOT a good way to see the data.) To learn more about ncdump, type man ncdump.

Actually, ncdump translates the binary NetCDF file to an ASCII file written in "common data language" (CDL). It is possible to make modest edits in short NetCDF files: run ncdump any.nc > mydump, edit mydump, then run ncgen -o mynew.nc mydump.
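If you would rather script that dump, edit, regenerate cycle than do it by hand, it might be sketched in Python as below. The function names are hypothetical, and edit stands for any function of your own that takes the CDL text and returns the changed text. The only command-line options used are the ones shown above; the sketch assumes ncdump and ncgen are on your path.

```python
import subprocess

def dump_command(src):
    """Command that writes the CDL text of src to standard output."""
    return ["ncdump", src]

def gen_command(cdl, dst):
    """Command that builds the binary file dst from the CDL file cdl."""
    return ["ncgen", "-o", dst, cdl]

def edit_cycle(src, cdl, dst, edit):
    """Dump src to CDL text, apply edit(text), and regenerate dst.

    edit is a caller-supplied function: CDL text in, CDL text out.
    """
    result = subprocess.run(dump_command(src), capture_output=True,
                            text=True, check=True)
    with open(cdl, "w") as out:
        out.write(edit(result.stdout))
    subprocess.run(gen_command(cdl, dst), check=True)
```

For example, edit_cycle("any.nc", "mydump", "mynew.nc", lambda text: text) would copy the file unchanged by way of its CDL form.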

Now let's read keeling.cdf into a Python program, and plot out the data. Just type:

simple.py
You should be familiar with the use of matplotlib for plotting. The use of Scientific is probably unfamiliar to you. You may be able to learn most of what you need to know about Scientific from the examples here, and from a Python interpreter, e.g.:
>>> import Scientific.Statistics as ss
>>> help(ss)
But for Scientific.IO.NetCDF you may need to read the manual.
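As a starting point, here is a minimal sketch of opening a file and looking inside it. It is not the contents of simple.py. The helper names open_netcdf, summarize and read_values are made up for this illustration. The fallback to scipy.io.netcdf_file is an assumption for machines without Scientific; scipy documents that class as following the Scientific.IO.NetCDF interface.

```python
def open_netcdf(path):
    """Open a NetCDF file for reading and return the file object."""
    try:
        # The module this tutorial is written for.
        from Scientific.IO.NetCDF import NetCDFFile
        return NetCDFFile(path, "r")
    except ImportError:
        # Assumption: scipy is installed instead; it is a stand-in here.
        from scipy.io import netcdf_file
        return netcdf_file(path, "r")

def summarize(nc):
    """Return {variable name: (dimension names, shape)}, roughly the
    information that ncdump -h prints."""
    return {name: (tuple(var.dimensions), tuple(var.shape))
            for name, var in nc.variables.items()}

def read_values(nc, name):
    """Return all values of a one-dimensional variable as a list."""
    return list(nc.variables[name][:])
```

With the tutorial file this would be used as nc = open_netcdf("keeling.cdf"), then summarize(nc). The names to pass to read_values are whatever ncdump -h keeling.cdf lists; none are assumed here. Call nc.close() when you are done.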

BTW, the pytables module looks very promising for HDF, NetCDF, and future versions of NetCDF. Version 1.3.3 can write HDF files that have the look and feel of NetCDF files when used solely with pytables, but you will not be able to read and write standard NetCDF files unless you also have Scientific installed. For our needs, it is better to simply use Scientific directly.


Multidimensional data in a NetCDF file

The files palmerna.cdf and climap.nc contain some multidimensional data. The data is rather interesting, independent of its use as examples in information technology.

Get a summary of the data:

ncdump -h palmerna.cdf
ncdump -h climap.nc
In your nctask directory, you will find a general purpose plotting program for NetCDF data that can be mapped. Try this:
ncplotter.py climap.nc


enter X variable=>X
shape of X:  (180,)

enter Y variable=>Y
shape of Y:  (91,)

enter variable to plot=>LGM_aug_sst
shape of LGM_aug_sst:  (91, 180)
 using specified missing value = [-99999.,]

LGM_aug_sst has maximum  29.2000007629   and minimum: -1.10000002384
enter lower bound on color scale =>0
enter upper bound on color scale =>30.
using clim= [0.0, 30.0]
Or prestore your responses in a file, and do:
ncplotter.py climap.nc < myresponses
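The response file can be written in any text editor. For completeness, here is a small sketch that creates it from Python, with the answers taken from the session above. It assumes that ncplotter.py reads one answer per line from standard input, which is what the redirection suggests but which I have not checked against the program itself.

```python
# Answers in the order the prompts appear in the session above.
responses = ["X", "Y", "LGM_aug_sst", "0", "30."]

# One answer per line, with a newline after the last one.
with open("myresponses", "w") as f:
    f.write("\n".join(responses) + "\n")
```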
You should see this:

Here is another plot:
ncplotter.py -p lam -r palmerna.cdf
enter X variable=>X
enter Y variable=>Y
enter variable to plot=>PDSI
enter which record to plot=>440
enter lower bound on color scale =>-10
enter upper bound on color scale =>10
You should see the Palmer Drought Severity Index in August 1956, a time of extreme drought in Oklahoma:

You don't need to know much about drought to complete the task, but you may want to visit:

How did I know rec=440 was the time of the minimum PDSI in Oklahoma? I ran hov.py ok, which contains several methods of data exploration within palmerna.cdf. Two figures are produced. The first is a Hovmoeller-like plot of PDSI (meaning a plot in the X-T plane) at the latitude of Oklahoma:

The second is a time series of the PDSI in Oklahoma and New York, and a calculation of the correlation between the two time series, which continues the investigation of the size of droughts:
That investigation goes further with corrmap.py, which produces corrmap.nc. The correlation between the PDSI in Oklahoma and the PDSI in other areas of North America is plotted with:
corrmap.py
ncplotter.py -p lam corrmap.nc
enter X variable=>X
enter Y variable=>Y
enter variable to plot=>corr
enter lower bound on color scale =>-1
enter upper bound on color scale =>1
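For readers who want to see the arithmetic behind such a map, here is a minimal sketch of a correlation between one reference time series and the series at every grid point. It is the textbook Pearson coefficient written with plain lists, not the code in corrmap.py. The function names and the field[t][j][i] layout are assumptions for this illustration, and missing values are not handled.

```python
from math import sqrt

def correlation(a, b):
    """Pearson correlation coefficient of two equally long series."""
    n = len(a)
    mean_a = sum(a) / float(n)
    mean_b = sum(b) / float(n)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def correlation_map(reference, field):
    """Correlate reference with the time series at each grid point.

    field is indexed as field[t][j][i]; the result is indexed [j][i].
    """
    ny, nx = len(field[0]), len(field[0][0])
    return [[correlation(reference, [step[j][i] for step in field])
             for i in range(nx)]
            for j in range(ny)]
```

A series correlated with itself gives 1 and with its negative gives -1, which is why the color scale above runs from -1 to 1.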

Your task is to extend corrmap.py to write out the mean of PDSI and the variance of PDSI into the NetCDF file, and then to plot those quantities, as was done above for the correlation. See the comments for TASK within corrmap.py.
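As a hint for the arithmetic only, the mean and variance over time at each grid point might be sketched as below. This uses plain lists and Python's standard statistics module rather than the arrays corrmap.py works with. The function name and the field[t][j][i] layout are assumptions, the population variance is an assumed choice (the task does not say which variance is wanted), and missing values are not handled.

```python
from statistics import mean, pvariance

def time_mean_and_variance(field):
    """Return (mean[j][i], variance[j][i]) over the time index of
    field[t][j][i]."""
    ny, nx = len(field[0]), len(field[0][0])
    avg = [[0.0] * nx for _ in range(ny)]
    var = [[0.0] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(nx):
            series = [step[j][i] for step in field]
            avg[j][i] = mean(series)
            var[j][i] = pvariance(series)  # assumed: population variance
    return avg, var
```

Writing the two results into the NetCDF file would then follow the pattern that corrmap.py already uses for corr.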
