Introduction

This article explains how to extract data from a three-dimensional NetCDF file (time, latitude, longitude) using different options, and how to save the output as a CSV (comma-separated values) file.

Before you continue, you are expected to have installed Python 3 and the CDS API on a Linux machine. The script below uses f-strings, which require Python 3.6 or later.

First option : Python Script

The first option is to use the Python script below. The script converts data from NetCDF to CSV in two different ways, following this workflow:

  • Retrieve data with the CDS API and store it as a NetCDF file (/tmp/download.nc in the script).
  • Extract the variable from the NetCDF file and get its dimensions (i.e. time, latitude and longitude).
  • Method 1: extract each time step as a 2D pandas DataFrame and write it to its own CSV file.
  • Method 2: write the data as a single table with 4 columns: time, latitude, longitude, value.


Two methods to dump to CSV
#!/usr/bin/python3

import cdsapi
import netCDF4
from netCDF4 import num2date
import numpy as np
import os
import pandas as pd

# Retrieve data and store as netCDF4 file
c = cdsapi.Client()
file_location = '/tmp/download.nc'
c.retrieve(
    'reanalysis-era5-single-levels',
    {
        'product_type':'reanalysis',
        'variable':'2m_temperature',  # 't2m'
        'year':'2019',
        'month':'06',
        'day':[
            '24','25'
        ],
        'time':[
            '00:00','06:00','12:00',
            '18:00'
        ],
        'format':'netcdf'
    },
    file_location)

# Open netCDF4 file
f = netCDF4.Dataset(file_location)

# Extract variable
t2m = f.variables['t2m']

# Get dimensions assuming 3D: time, latitude, longitude
time_dim, lat_dim, lon_dim = t2m.get_dims()
time_var = f.variables[time_dim.name]
times = num2date(time_var[:], time_var.units)
latitudes = f.variables[lat_dim.name][:]
longitudes = f.variables[lon_dim.name][:]

output_dir = '/tmp/data'

# =============================== METHOD 1 ============================
# Extract each time as a 2D pandas DataFrame and write it to CSV
# =====================================================================
os.makedirs(output_dir, exist_ok=True)
for i, t in enumerate(times):
    filename = os.path.join(output_dir, f'{t.isoformat()}.csv')
    print(f'Writing time {t} to {filename}')
    df = pd.DataFrame(t2m[i, :, :], index=latitudes, columns=longitudes)
    df.to_csv(filename)
print('Done')

# =============================== METHOD 2 ============================
# Write data as a table with 4 columns: time, latitude, longitude, value
# =====================================================================
filename = os.path.join(output_dir, 'table.csv')
print(f'Writing data in tabular form to {filename} (this may take some time)...')
times_grid, latitudes_grid, longitudes_grid = [
    x.flatten() for x in np.meshgrid(times, latitudes, longitudes, indexing='ij')]
df = pd.DataFrame({
    'time': [t.isoformat() for t in times_grid],
    'latitude': latitudes_grid,
    'longitude': longitudes_grid,
    't2m': t2m[:].flatten()})
df.to_csv(filename, index=False)
print('Done')
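
To make the two output layouts of the script above easier to picture, here is a minimal sketch on a toy grid. The coordinates and values are hypothetical stand-ins for the downloaded data (nothing is read from a NetCDF file), and the column name 'value' is chosen only for illustration. It shows that flattening the meshgrid output and the data array in the same order keeps each value on the row of its own time, latitude and longitude.

```python
import numpy as np
import pandas as pd

# Hypothetical toy grid: 2 times, 3 latitudes, 2 longitudes (not real ERA5 data)
times = np.array(['2019-06-24T00:00:00', '2019-06-24T06:00:00'])
latitudes = np.array([50.0, 49.75, 49.5])
longitudes = np.array([0.0, 0.25])
values = np.arange(12, dtype=float).reshape(2, 3, 2)  # shape (time, lat, lon)

# Method 1 layout: one latitude x longitude matrix per time step
first_step = pd.DataFrame(values[0], index=latitudes, columns=longitudes)

# Method 2 layout: one row per (time, latitude, longitude) combination.
# indexing='ij' keeps the (time, lat, lon) axis order, so flatten() walks
# the coordinate grids and the values in the same sequence.
t_grid, lat_grid, lon_grid = [
    x.flatten() for x in np.meshgrid(times, latitudes, longitudes, indexing='ij')]
table = pd.DataFrame({
    'time': t_grid,
    'latitude': lat_grid,
    'longitude': lon_grid,
    'value': values.flatten()})

print(first_step)
print(table.head())
```

With the default indexing='xy', meshgrid would swap the first two axes and the coordinates would no longer line up with the flattened values, which is why the script passes indexing='ij'.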

Second option : Panoply

A second option is to convert the data using the NASA 'Panoply' software. The option is found under File → Export data → As CSV. The data are saved in the file with the structure of the lat/lon matrix preserved, and the different time steps are separated by an empty row.
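
If the exported file needs to be read back in Python, the blocks separated by empty rows can be split before parsing. The following is only a sketch: the article does not describe whether the export carries header rows or coordinate labels, so the sample text is hypothetical and assumes purely numeric rows, and read_blocks is a helper name introduced here for illustration.

```python
import io
import pandas as pd

def read_blocks(text):
    """Split CSV text on empty rows and parse each block as its own DataFrame."""
    blocks = []
    current = []
    for line in text.splitlines():
        # An empty row (or a row holding only commas) is treated as a separator
        if line.strip(', \t') == '':
            if current:
                blocks.append(current)
                current = []
        else:
            current.append(line)
    if current:
        blocks.append(current)
    # Assumption: no header row inside a block, hence header=None
    return [pd.read_csv(io.StringIO('\n'.join(b)), header=None) for b in blocks]

# Hypothetical sample: two time steps of a 2 x 3 lat/lon matrix
sample = "1.0,2.0,3.0\n4.0,5.0,6.0\n\n7.0,8.0,9.0\n10.0,11.0,12.0\n"
frames = read_blocks(sample)
print(len(frames))
print(frames[1])
```

Each element of frames then corresponds to one time step, in the order in which the blocks appear in the file.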

Third option : Windows users

A third option for converting the data from NetCDF to CSV, for Windows users, is to download and install netcdf4excel. The plug-in opens NetCDF files directly in Microsoft Excel and maintains the conventions for the NetCDF variables. Please see the link for more details: http://netcdf4excel.github.io/.

Other solutions

For Unix users, there are other options provided by some common NetCDF software packages. Please see the links for more details: