pygeoprocessing package

Subpackages

Submodules

Module contents

pygeoprocessing: geoprocessing routines for GIS.

__init__ module imports all the geoprocessing functions into this namespace.

exception pygeoprocessing.ReclassificationMissingValuesError(missing_values, raster_path, value_map)[source]

Bases: Exception

Raised when a raster value is not a valid key to a dictionary.

msg
Type:

str

missing_values

that are not keys in the dictionary

Type:

list

pygeoprocessing.align_bbox(geotransform, bbox)[source]

Pad a bounding box so that it aligns with the grid of a given geotransform.

Ignores row and column rotation.

Parameters:
  • geotransform (list) – GDAL raster geotransform to align to

  • bbox (list) – bounding box in the form [minx, miny, maxx, maxy]

Returns:

padded bounding box in the form [minx, miny, maxx, maxy]

Raises:

ValueError if invalid geotransform or bounding box are provided

pygeoprocessing.array_equals_nodata(array, nodata)[source]

Check for the presence of nodata values in array.

The comparison supports numpy.nan and unset (None) nodata values.

Parameters:
  • array (numpy array) – the array to mask for nodata values.

  • nodata (number) – the nodata value to check for. Supports numpy.nan.

Returns:

A boolean numpy array with values of 1 where array is equal to nodata and 0 otherwise.

pygeoprocessing.calculate_slope(base_elevation_raster_path_band, target_slope_path, raster_driver_creation_tuple=('GTIFF', ('TILED=YES', 'BIGTIFF=YES', 'COMPRESS=LZW', 'BLOCKXSIZE=256', 'BLOCKYSIZE=256')))

Create a percent slope raster from DEM raster.

Base algorithm is from Zevenbergen & Thorne “Quantitative Analysis of Land Surface Topography” 1987 (https://doi.org/10.1002/esp.3290120107) although it has been modified to include the diagonal pixels by classic finite difference analysis.

For the following notation, we define each pixel’s DEM value by a letter with this spatial scheme:

a b c
d e f
g h i

Then the slope at e is defined at ([dz/dx]^2 + [dz/dy]^2)^0.5

Where:

[dz/dx] = ((c+2f+i)-(a+2d+g)/(8*x_cell_size)
[dz/dy] = ((g+2h+i)-(a+2b+c))/(8*y_cell_size)

In cases where a cell is nodata, we attempt to use the middle cell inline with the direction of differentiation (either in x or y direction). If no inline pixel is defined, we use e and multiply the difference by 2^0.5 to account for the diagonal projection.

Parameters:
  • base_elevation_raster_path_band (string) – a path/band tuple to a raster of height values. (path_to_raster, band_index)

  • target_slope_path (string) – path to target slope raster; will be a 32 bit float GeoTIFF of same size/projection as calculate slope with units of percent slope.

  • raster_driver_creation_tuple (tuple) – a tuple containing a GDAL driver name string as the first element and a GDAL creation options tuple/list as the second. Defaults to geoprocessing.DEFAULT_GTIFF_CREATION_TUPLE_OPTIONS.

Returns:

None

pygeoprocessing.choose_dtype(*raster_paths)[source]

Choose an appropriate dtype for an output derived from the given inputs.

Returns the dtype with the greatest size/precision among the inputs, so that information will not be lost.

Parameters:

*raster_paths – series of raster path strings

Returns:

numpy dtype

pygeoprocessing.choose_nodata(dtype)[source]

Choose an appropriate nodata value for data of a given dtype.

Parameters:

dtype (numpy.dtype) – data type for which to choose nodata

Returns:

number to use as nodata value

pygeoprocessing.distance_transform_edt(base_region_raster_path_band, target_distance_raster_path, sampling_distance=(1.0, 1.0), working_dir=None, raster_driver_creation_tuple=('GTIFF', ('TILED=YES', 'BIGTIFF=YES', 'COMPRESS=LZW', 'BLOCKXSIZE=256', 'BLOCKYSIZE=256')))[source]

Calculate the euclidean distance transform on base raster.

Calculates the euclidean distance transform on the base raster in units of pixels multiplied by an optional scalar constant. The implementation is based off the algorithm described in: Meijster, Arnold, Jos BTM Roerdink, and Wim H. Hesselink. “A general algorithm for computing distance transforms in linear time.” Mathematical Morphology and its applications to image and signal processing. Springer, Boston, MA, 2002. 331-340.

The base mask raster represents the area to distance transform from as any pixel that is not 0 or nodata. It is computationally convenient to calculate the distance transform on the entire raster irrespective of nodata placement and thus produces a raster that will have distance transform values even in pixels that are nodata in the base.

Parameters:
  • base_region_raster_path_band (tuple) – a tuple including file path to a raster and the band index to define the base region pixels. Any pixel that is not 0 and nodata are considered to be part of the region.

  • target_distance_raster_path (string) – path to the target raster that is the exact euclidean distance transform from any pixel in the base raster that is not nodata and not 0. The units are in (pixel distance * sampling_distance).

  • sampling_distance (tuple/list) – an optional parameter used to scale the pixel distances when calculating the distance transform. Defaults to (1.0, 1.0). First element indicates the distance traveled in the x direction when changing a column index, and the second element in y when changing a row index. Both values must be > 0.

  • working_dir (string) – If not None, indicates where temporary files should be created during this run.

  • raster_driver_creation_tuple (tuple) – a tuple containing a GDAL driver name string as the first element and a GDAL creation options tuple/list as the second. Defaults to a GTiff driver tuple defined at geoprocessing.DEFAULT_GTIFF_CREATION_TUPLE_OPTIONS.

Returns:

None

pygeoprocessing.iterblocks(raster_path_band, largest_block=65536, offset_only=False)[source]

Iterate across all the memory blocks in the input raster.

Result is a generator of block location information and numpy arrays.

This is especially useful when a single value needs to be derived from the pixel values in a raster, such as the sum total of all pixel values, or a sequence of unique raster values. In such cases, raster_local_op is overkill, since it writes out a raster.

As a generator, this can be combined multiple times with itertools.izip() to iterate ‘simultaneously’ over multiple rasters, though the user should be careful to do so only with prealigned rasters.

Parameters:
  • raster_path_band (tuple) – a path/band index tuple to indicate which raster band iterblocks should iterate over.

  • largest_block (int) – Attempts to iterate over raster blocks with this many elements. Useful in cases where the blocksize is relatively small, memory is available, and the function call overhead dominates the iteration. Defaults to 2**20. A value of anything less than the original blocksize of the raster will result in blocksizes equal to the original size.

  • offset_only (boolean) – defaults to False, if True iterblocks only returns offset dictionary and doesn’t read any binary data from the raster. This can be useful when iterating over writing to an output.

Yields:

If offset_only is false, on each iteration, a tuple containing a dict of block data and a 2-dimensional numpy array are yielded. The dict of block data has these attributes:

  • data['xoff'] - The X offset of the upper-left-hand corner of the block.

  • data['yoff'] - The Y offset of the upper-left-hand corner of the block.

  • data['win_xsize'] - The width of the block.

  • data['win_ysize'] - The height of the block.

If offset_only is True, the function returns only the block offset data and does not attempt to read binary data from the raster.

pygeoprocessing.merge_bounding_box_list(bounding_box_list, bounding_box_mode)[source]

Create a single bounding box by union or intersection of the list.

Parameters:
  • bounding_box_list (sequence) – a sequence of bounding box coordinates in the order [minx, miny, maxx, maxy].

  • mode (string) – either 'union' or 'intersection' for the corresponding reduction mode.

Returns:

A four tuple bounding box that is the union or intersection of the input bounding boxes.

Raises:

ValueError – if the bounding boxes in bounding_box_list do not intersect if the bounding_box_mode is ‘intersection’.

pygeoprocessing.raster_band_percentile(base_raster_path_band, working_sort_directory, percentile_list, heap_buffer_size=268435456, ffi_buffer_size=1024)

Calculate percentiles of a raster band.

Parameters:
  • base_raster_path_band (tuple) – raster path band tuple to a raster that is of any integer or real type.

  • working_sort_directory (str) – path to a directory that does not exist or is empty. This directory will be used to create heapfiles with sizes no larger than heap_buffer_size which are written in the of the pattern N.dat where N is in the numbering 0, 1, 2, … up to the number of files necessary to handle the raster.

  • percentile_list (list) – sorted list of percentiles to report must contain values in the range [0, 100].

  • heap_buffer_size (int) – defines approximately how many elements to hold in a single heap file. This is proportional to the amount of maximum memory to use when storing elements before a sort and write to disk.

  • ffi_buffer_size (int) – defines how many elements will be stored per heap file buffer for iteration.

Returns:

A list of len(percentile_list) elements long containing the percentile values (ranging from [0, 100]) in base_raster_path_band where the interpolation scheme is “higher” (i.e. any percentile splits will select the next element higher than the percentile cutoff).

pygeoprocessing.raster_map(op, rasters, target_path, target_nodata=None, target_dtype=None, raster_driver_creation_tuple=('GTIFF', ('TILED=YES', 'BIGTIFF=YES', 'COMPRESS=LZW', 'BLOCKXSIZE=256', 'BLOCKYSIZE=256')))[source]

Apply a pixelwise function to a series of raster inputs.

The output raster will have nodata where any input raster has nodata. Raster inputs are split into aligned blocks, and the function is applied individually to each stack of blocks (as numpy arrays).

Parameters:
  • op (function) – Function to apply to the inputs. It should accept a number of arguments equal to the length of *inputs. It should return a numpy array with the same shape as its array input(s).

  • rasters (list[str]) – Paths to rasters to input to op, in the order that they will be passed to op. All rasters should be aligned and have the same dimensions.

  • target_path (str) – path to write out the output raster.

  • target_nodata (number) – Nodata value to use for the output raster. Optional. If not provided, a suitable nodata value will be chosen.

  • target_dtype (numpy.dtype) – dtype to use for the output. Optional. If not provided, a suitable dtype will be chosen.

  • raster_driver_creation_tuple (tuple) – a tuple containing a GDAL driver name string as the first element and a GDAL creation options tuple/list as the second. Defaults to geoprocessing.DEFAULT_GTIFF_CREATION_TUPLE_OPTIONS. If the target_dtype is int8, the PIXELTYPE=SIGNEDBYTE option will be added to the creation options tuple if it is not already there.

Returns:

None

pygeoprocessing.raster_reduce(function, raster_path_band, initializer, mask_nodata=True, largest_block=65536)[source]

Cumulatively apply a reducing function to each block of a raster.

This effectively reduces the entire raster to a single value, but it works by blocks to be memory-efficient.

The function signature should be function(aggregator, block), where aggregator is the aggregated value so far, and block is a flattened numpy array containing the data from the block to reduce next.

function is called once on each block. On the first function call, aggregator is initialized with initializer. The return value from each function call is passed in as the aggregator argument to the subsequent function call. When all blocks have been reduced, the return value of the final function call is returned.

Example

Calculate the sum of all values in a raster:

raster_reduce(lambda total, block: total + numpy.sum(block),
              (raster_path, 1), 0)

Calculate a histogram of all values in a raster:

def add_to_histogram(histogram, block):
    return histogram + numpy.histogram(block, bins=10)[0]

raster_reduce(add_to_histogram, (raster_path, 1), numpy.zeros(10))

Calculate the sum of all values in a raster, excluding nodata:

nodata = pygeoprocessing.get_raster_info(raster_path)['nodata'][0]
def sum_excluding_nodata(total, block):
    return total + numpy.sum(block[block != nodata])

raster_reduce(sum_excluding_nodata, (raster_path, 1), 0)
Parameters:
  • function (func) – function to apply to each raster block

  • raster_path_band (tuple) – (path, band) tuple of the raster to reduce

  • initializer (obj) – value to initialize the aggregator for the first function call

  • mask_nodata (bool) – if True, mask out nodata before aggregating. A flattened array of non-nodata pixels from each block is passed to the function. if False, each block is passed to the function without masking.

  • largest_block (int) – largest block parameter to pass to iterblocks

Returns:

aggregate value, the final value returned from function