pygeoprocessing.geoprocessing module

A collection of raster and vector algorithms and utilities.

exception pygeoprocessing.geoprocessing.ReclassificationMissingValuesError(missing_values, raster_path, value_map)[source]

Bases: Exception

Raised when a raster value is not a valid key to a dictionary.

msg

Type:: str

missing_values

that are not keys in the dictionary

Type:: list

class pygeoprocessing.geoprocessing.TimedLoggingAdapter(interval_s=5.0)[source]

Bases: LoggerAdapter

A logging adapter to restrict logging based on a timer.

The objective is to have a logging.LOGGER-like object that can be called multiple times in rapid successtion, but with log messages only propagating every X seconds.

This object is helpful for creating consistency in logging callbacks and is derived from the python stdlib logging.LoggerAdapter.

log(level, msg, *args, **kwargs)[source]

Log a LogRecord.

Parameters:

level (int) – The logging level.
msg (str) – The log message.
args (list) – The user-defined positional arguments for the log message.
kwargs (dict) – The user-defined keyword arguments for the log message.

Returns:

None.

pygeoprocessing.geoprocessing.align_and_resize_raster_stack(base_raster_path_list, target_raster_path_list, resample_method_list, target_pixel_size, bounding_box_mode, base_vector_path_list=None, raster_align_index=None, base_projection_wkt_list=None, target_projection_wkt=None, mask_options=None, vector_mask_options=None, gdal_warp_options=None, raster_driver_creation_tuple=('GTIFF', ('TILED=YES', 'BIGTIFF=YES', 'COMPRESS=LZW', 'BLOCKXSIZE=256', 'BLOCKYSIZE=256')), osr_axis_mapping_strategy=0, working_dir=None)[source]

Generate rasters from a base such that they align geospatially.

This function resizes base rasters that are in the same geospatial projection such that the result is an aligned stack of rasters that have the same cell size, dimensions, and bounding box. This is achieved by clipping or resizing the rasters to intersected, unioned, or equivocated bounding boxes of all the raster and vector input.

Parameters:

base_raster_path_list (sequence) – a sequence of base raster paths that will be transformed and will be used to determine the target bounding box.
target_raster_path_list (sequence) – a sequence of raster paths that will be created to one-to-one map with base_raster_path_list as aligned versions of those original rasters. If there are duplicate paths in this list, the function will raise a ValueError.
resample_method_list (sequence) – a sequence of resampling methods which one to one map each path in base_raster_path_list during resizing. Each element must be one of “near|bilinear|cubic|cubicspline|lanczos|mode”.
target_pixel_size (list/tuple) – the target raster’s x and y pixel size example: (30, -30).
bounding_box_mode (string) – one of “union”, “intersection”, or a sequence of floats of the form [minx, miny, maxx, maxy] in the target projection coordinate system. Depending on the value, output extents are defined as the union, intersection, or the explicit bounding box.
base_vector_path_list (sequence) – a sequence of base vector paths whose bounding boxes will be used to determine the final bounding box of the raster stack if mode is ‘union’ or ‘intersection’. If mode is ‘bb=[…]’ then these vectors are not used in any calculation.
raster_align_index (int) – indicates the index of a raster in base_raster_path_list that the target rasters’ bounding boxes pixels should align with. This feature allows rasters whose raster dimensions are the same, but bounding boxes slightly shifted less than a pixel size to align with a desired grid layout. If None then the bounding box of the target rasters is calculated as the precise intersection, union, or bounding box.
base_projection_wkt_list (sequence) – if not None, this is a sequence of base projections of the rasters in base_raster_path_list. If a value is None, the projection is read directly from the raster. Use this argument if there are rasters with no projection defined, but the projections are known.
target_projection_wkt (string) – if not None, this is the desired projection of all target rasters in Well Known Text format, and target rasters will be warped to this projection. If None, the base SRS will be passed to the target.
mask_options (dict) –
optional, if not None, this is a dictionary of options to use an existing vector’s geometry to mask out pixels in the target raster that do not overlap the vector’s geometry. Keys to this dictionary are:
- 'mask_vector_path' (str): path to the mask vector file. This vector will be automatically projected to the target projection if its base coordinate system does not match the target.
- 'mask_layer_name' (str): the layer name to use for masking. If this key is not in the dictionary the default is to use the layer at index 0.
- 'mask_vector_where_filter' (str): an SQL WHERE string. This will be used to filter the geometry in the mask. Ex: 'id > 10' would use all features whose field value of ‘id’ is > 10.
- 'mask_raster_path' (str): Optional. the string path to where the mask raster should be written on disk. If not provided, a temporary file will be created within working_dir.
vector_mask_options (dict) – optional. Alias for mask_options. This parameter is deprecated and will be removed in a future version of pygeoprocessing.
gdal_warp_options (sequence) – if present, the contents of this list are passed to the warpOptions parameter of gdal.Warp. See the GDAL Warp documentation for valid options.
raster_driver_creation_tuple (tuple) – a tuple containing a GDAL driver name string as the first element and a GDAL creation options tuple/list as the second. Defaults to a GTiff driver tuple defined at geoprocessing.DEFAULT_GTIFF_CREATION_TUPLE_OPTIONS.
osr_axis_mapping_strategy (int) – OSR axis mapping strategy for SpatialReference objects. Defaults to geoprocessing.DEFAULT_OSR_AXIS_MAPPING_STRATEGY. This parameter should not be changed unless you know what you are doing.
working_dir=None (str) – if present, the path to a directory within which a temporary directory will be created. If not provided, the new directory will be created within the system’s temporary directory.

Returns:

None

Raises:

ValueError – If any combination of the raw bounding boxes, raster bounding boxes, vector bounding boxes, and/or vector_mask bounding box does not overlap to produce a valid target.
ValueError – If any of the input or target lists are of different lengths.
ValueError – If there are duplicate paths on the target list which would risk corrupted output.
ValueError – If some combination of base, target, and embedded source reference systems results in an ambiguous target coordinate system.
ValueError – If mask_options is not None but the mask_vector_path is undefined or doesn’t point to a valid file.
ValueError – If pixel_size is not a 2 element sequence of numbers.

pygeoprocessing.geoprocessing.align_bbox(geotransform, bbox)[source]

Pad a bounding box so that it aligns with the grid of a given geotransform.

Ignores row and column rotation.

Parameters:

geotransform (list) – GDAL raster geotransform to align to
bbox (list) – bounding box in the form [minx, miny, maxx, maxy]

Returns:

padded bounding box in the form [minx, miny, maxx, maxy]

Raises:

ValueError if invalid geotransform or bounding box are provided –

pygeoprocessing.geoprocessing.array_equals_nodata(array, nodata)[source]

Check for the presence of nodata values in array.

The comparison supports numpy.nan and unset (None) nodata values.

Parameters:

array (numpy array) – the array to mask for nodata values.
nodata (number) – the nodata value to check for. Supports numpy.nan.

Returns:

A boolean numpy array with values of 1 where array is equal to nodata and 0 otherwise.

pygeoprocessing.geoprocessing.build_overviews(raster_path, internal=False, resample_method='near', overwrite=False, levels='auto')[source]

Build overviews for a raster dataset.

Parameters:

raster_path (str) – A path to a raster on disk for which overviews should be built.
internal=False (bool) – Whether to modify the raster when building overviews. In GeoTiffs, this builds internal overviews when internal=True, and external overviews when internal=False.
resample_method='near' (str) – The resample method to use when building overviews. Must be a valid resampling method for gdal.GDALDataset.BuildOverviews, one of ‘rms | mode | sum | q1 | near | q3 | average | cubicspline | bilinear | max | med | min | cubic | lanczos’.
overwrite=False (bool) – Whether to overwrite existing overviews, if any exist.
levels='auto' (sequence) – A sequence of integer overview levels. If 'auto', overview levels will be determined by using factors of 2 until the overview’s x and y dimensions are both less than 256.

Example

Generate overviews, regardless of whether overviews already exist for the raster, letting the function determine the levels of overviews to generate:

build_overviews(raster_path)

Generate overviews for 4 levels, at 1/2, 1/4, 1/8 and 1/16 the resolution:

build_overviews(raster_path, levels=[2, 4, 8, 16])

Returns:: None

pygeoprocessing.geoprocessing.calculate_disjoint_polygon_set(vector_path, layer_id=0, bounding_box=None, geometries_may_touch=False)[source]

Create a sequence of sets of polygons that don’t overlap.

Determining the minimal number of those sets is an np-complete problem so this is an approximation that builds up sets of maximal subsets.

Parameters:

vector_path (string) – a path to an OGR vector.
layer_id (str/int) – name or index of underlying layer in vector_path to calculate disjoint set. Defaults to 0.
bounding_box (sequence) – sequence of floats representing a bounding box to filter any polygons by. If a feature in vector_path does not intersect this bounding box it will not be considered in the disjoint calculation. Coordinates are in the order [minx, miny, maxx, maxy].
geometries_may_touch=False (bool) – If True, geometries in a subset are allowed to have touching boundaries, but are not allowed to have intersecting interiors. If False (the default), no geometries in a subset may intersect in any way.

Returns:

sequence of sets of FIDs from vector_path

Return type:

subset_list (sequence)

pygeoprocessing.geoprocessing.choose_dtype(*raster_paths)[source]

Choose an appropriate dtype for an output derived from the given inputs.

Returns the dtype with the greatest size/precision among the inputs, so that information will not be lost.

Parameters:: *raster_paths – series of raster path strings
Returns:: numpy dtype

pygeoprocessing.geoprocessing.choose_nodata(dtype)[source]

Choose an appropriate nodata value for data of a given dtype.

Parameters:: dtype (numpy.dtype) – data type for which to choose nodata
Returns:: number to use as nodata value

pygeoprocessing.geoprocessing.convolve_2d(signal_path_band, kernel_path_band, target_path, ignore_nodata_and_edges=False, mask_nodata=True, normalize_kernel=False, target_datatype=7, target_nodata=None, working_dir=None, set_tol_to_zero=1e-08, max_timeout=60.0, raster_driver_creation_tuple=('GTIFF', ('TILED=YES', 'BIGTIFF=YES', 'COMPRESS=LZW', 'BLOCKXSIZE=256', 'BLOCKYSIZE=256')))[source]

Convolve 2D kernel over 2D signal.

Convolves the raster in kernel_path_band over signal_path_band. Nodata values are treated as 0.0 during the convolution and masked to nodata for the output result where signal_path has nodata.

Note with default values, boundary effects can be seen in the result where the kernel would hang off the edge of the raster or in regions with nodata pixels. The function would treat these areas as values with “0.0” by default thus pulling the total convolution down in these areas. This is similar to setting mode='same' in Numpy’s convolve function: https://numpy.org/doc/stable/reference/generated/numpy.convolve.html

This boundary effect can be avoided by setting ignore_nodata_and_edges=True which normalizes the target result by dynamically accounting for the number of valid signal pixels the kernel overlapped during the convolution step.

Parameters:

signal_path_band (tuple) – a 2 tuple of the form (filepath to signal raster, band index).
kernel_path_band (tuple) – a 2 tuple of the form (filepath to kernel raster, band index), all pixel values should be valid – output is not well defined if the kernel raster has nodata values. To create a kernel raster, see the documentation and helper functions available in the pygeoprocessing kernels module.
target_path (string) – filepath to target raster that’s the convolution of signal with kernel. Output will be a single band raster of same size and projection as signal_path_band. Any nodata pixels that align with signal_path_band will be set to nodata.
ignore_nodata_and_edges (boolean) – If true, any pixels that are equal to signal_path_band’s nodata value or signal pixels where the kernel extends beyond the edge of the raster are not included when averaging the convolution filter. This has the effect of “spreading” the result as though nodata and edges beyond the bounds of the raster are 0s. If set to false this tends to “pull” the signal away from nodata holes or raster edges. Set this value to True to avoid distortions signal values near edges for large integrating kernels. It can be useful to set this value to True to fill nodata holes through distance weighted averaging. In this case mask_nodata must be set to False so the result does not mask out these areas which are filled in. When using this technique be careful of cases where the kernel does not extend over any areas except nodata holes, in this case the resulting values in these areas will be nonsensical numbers, perhaps numerical infinity or NaNs.
normalize_kernel (boolean) – If true, the result is divided by the sum of the kernel.
mask_nodata (boolean) – If true, target_path raster’s output is nodata where signal_path_band’s pixels were nodata. Note that setting ignore_nodata_and_edges to True while setting mask_nodata to False can allow for a technique involving distance weighted averaging to define areas that would otherwise be nodata. Be careful in cases where the kernel does not extend over any valid non-nodata area since the result can be numerical infinity or NaNs.
target_datatype (GDAL type) – a GDAL raster type to set the output raster type to, as well as the type to calculate the convolution in. Defaults to GDT_Float64. Note signed byte is not supported.
target_nodata (int/float) – nodata value to set on output raster. If target_datatype is not gdal.GDT_Float64, this value must be set. Otherwise defaults to the minimum value of a float32.
raster_creation_options (sequence) – an argument list that will be passed to the GTiff driver for creating target_path. Useful for blocksizes, compression, and more.
working_dir (string) – If not None, indicates where temporary files should be created during this run.
set_tol_to_zero (float) – any value within +- this from 0.0 will get set to 0.0. This is to handle numerical roundoff errors that sometimes result in “numerical zero”, such as -1.782e-18 that cannot be tolerated by users of this function. If None no adjustment will be done to output values.
max_timeout (float) – maximum amount of time to wait for worker thread to terminate.
raster_driver_creation_tuple (tuple) – a tuple containing a GDAL driver name string as the first element and a GDAL creation options tuple/list as the second. Defaults to a GTiff driver tuple defined at geoprocessing.DEFAULT_GTIFF_CREATION_TUPLE_OPTIONS.

Returns:

None

Raises:

ValueError – if ignore_nodata_and_edges is True and mask_nodata is False.
ValueError – if signal_path_band or kernel_path_band is a row based blocksize which would result in slow runtimes due to gdal cache thrashing.

pygeoprocessing.geoprocessing.create_raster_from_bounding_box(target_bounding_box, target_raster_path, target_pixel_size, target_pixel_type, target_srs_wkt, target_nodata, fill_value=None, raster_driver_creation_tuple=('GTIFF', ('TILED=YES', 'BIGTIFF=YES', 'COMPRESS=LZW', 'BLOCKXSIZE=256', 'BLOCKYSIZE=256')))[source]

Create a raster from a given bounding box.

Parameters:

target_bounding_box (tuple) – a 4-element iterable of (minx, miny, maxx, maxy) in projected units matching the SRS of target_srs_wkt.
target_raster_path (string) – The path to where the new raster should be created on disk.
target_pixel_size (tuple) – A 2-element tuple of the (x, y) pixel size of the target raster. Elements are in units of the target SRS.
target_pixel_type (int) – The GDAL GDT_* type of the target raster.
target_srs_wkt (string) – The SRS of the target raster, in Well-Known Text format.
target_nodata (float) – The nodata value of the target raster, or None if no nodata value is to be set.
fill_value=None (number) – If provided, the value that the target raster should be filled with.
raster_driver_creation_tuple (tuple) – a tuple containing a GDAL driver name string as the first element and a GDAL creation options tuple/list as the second. Defaults to a GTiff driver tuple defined at geoprocessing.DEFAULT_GTIFF_CREATION_TUPLE_OPTIONS.

Returns:

None

pygeoprocessing.geoprocessing.create_raster_from_vector_extents(base_vector_path, target_raster_path, target_pixel_size, target_pixel_type, target_nodata, fill_value=None, raster_driver_creation_tuple=('GTIFF', ('TILED=YES', 'BIGTIFF=YES', 'COMPRESS=LZW', 'BLOCKXSIZE=256', 'BLOCKYSIZE=256')))[source]

Create a blank raster based on a vector file extent.

Parameters:

base_vector_path (string) – path to vector shapefile to base the bounding box for the target raster.
target_raster_path (string) – path to location of generated geotiff; the upper left hand corner of this raster will be aligned with the bounding box of the source vector and the extent will be exactly equal or contained the source vector’s bounding box depending on whether the pixel size divides evenly into the source bounding box; if not coordinates will be rounded up to contain the original extent.
target_pixel_size (list/tuple) –
the x/y pixel size as a sequence Example:
```
[30.0, -30.0]
```
target_pixel_type (int) – gdal GDT pixel type of target raster
target_nodata (numeric) – target nodata value. Can be None if no nodata value is needed.
fill_value (int/float) – value to fill in the target raster; no fill if value is None
raster_driver_creation_tuple (tuple) – a tuple containing a GDAL driver name string as the first element and a GDAL creation options tuple/list as the second. Defaults to a GTiff driver tuple defined at geoprocessing.DEFAULT_GTIFF_CREATION_TUPLE_OPTIONS.

Returns:

None

pygeoprocessing.geoprocessing.distance_transform_edt(base_region_raster_path_band, target_distance_raster_path, sampling_distance=(1.0, 1.0), working_dir=None, raster_driver_creation_tuple=('GTIFF', ('TILED=YES', 'BIGTIFF=YES', 'COMPRESS=LZW', 'BLOCKXSIZE=256', 'BLOCKYSIZE=256')))[source]

Calculate the euclidean distance transform on base raster.

Calculates the euclidean distance transform on the base raster in units of pixels multiplied by an optional scalar constant. The implementation is based off the algorithm described in: Meijster, Arnold, Jos BTM Roerdink, and Wim H. Hesselink. “A general algorithm for computing distance transforms in linear time.” Mathematical Morphology and its applications to image and signal processing. Springer, Boston, MA, 2002. 331-340.

The base mask raster represents the area to distance transform from as any pixel that is not 0 or nodata. It is computationally convenient to calculate the distance transform on the entire raster irrespective of nodata placement and thus produces a raster that will have distance transform values even in pixels that are nodata in the base.

Parameters:

base_region_raster_path_band (tuple) – a tuple including file path to a raster and the band index to define the base region pixels. Any pixel that is not 0 and nodata are considered to be part of the region.
target_distance_raster_path (string) – path to the target raster that is the exact euclidean distance transform from any pixel in the base raster that is not nodata and not 0. The units are in (pixel distance * sampling_distance).
sampling_distance (tuple/list) – an optional parameter used to scale the pixel distances when calculating the distance transform. Defaults to (1.0, 1.0). First element indicates the distance traveled in the x direction when changing a column index, and the second element in y when changing a row index. Both values must be > 0.
working_dir (string) – If not None, indicates where temporary files should be created during this run.
raster_driver_creation_tuple (tuple) – a tuple containing a GDAL driver name string as the first element and a GDAL creation options tuple/list as the second. Defaults to a GTiff driver tuple defined at geoprocessing.DEFAULT_GTIFF_CREATION_TUPLE_OPTIONS.

Returns:

None

pygeoprocessing.geoprocessing.get_gis_type(path)[source]

Calculate the GIS type of the file located at path.

Parameters:: path (str) – path to a file on disk or network.
Raises:: ValueError – if path is not a file or cannot be opened as a gdal.OF_RASTER or gdal.OF_VECTOR.
Returns:: A bitwise OR of all GIS types that PyGeoprocessing models, currently this is pygeoprocessing.RASTER_TYPE, or pygeoprocessing.VECTOR_TYPE.

pygeoprocessing.geoprocessing.get_raster_info(raster_path)[source]

Get information about a GDAL raster (dataset).

Parameters:

raster_path (String) – a path to a GDAL raster.

Raises:

ValueError – if raster_path is not a file or cannot be opened as a gdal.OF_RASTER.

Returns:

a dictionary with the properties stored under relevant keys.

'pixel_size' (tuple): (pixel x-size, pixel y-size) from geotransform.
'raster_size' (tuple): number of raster pixels in (x, y) direction.
'nodata' (sequence): a sequence of the nodata values in the bands of the raster in the same order as increasing band index.
'n_bands' (int): number of bands in the raster.
'geotransform' (tuple): a 6-tuple representing the geotransform of (x orign, x-increase, xy-increase, y origin, yx-increase, y-increase).
'datatype' (int): An instance of an enumerated gdal.GDT_* int that represents the datatype of the raster.
'projection_wkt' (string): projection of the raster in Well Known Text.
'bounding_box' (sequence): sequence of floats representing the bounding box in projected coordinates in the order [minx, miny, maxx, maxy]
'block_size' (tuple): underlying x/y raster block size for efficient reading.
'numpy_type' (numpy type): this is the equivalent numpy datatype for the raster bands including signed bytes.
'overviews' (sequence): A list of (x, y) tuples for the number of pixels in the width and height of each overview level of the raster.
'file_list' (sequence): A list of files that make up this raster.

Return type:

raster_properties (dictionary)

pygeoprocessing.geoprocessing.get_vector_info(vector_path, layer_id=0)[source]

Get information about an GDAL vector.

Parameters:

vector_path (str) – a path to a GDAL vector.
layer_id (str/int) – name or index of underlying layer to analyze. Defaults to 0.

Raises:

ValueError if vector_path does not exist on disk or cannot be –
opened as a gdal.OF_VECTOR. –

Returns:

a dictionary with the following key-value pairs:

'projection_wkt' (string): projection of the vector in Well Known Text.
'bounding_box' (sequence): sequence of floats representing the bounding box in projected coordinates in the order [minx, miny, maxx, maxy].
'file_list' (sequence): sequence of string paths to the files that make up this vector.

Return type:

raster_properties (dictionary)

pygeoprocessing.geoprocessing.interpolate_points(base_vector_path, vector_attribute_field, target_raster_path_band, interpolation_mode)[source]

Interpolate point values onto an existing raster.

Parameters:

base_vector_path (string) – path to a shapefile that contains point vector layers.
vector_attribute_field (field) – a string in the vector referenced at base_vector_path that refers to a numeric value in the vector’s attribute table. This is the value that will be interpolated across the raster.
target_raster_path_band (tuple) – a path/band number tuple to an existing raster which likely intersects or is nearby the source vector. The band in this raster will take on the interpolated numerical values provided at each point.
interpolation_mode (string) – the interpolation method to use for scipy.interpolate.griddata, one of ‘linear’, near’, or ‘cubic’.

Returns:

None

pygeoprocessing.geoprocessing.iterblocks(raster_path_band, largest_block=65536, offset_only=False)[source]

Iterate across all the memory blocks in the input raster.

Result is a generator of block location information and numpy arrays.

This is especially useful when a single value needs to be derived from the pixel values in a raster, such as the sum total of all pixel values, or a sequence of unique raster values. In such cases, raster_local_op is overkill, since it writes out a raster.

As a generator, this can be combined multiple times with itertools.izip() to iterate ‘simultaneously’ over multiple rasters, though the user should be careful to do so only with prealigned rasters.

Parameters:

raster_path_band (tuple) – a path/band index tuple to indicate which raster band iterblocks should iterate over.
largest_block (int) – Attempts to iterate over raster blocks with this many elements. Useful in cases where the blocksize is relatively small, memory is available, and the function call overhead dominates the iteration. Defaults to 2**20. A value of anything less than the original blocksize of the raster will result in blocksizes equal to the original size.
offset_only (boolean) – defaults to False, if True iterblocks only returns offset dictionary and doesn’t read any binary data from the raster. This can be useful when iterating over writing to an output.

Yields:

If offset_only is false, on each iteration, a tuple containing a dict of block data and a 2-dimensional numpy array are yielded. The dict of block data has these attributes:

data['xoff'] - The X offset of the upper-left-hand corner of the block.
data['yoff'] - The Y offset of the upper-left-hand corner of the block.
data['win_xsize'] - The width of the block.
data['win_ysize'] - The height of the block.

If offset_only is True, the function returns only the block offset data and does not attempt to read binary data from the raster.

pygeoprocessing.geoprocessing.mask_raster(base_raster_path_band, mask_vector_path, target_mask_raster_path, mask_layer_id=0, target_mask_value=None, working_dir=None, all_touched=False, where_clause=None, raster_driver_creation_tuple=('GTIFF', ('TILED=YES', 'BIGTIFF=YES', 'COMPRESS=LZW', 'BLOCKXSIZE=256', 'BLOCKYSIZE=256')))[source]

Mask a raster band with a given vector.

Parameters:

base_raster_path_band (tuple) – a (path, band number) tuple indicating the data to mask.
mask_vector_path (path) – path to a vector that will be used to mask anything outside of the polygon that overlaps with base_raster_path_band to target_mask_value if defined or else base_raster_path_band’s nodata value.
target_mask_raster_path (str) – path to desired target raster that is a copy of base_raster_path_band except any pixels that do not intersect with mask_vector_path are set to target_mask_value or base_raster_path_band’s nodata value if target_mask_value is None.
mask_layer_id (str/int) – an index or name to identify the mask geometry layer in mask_vector_path, default is 0.
target_mask_value (numeric) – If not None, this value is written to any pixel in base_raster_path_band that does not intersect with mask_vector_path. Otherwise the nodata value of base_raster_path_band is used.
working_dir (str) – this is a path to a directory that can be used to hold temporary files required to complete this operation.
all_touched (bool) – if False, a pixel is only masked if its centroid intersects with the mask. If True a pixel is masked if any point of the pixel intersects the polygon mask.
where_clause (str) – (optional) if not None, it is an SQL compatible where clause that can be used to filter the features that are used to mask the base raster.
raster_driver_creation_tuple (tuple) – a tuple containing a GDAL driver name string as the first element and a GDAL creation options tuple/list as the second. Defaults to a GTiff driver tuple defined at geoprocessing.DEFAULT_GTIFF_CREATION_TUPLE_OPTIONS.

Returns:

None

pygeoprocessing.geoprocessing.merge_bounding_box_list(bounding_box_list, bounding_box_mode)[source]

Create a single bounding box by union or intersection of the list.

Parameters:

bounding_box_list (sequence) – a sequence of bounding box coordinates in the order [minx, miny, maxx, maxy].
mode (string) – either 'union' or 'intersection' for the corresponding reduction mode.

Returns:

A four tuple bounding box that is the union or intersection of the input bounding boxes.

Raises:

ValueError – if the bounding boxes in bounding_box_list do not intersect if the bounding_box_mode is ‘intersection’.

pygeoprocessing.geoprocessing.new_raster_from_base(base_path, target_path, datatype, band_nodata_list, fill_value_list=None, n_rows=None, n_cols=None, raster_driver_creation_tuple=('GTIFF', ('TILED=YES', 'BIGTIFF=YES', 'COMPRESS=LZW', 'BLOCKXSIZE=256', 'BLOCKYSIZE=256')))[source]

Create new raster by coping spatial reference/geotransform of base.

A convenience function to simplify the creation of a new raster from the basis of an existing one. Depending on the input mode, one can create a new raster of the same dimensions, geotransform, and georeference as the base. Other options are provided to change the raster dimensions, number of bands, nodata values, data type, and core raster creation options.

Parameters:

base_path (string) – path to existing raster.
target_path (string) – path to desired target raster.
datatype – the pixel datatype of the output raster, for example gdal.GDT_Float32. See the following header file for supported pixel types: http://www.gdal.org/gdal_8h.html#22e22ce0a55036a96f652765793fb7a4
band_nodata_list (sequence) – list of nodata values, one for each band, to set on target raster. If value is ‘None’ the nodata value is not set for that band. The number of target bands is inferred from the length of this list.
fill_value_list (sequence) – list of values to fill each band with. If None, no filling is done.
n_rows (int) – if not None, defines the number of target raster rows.
n_cols (int) – if not None, defines the number of target raster columns.
raster_driver_creation_tuple (tuple) – a tuple containing a GDAL driver name string as the first element and a GDAL creation options tuple/list as the second. Defaults to a GTiff driver tuple defined at geoprocessing.DEFAULT_GTIFF_CREATION_TUPLE_OPTIONS.

Returns:

None

pygeoprocessing.geoprocessing.numpy_array_to_raster(base_array, target_nodata, pixel_size, origin, projection_wkt, target_path, raster_driver_creation_tuple=('GTIFF', ('TILED=YES', 'BIGTIFF=YES', 'COMPRESS=LZW', 'BLOCKXSIZE=256', 'BLOCKYSIZE=256')))[source]

Create a single band raster of size base_array.shape.

The GDAL datatype of the target raster is determined by the numpy dtype of base_array.

Note

The origin and pixel_size parameters must both be defined properly as 2-tuples of floats, or else must both be set to None. A ValueError will be raised otherwise.

Parameters:

base_array (numpy.array) – a 2d numpy array.
target_nodata (numeric) – nodata value of target array, can be None.
pixel_size (tuple) – square dimensions (in (x, y)) of pixel. Can be None to indicate no stated pixel size.
origin (tuple/list) – x/y coordinate of the raster origin. Can be None to indicate no stated origin.
projection_wkt (str) – target projection in wkt. Can be None to indicate no projection/SRS.
target_path (str) – path to raster to create that will be of the same type of base_array with contents of base_array.
raster_driver_creation_tuple (tuple) – a tuple containing a GDAL driver name string as the first element and a GDAL creation options tuple/list as the second. Defaults to geoprocessing.DEFAULT_GTIFF_CREATION_TUPLE_OPTIONS.

Returns:

None

pygeoprocessing.geoprocessing.raster_calculator(base_raster_path_band_const_list, local_op, target_raster_path, datatype_target, nodata_target, calc_raster_stats=True, use_shared_memory=False, largest_block=65536, max_timeout=60.0, raster_driver_creation_tuple=('GTIFF', ('TILED=YES', 'BIGTIFF=YES', 'COMPRESS=LZW', 'BLOCKXSIZE=256', 'BLOCKYSIZE=256')))[source]

Apply local a raster operation on a stack of rasters.

This function applies a user defined function across a stack of rasters’ pixel stack. The rasters in base_raster_path_band_list must be spatially aligned and have the same cell sizes.

Parameters:

base_raster_path_band_const_list (sequence) –
a sequence containing:
- (str, int) tuples, referring to a raster path/band index pair to use as an input.
- numpy.ndarray s of up to two dimensions. These inputs must all be broadcastable to each other AND the size of the raster inputs.
- (object, 'raw') tuples, where object will be passed directly into the local_op.
All rasters must have the same raster size. If only arrays are input, numpy arrays must be broadcastable to each other and the final raster size will be the final broadcast array shape. A value error is raised if only “raw” inputs are passed.
local_op (function) – a function that must take in as many parameters as there are elements in base_raster_path_band_const_list. The parameters in local_op will map 1-to-1 in order with the values in base_raster_path_band_const_list. raster_calculator will call local_op to generate the pixel values in target_raster along memory block aligned processing windows. Note any particular call to local_op will have the arguments from raster_path_band_const_list sliced to overlap that window. If an argument from raster_path_band_const_list is a raster/path band tuple, it will be passed to local_op as a 2D numpy array of pixel values that align with the processing window that local_op is targeting. A 2D or 1D array will be sliced to match the processing window and in the case of a 1D array tiled in whatever dimension is flat. If an argument is a scalar it is passed as as scalar. The return value must be a 2D array of the same size as any of the input parameter 2D arrays and contain the desired pixel values for the target raster.
target_raster_path (string) – the path of the output raster. The projection, size, and cell size will be the same as the rasters in base_raster_path_const_band_list or the final broadcast size of the constant/ndarray values in the list.
datatype_target (gdal datatype; int) – the desired GDAL output type of the target raster.
nodata_target (numerical value) – the desired nodata value of the target raster.
calc_raster_stats (boolean) – If True, calculates and sets raster statistics (min, max, mean, and stdev) for target raster.
use_shared_memory (boolean) – If True, uses Python Multiprocessing shared memory to calculate raster stats for faster performance. This feature is available for Python >= 3.8 and will otherwise be ignored for earlier versions of Python.
largest_block (int) – Attempts to internally iterate over raster blocks with this many elements. Useful in cases where the blocksize is relatively small, memory is available, and the function call overhead dominates the iteration. Defaults to 2**20. A value of anything less than the original blocksize of the raster will result in blocksizes equal to the original size.
max_timeout (float) – amount of time in seconds to wait for stats worker thread to join. Default is _MAX_TIMEOUT.
raster_driver_creation_tuple (tuple) – a tuple containing a GDAL driver name string as the first element and a GDAL creation options tuple/list as the second. Defaults to geoprocessing.DEFAULT_GTIFF_CREATION_TUPLE_OPTIONS.

Returns:

None

Raises:

ValueError – invalid input provided

pygeoprocessing.geoprocessing.raster_map(op, rasters, target_path, target_nodata=None, target_dtype=None, raster_driver_creation_tuple=('GTIFF', ('TILED=YES', 'BIGTIFF=YES', 'COMPRESS=LZW', 'BLOCKXSIZE=256', 'BLOCKYSIZE=256')))[source]

Apply a pixelwise function to a series of raster inputs.

The output raster will have nodata where any input raster has nodata. Raster inputs are split into aligned blocks, and the function is applied individually to each stack of blocks (as numpy arrays).

Parameters:

op (function) – Function to apply to the inputs. It should accept a number of arguments equal to the length of *inputs. It should return a numpy array with the same shape as its array input(s).
rasters (list[str]) – Paths to rasters to input to op, in the order that they will be passed to op. All rasters should be aligned and have the same dimensions.
target_path (str) – path to write out the output raster.
target_nodata (number) – Nodata value to use for the output raster. Optional. If not provided, a suitable nodata value will be chosen.
target_dtype (numpy.dtype) – dtype to use for the output. Optional. If not provided, a suitable dtype will be chosen.
raster_driver_creation_tuple (tuple) – a tuple containing a GDAL driver name string as the first element and a GDAL creation options tuple/list as the second. Defaults to geoprocessing.DEFAULT_GTIFF_CREATION_TUPLE_OPTIONS. If the target_dtype is int8, the PIXELTYPE=SIGNEDBYTE option will be added to the creation options tuple if it is not already there.

Returns:

None

pygeoprocessing.geoprocessing.raster_reduce(function, raster_path_band, initializer, mask_nodata=True, largest_block=65536)[source]

Cumulatively apply a reducing function to each block of a raster.

This effectively reduces the entire raster to a single value, but it works by blocks to be memory-efficient.

The function signature should be function(aggregator, block), where aggregator is the aggregated value so far, and block is a flattened numpy array containing the data from the block to reduce next.

function is called once on each block. On the first function call, aggregator is initialized with initializer. The return value from each function call is passed in as the aggregator argument to the subsequent function call. When all blocks have been reduced, the return value of the final function call is returned.

Example

Calculate the sum of all values in a raster:

raster_reduce(lambda total, block: total + numpy.sum(block),
              (raster_path, 1), 0)

Calculate a histogram of all values in a raster:

def add_to_histogram(histogram, block):
    return histogram + numpy.histogram(block, bins=10)[0]

raster_reduce(add_to_histogram, (raster_path, 1), numpy.zeros(10))

Calculate the sum of all values in a raster, excluding nodata:

nodata = pygeoprocessing.get_raster_info(raster_path)['nodata'][0]
def sum_excluding_nodata(total, block):
    return total + numpy.sum(block[block != nodata])

raster_reduce(sum_excluding_nodata, (raster_path, 1), 0)

Parameters:

function (func) – function to apply to each raster block
raster_path_band (tuple) – (path, band) tuple of the raster to reduce
initializer (obj) – value to initialize the aggregator for the first function call
mask_nodata (bool) – if True, mask out nodata before aggregating. A flattened array of non-nodata pixels from each block is passed to the function. if False, each block is passed to the function without masking.
largest_block (int) – largest block parameter to pass to iterblocks

Returns:

aggregate value, the final value returned from function

pygeoprocessing.geoprocessing.raster_to_numpy_array(raster_path, band_id=1)[source]

Read the entire contents of the raster band to a numpy array.

Parameters:

raster_path (str) – path to raster.
band_id (int) – band in the raster to read.

Returns:

numpy array contents of band_id in raster.

pygeoprocessing.geoprocessing.rasterize(vector_path, target_raster_path, burn_values=None, option_list=None, layer_id=0, where_clause=None)[source]

Project a vector onto an existing raster.

Burn the layer at layer_id in vector_path to an existing raster at target_raster_path_band.

Parameters:

vector_path (string) – filepath to vector to rasterize.
target_raster_path (string) – path to an existing raster to burn vector into. Can have multiple bands.
burn_values (list/tuple) – optional sequence of values to burn into each band of the raster. If used, should have the same length as number of bands at the target_raster_path raster. If None then option_list must have a valid value.
option_list (list/tuple) –
optional a sequence of burn options, if None then a valid value for burn_values must exist. Otherwise, each element is a string of the form:
- "ATTRIBUTE=?": Identifies an attribute field on the features to be used for a burn in value. The value will be burned into all output bands. If specified, burn_values will not be used and can be None.
- "CHUNKYSIZE=?": The height in lines of the chunk to operate on. The larger the chunk size the less times we need to make a pass through all the shapes. If it is not set or set to zero the default chunk size will be used. Default size will be estimated based on the GDAL cache buffer size using formula: cache_size_bytes/scanline_size_bytes, so the chunk will not exceed the cache.
- "ALL_TOUCHED=TRUE/FALSE": May be set to TRUE to set all pixels touched by the line or polygons, not just those whose center is within the polygon or that are selected by Brezenhams line algorithm. Defaults to FALSE.
- "BURN_VALUE_FROM": May be set to “Z” to use the Z values of the geometries. The value from burn_values or the attribute field value is added to this before burning. In default case dfBurnValue is burned as it is (richpsharp: note, I’m not sure what this means, but copied from formal docs). This is implemented properly only for points and lines for now. Polygons will be burned using the Z value from the first point.
- "MERGE_ALG=REPLACE/ADD": REPLACE results in overwriting of value, while ADD adds the new value to the existing raster, suitable for heatmaps for instance.
Example:
```
["ATTRIBUTE=npv", "ALL_TOUCHED=TRUE"]
```
layer_id (str/int) – name or index of the layer to rasterize. Defaults to 0.
where_clause (str) – If not None, is an SQL query-like string to filter which features are used to rasterize, (e.x. where=”value=1”).

Returns:

None

pygeoprocessing.geoprocessing.reclassify_raster(base_raster_path_band, value_map, target_raster_path, target_datatype, target_nodata, values_required=True, raster_driver_creation_tuple=('GTIFF', ('TILED=YES', 'BIGTIFF=YES', 'COMPRESS=LZW', 'BLOCKXSIZE=256', 'BLOCKYSIZE=256')))[source]

Reclassify pixel values in a raster.

A function to reclassify values in raster to any output type. By default the values except for nodata must be in value_map. Include the base raster nodata value in value_map to reclassify nodata pixels to a valid value.

Parameters:

base_raster_path_band (tuple) – a tuple including file path to a raster and the band index to operate over. ex: (path, band_index)
value_map (dictionary) – a dictionary of values of {source_value: dest_value, …} where source_value’s type is the same as the values in base_raster_path at band band_index. Cannot be empty and must contain a mapping for all raster values if values_required=True. If nodata is mapped, nodata will be reclassified, otherwise nodata will be set to target_nodata.
target_raster_path (string) – target raster output path; overwritten if it exists
target_datatype (gdal type) – the numerical type for the target raster
target_nodata (numerical type) – the nodata value for the target raster. Must be the same type as target_datatype. All nodata pixels in the base raster will be reclassified to this value, unless the base raster notata values are present in value_map.
values_required (bool) – If True, raise a ValueError if there is a value in the raster that is not found in value_map.
raster_driver_creation_tuple (tuple) – a tuple containing a GDAL driver name string as the first element and a GDAL creation options tuple/list as the second. Defaults to a GTiff driver tuple defined at geoprocessing.DEFAULT_GTIFF_CREATION_TUPLE_OPTIONS.

Returns:

None

Raises:

ReclassificationMissingValuesError – if values_required is True and a pixel value from base_raster_path_band is not a key in value_map.

pygeoprocessing.geoprocessing.reproject_vector(base_vector_path, target_projection_wkt, target_path, layer_id=0, driver_name='ESRI Shapefile', copy_fields=True, target_layer_name=None, osr_axis_mapping_strategy=0)[source]

Reproject OGR DataSource (vector).

Transforms the features of the base vector to the desired output projection in a new vector.

Note

If the ESRI Shapefile driver is used, the target_layer_name optional parameter is ignored. ESRI Shapefiles by definition use the filename to define the layer name.

Parameters:

base_vector_path (string) – Path to the base shapefile to transform.
target_projection_wkt (string) – the desired output projection in Well Known Text (by layer.GetSpatialRef().ExportToWkt())
target_path (string) – the filepath to the transformed shapefile
layer_id (str/int) – name or index of layer in base_vector_path to reproject. Defaults to 0.
driver_name (string) – String to pass to ogr.GetDriverByName, defaults to ‘ESRI Shapefile’.
copy_fields (bool or iterable) – If True, all the fields in base_vector_path will be copied to target_path during the reprojection step. If it is an iterable, it will contain the field names to exclusively copy. An unmatched fieldname will be ignored. If False no fields are copied into the new vector.
target_layer_name=None (str) – The name to use for the target layer in the new vector. If None (the default), the layer name from the source layer will be used.
osr_axis_mapping_strategy (int) – OSR axis mapping strategy for SpatialReference objects. Defaults to geoprocessing.DEFAULT_OSR_AXIS_MAPPING_STRATEGY. This parameter should not be changed unless you know what you are doing.

Returns:

None

pygeoprocessing.geoprocessing.shapely_geometry_to_vector(shapely_geometry_list, target_vector_path, projection_wkt, vector_format, fields=None, attribute_list=None, ogr_geom_type=3)[source]

Convert list of geometry to vector on disk.

Parameters:

shapely_geometry_list (list) – a list of Shapely objects.
target_vector_path (str) – path to target vector.
projection_wkt (str) – WKT for target vector.
vector_format (str) – GDAL driver name for target vector.
fields (dict) – a python dictionary mapping string fieldname to OGR Fieldtypes, if None no fields are added
attribute_list (list of dicts) – a list of python dictionary mapping fieldname to field value for each geometry in shapely_geometry_list, if None, no attributes are created.
ogr_geom_type (ogr geometry enumerated type) – sets the target layer geometry type. Defaults to wkbPolygon.

Returns:

None

pygeoprocessing.geoprocessing.stitch_rasters(base_raster_path_band_list, resample_method_list, target_stitch_raster_path_band, overlap_algorithm='etch', area_weight_m2_to_wgs84=False, osr_axis_mapping_strategy=0)[source]

Stitch the raster in the base list into the existing target.

Parameters:

base_raster_path_band_list (sequence) – sequence of raster path/band tuples to stitch into target.
resample_method_list (sequence) – a sequence of resampling methods which one to one map each path in base_raster_path_band_list during resizing. Each element must be one of, ‘rms | mode | sum | q1 | near | q3 | average | cubicspline | bilinear | max | med | min | cubic | lanczos’
target_stitch_raster_path_band (tuple) – raster path/band tuple to an existing raster, values in base_raster_path_band_list will be stitched into this raster/band in the order they are in the list. The nodata value for the target band must be defined and will be written over with values from the base raster. Nodata values in the base rasters will not be written into the target. If the pixel size or projection are different between base and target the base is warped to the target’s cell size and target with the interpolation method provided. If any part of the base raster lies outside of the target, that part of the base is ignored. A warning is logged if the entire base raster is outside of the target bounds.
overlap_algorithm (str) –
this value indicates which algorithm to use when a raster is stitched on non-nodata values in the target stitch raster. It can be one of the following:
- ’etch’: write a value to the target raster only if the target raster pixel is nodata. If the target pixel is non-nodata ignore any additional values to write on that pixel.
- ’replace’: write a value to the target raster irrespective of the value of the target raster
- ’add’: add the value to be written to the target raster to any existing value that is there. If the existing value is nodata, treat it as 0.0.
area_weight_m2_to_wgs84 (bool) – If True the stitched raster will be converted to a per-area value before reprojection to wgs84, then multiplied by the m^2 area per pixel in the wgs84 coordinate space. This is useful when the quantity being stitched is a total quantity per pixel rather than a per unit area density. Note this assumes input rasters are in a projected space of meters, if they are not the stitched output will be nonsensical.
osr_axis_mapping_strategy (int) – OSR axis mapping strategy for SpatialReference objects. Defaults to geoprocessing.DEFAULT_OSR_AXIS_MAPPING_STRATEGY. This parameter should not be changed unless you know what you are doing.

Returns:

None.

pygeoprocessing.geoprocessing.transform_bounding_box(bounding_box, base_projection_wkt, target_projection_wkt, edge_samples=11, osr_axis_mapping_strategy=0)[source]

Transform input bounding box to output projection.

This transform accounts for the fact that the reprojected square bounding box might be warped in the new coordinate system. To account for this, the function samples points along the original bounding box edges and attempts to make the largest bounding box around any transformed point on the edge whether corners or warped edges.

Parameters:

bounding_box (sequence) – a sequence of 4 coordinates in base_epsg coordinate system describing the bound in the order [xmin, ymin, xmax, ymax].
base_projection_wkt (string) – the spatial reference of the input coordinate system in Well Known Text.
target_projection_wkt (string) – the spatial reference of the desired output coordinate system in Well Known Text.
edge_samples (int) – the number of interpolated points along each bounding box edge to sample along. A value of 2 will sample just the corners while a value of 3 will also sample the corners and the midpoint.
osr_axis_mapping_strategy (int) – OSR axis mapping strategy for SpatialReference objects. Defaults to geoprocessing.DEFAULT_OSR_AXIS_MAPPING_STRATEGY. This parameter should not be changed unless you know what you are doing.

Returns:

A list of the form [xmin, ymin, xmax, ymax] that describes the largest fitting bounding box around the original warped bounding box in new_epsg coordinate system.

Raises:

ValueError` if resulting transform yields non-finite coordinates –
This would indicate an ill posed transform region that the user –
should address. –

pygeoprocessing.geoprocessing.warp_raster(base_raster_path, target_pixel_size, target_raster_path, resample_method, target_bb=None, base_projection_wkt=None, target_projection_wkt=None, n_threads=None, mask_options=None, vector_mask_options=None, gdal_warp_options=None, working_dir=None, use_overview_level=-1, raster_driver_creation_tuple=('GTIFF', ('TILED=YES', 'BIGTIFF=YES', 'COMPRESS=LZW', 'BLOCKXSIZE=256', 'BLOCKYSIZE=256')), osr_axis_mapping_strategy=0)[source]

Resize/resample raster to desired pixel size, bbox and projection.

Parameters:

base_raster_path (string) – path to base raster.
target_pixel_size (list/tuple) – a two element sequence indicating the x and y pixel size in projected units.
target_raster_path (string) – the location of the resized and resampled raster.
resample_method (string) – the resampling algorithm. Must be a valid resampling algorithm for gdal.WarpRaster, one of: ‘rms | mode | sum | q1 | near | q3 | average | cubicspline | bilinear | max | med | min | cubic | lanczos’
target_bb (sequence) – if None, target bounding box is the same as the source bounding box. Otherwise it’s a sequence of float describing target bounding box in target coordinate system as [minx, miny, maxx, maxy].
base_projection_wkt (string) – if not None, interpret the projection of base_raster_path as this.
target_projection_wkt (string) – if not None, desired target projection in Well Known Text format.
n_threads (int) – optional, if not None this sets the N_THREADS option for gdal.Warp.
mask_options (dict or None) –
optional. If None, no masking will be done. If a dict, it is a dictionary of options relating to the dataset mask. Keys to this dictionary are:
- 'mask_vector_path': (str) path to the mask vector file. This vector will be automatically projected to the target projection if its base coordinate system does not match the target. Where there are geometries in this vector, pixels in base_raster_path will propagate to target_raster_path.
- 'mask_layer_id': (int/str) the layer index or name to use for masking, if this key is not in the dictionary the default is to use the layer at index 0.
- 'mask_vector_where_filter': (str) an SQL WHERE string that can be used to filter the geometry in the mask. Ex: ‘id > 10’ would use all features whose field value of ‘id’ is > 10.
- 'mask_raster_path': (str). If present in the dict, all other keys in mask_options are ignored. This string must be a path to a raster representing a validity mask, where pixel values of 1 indicate validity. This raster must be in the same projection and have the same dimensions as the target warped raster. The general (and easiest) use case for warp_raster is to use 'mask_vector_path' instead.
vector_mask_options=None (dict) – Alias for mask_options. This option is deprecated and will be removed in a future release of pygeoprocessing.
gdal_warp_options (sequence) – if present, the contents of this list are passed to the warpOptions parameter of gdal.Warp. See the GDAL Warp documentation for valid options.
working_dir (string) – if defined uses this directory to make temporary working files for calculation. Otherwise uses system’s temp directory.
use_overview_level=-1 (int/str) – The overview level to use for warping. A value of -1 (the default) indicates that the base raster should be used for the source pixels. A value of 'AUTO' will make GDAL select the overview with the resolution that is closest to the target pixel size and warp using that overview’s pixel values. Any other integer indicates that that overview index should be used. For example, suppose the raster has overviews at levels 2, 4 and 8. To use level 2, set use_overview_level=0. To use level 8, set use_overview_level=2.
raster_driver_creation_tuple (tuple) – a tuple containing a GDAL driver name string as the first element and a GDAL creation options tuple/list as the second. Defaults to a GTiff driver tuple defined at geoprocessing.DEFAULT_GTIFF_CREATION_TUPLE_OPTIONS.
osr_axis_mapping_strategy (int) – OSR axis mapping strategy for SpatialReference objects. Defaults to geoprocessing.DEFAULT_OSR_AXIS_MAPPING_STRATEGY. This parameter should not be changed unless you know what you are doing.

Returns:

None

Raises:

ValueError – if pixel_size is not a 2 element sequence of numbers.
ValueError – if mask_options is not None but the mask_vector_path is undefined or doesn’t point to a valid file.

pygeoprocessing.geoprocessing.zonal_statistics(base_raster_path_band, aggregate_vector_path, aggregate_layer_name=None, ignore_nodata=True, polygons_might_overlap=True, include_value_counts=False, working_dir=None)[source]

Collect stats on pixel values which lie within polygons.

This function summarizes raster statistics including min, max, mean, and pixel count over the regions on the raster that are overlapped by the polygons in the vector layer. Statistics are calculated in two passes, where first polygons aggregate over pixels in the raster whose centers intersect with the polygon. In the second pass, any polygons that are not aggregated use their bounding box to intersect with the raster for overlap statistics.

Statistics are calculated on the set of pixels that fall within each feature polygon. If ignore_nodata is false, nodata pixels are considered valid when calculating the statistics:

‘min’: minimum valid pixel value
‘max’: maximum valid pixel value
‘sum’: sum of valid pixel values
‘count’: number of valid pixels
‘nodata_count’: number of nodata pixels
‘value_counts’: number of pixels having each unique value

Note

There may be some degenerate cases where the bounding box vs. actual geometry intersection would be incorrect, but these are so unlikely as to be manually constructed. If you encounter one of these please create an issue at https://github.com/natcap/pygeoprocessing/issues with the datasets used.

Parameters:

base_raster_path_band (tuple or list[tuple]) – base raster (path, band) tuple(s) to analyze. If a list of multiple raster bands is provided, they must be aligned (all having the same bounding box, geotransform, and projection).
aggregate_vector_path (string) – a path to a polygon vector whose geometric features indicate the areas in base_raster_path_band to calculate zonal statistics.
aggregate_layer_name (string) – name of vector layer that will be used to aggregate results over. If set to None, the first layer in the DataSource will be used as retrieved by .GetLayer(). Note: it is normal and expected to set this field at None if the aggregating vector dataset has a single layer as many do, including the common ‘ESRI Shapefile’.
ignore_nodata – if true, then nodata pixels are not accounted for when calculating min, max, count, or mean. However, the value of nodata_count will always be the number of nodata pixels aggregated under the polygon.
polygons_might_overlap (boolean) – if True the function calculates aggregation coverage close to optimally by rasterizing sets of polygons that don’t overlap. However, this step can be computationally expensive for cases where there are many polygons. Setting this flag to False directs the function rasterize in one step.
include_value_counts (boolean) – If True, the function tallies the number of pixels of each value under the polygon. This is useful for classified rasters but could exhaust available memory when run on a continuous (floating-point) raster. Defaults to False.
working_dir (string) – If not None, indicates where temporary files should be created during this run.

Returns:

If base_raster_path_band is a tuple, the return value is a nested dictionary of stats for that raster band. Top-level keys are the aggregate feature FIDs. Each nested FID dictionary then contains statistics about that feature: ‘min’, ‘max’, ‘sum’, ‘count’, ‘nodata_count’, and optionally ‘value_counts’. Example:

{
    0: {
        'min': 0,
        'max': 14,
        'sum': 42,
        'count': 8,
        'nodata_count': 1,
        'value_counts': {
            2: 5,
            4: 1,
            14: 2
        }
    }
}

If base_raster_path_band is a list of tuples, the return value is a list of the nested dictionaries described above. Each dictionary in the list contains the stats calculated for the corresponding raster band in the base_raster_path_band list.

Raises:

ValueError – if base_raster_path_band is incorrectly formatted, or if not all of the input raster bands are aligned with each other
RuntimeError – if the aggregate vector or layer cannot be opened