pygeoda.redcap

pygeoda.redcap(k, w, data, method, **kwargs)

Regionalization with dynamically constrained agglomerative clustering and partitioning (REDCAP)

REDCAP first builds a spatially contiguous spanning tree using one of four linkage criteria (single, average, complete, or Ward linkage) under one of two contiguity-constraining strategies (first-order or full-order), and then partitions the tree into k clusters. pygeoda provides the following combinations:

  • First-order and Single-linkage

  • Full-order and Single-linkage

  • Full-order and Complete-linkage

  • Full-order and Average-linkage

  • Full-order and Ward-linkage
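
To make the build-then-partition idea concrete, the following is a minimal pure-Python sketch of the first-order, single-linkage case. It assumes that this variant reduces to a minimum spanning tree over the contiguity edges, and that partitioning greedily removes the tree edge that most reduces the within-cluster sum of squared deviations. The helper names (ssd, contiguity_mst, components, partition_tree) and the toy data are hypothetical; this is not pygeoda's implementation, which is compiled and far more efficient.

```python
import math


def ssd(members, data):
    """Sum of squared deviations from the mean, summed over all variables."""
    total = 0.0
    for d in range(len(data[0])):
        col = [data[i][d] for i in members]
        mean = sum(col) / len(col)
        total += sum((v - mean) ** 2 for v in col)
    return total


def contiguity_mst(data, neighbors):
    """Kruskal's algorithm restricted to edges between contiguous units."""
    edges = sorted(
        (math.dist(data[i], data[j]), i, j)
        for i in neighbors
        for j in neighbors[i]
        if i < j
    )
    parent = list(range(len(data)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree


def components(n, tree):
    """Connected components of the forest left after removing edges."""
    adj = {i: [] for i in range(n)}
    for i, j in tree:
        adj[i].append(j)
        adj[j].append(i)
    seen, parts = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, part = [start], []
        while stack:
            node = stack.pop()
            part.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        parts.append(sorted(part))
    return parts


def partition_tree(data, neighbors, k):
    """Cut k - 1 tree edges, each time minimizing total within-cluster SSD."""
    n = len(data)
    tree = contiguity_mst(data, neighbors)
    for _ in range(k - 1):
        if not tree:
            break
        best = None
        for edge in tree:
            trial = [e for e in tree if e != edge]
            total = sum(ssd(p, data) for p in components(n, trial))
            if best is None or total < best[0]:
                best = (total, edge)
        tree.remove(best[1])
    return components(n, tree)


# Toy example: six units along a line, one variable each.
values = [[1.0], [1.1], [0.9], [5.0], [5.2], [9.0]]
line_neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
regions = partition_tree(values, line_neighbors, 3)
print(regions)  # [[0, 1, 2], [3, 4], [5]]
```

The full-order variants differ in how contiguity between merged clusters is tracked while the tree is built, which this sketch does not attempt to reproduce.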

Parameters
  • k (int) – number of clusters

  • w (Weight) – an instance of the Weight class

  • data (list or dataframe) – a list of numeric vectors, one per selected variable, or a data frame of the selected variables, e.g. guerry[['Crm_prs', 'Literacy']]

  • method (str) – the REDCAP variant to use, one of the five linkage and constraining-order combinations listed above

  • bound_variable (tuple, optional) – a numeric vector holding the selected bounding variable

  • min_bound (float, optional) – the minimum value that the sum of the bounding variable in each cluster must exceed

  • scale_method (str, optional) – one of the scaling methods {'raw', 'standardize', 'demean', 'mad', 'range_standardize', 'range_adjust'} to apply to the input data. Defaults to 'standardize' (Z-score normalization).

  • distance_method (str, optional) – {"euclidean", "manhattan"} the distance method used to compute the distance between observations i and j. Defaults to "euclidean".

  • random_seed (int, optional) – the seed for the random number generator. Defaults to 123456789.

  • cpu_threads (int, optional) – The number of cpu threads used for parallel computation
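
A hedged usage sketch follows. The wrapper run_redcap and the tuple REDCAP_METHODS are hypothetical helpers written for illustration. The method strings are assumed to be the names pygeoda accepts for the five variants listed above, and the data selection follows the guerry[[...]] form shown for the data parameter. pygeoda is imported inside the function so that the helper can be defined even where the package is not installed.

```python
# Assumed method names for the five REDCAP variants (verify against your
# installed pygeoda version before relying on them).
REDCAP_METHODS = (
    "firstorder-singlelinkage",
    "fullorder-singlelinkage",
    "fullorder-completelinkage",
    "fullorder-averagelinkage",
    "fullorder-wardlinkage",
)


def run_redcap(path, columns, k, method="fullorder-wardlinkage", **kwargs):
    """Open a dataset, build queen contiguity weights, and run REDCAP.

    Optional settings such as scale_method, distance_method, bound_variable,
    min_bound, random_seed, and cpu_threads are forwarded through **kwargs.
    """
    if method not in REDCAP_METHODS:
        raise ValueError(f"unknown REDCAP method: {method!r}")

    import pygeoda  # lazy import: only needed when the function is called

    gda = pygeoda.open(path)         # load the spatial dataset
    w = pygeoda.queen_weights(gda)   # contiguity weights for the w argument
    data = gda[columns]              # selected variables, as in guerry[[...]]
    return pygeoda.redcap(k, w, data, method, **kwargs)
```

With a local copy of the Guerry data (the path here is a placeholder), a call might look like run_redcap("./data/Guerry.shp", ["Crm_prs", "Literacy"], 4, scale_method="standardize", distance_method="euclidean").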

Returns

A dict with keys {“Clusters”, “TotalSS”, “Within-clusterSS”, “TotalWithin-clusterSS”, “Ratio”}

Return type

dict
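
The sum-of-squares entries in the returned dict can be related to one another with a short sketch. It assumes that "Clusters" holds one cluster label per observation and that "Ratio" is the between-cluster sum of squares divided by the total sum of squares; both are assumptions rather than statements from this page. The function cluster_fit_summary is hypothetical and works on the values it is given, so its numbers will differ from pygeoda's whenever a scale_method other than 'raw' is applied.

```python
def cluster_fit_summary(data, labels):
    """Compute sum-of-squares diagnostics for a given cluster assignment.

    data   : list of rows, each a list of numeric values (one per variable)
    labels : one cluster label per row
    """
    n_vars = len(data[0])

    def sum_of_squares(rows):
        # Squared deviations from the column mean, summed over all variables.
        total = 0.0
        for d in range(n_vars):
            col = [row[d] for row in rows]
            mean = sum(col) / len(col)
            total += sum((v - mean) ** 2 for v in col)
        return total

    groups = {}
    for row, label in zip(data, labels):
        groups.setdefault(label, []).append(row)

    total_ss = sum_of_squares(data)
    within_ss = [sum_of_squares(groups[label]) for label in sorted(groups)]
    total_within = sum(within_ss)
    return {
        "TotalSS": total_ss,
        "Within-clusterSS": within_ss,
        "TotalWithin-clusterSS": total_within,
        # Between-cluster SS is what remains after removing within-cluster SS.
        "Ratio": (total_ss - total_within) / total_ss,
    }


summary = cluster_fit_summary([[0.0], [2.0], [10.0], [12.0]], [1, 1, 2, 2])
print(summary["TotalSS"], summary["TotalWithin-clusterSS"])  # 104.0 4.0
```

Under this reading, a ratio close to 1 means that most of the variation lies between clusters rather than within them.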