pygeoda.maxp_tabu

pygeoda.maxp_tabu(w, data, bound_variable, min_bound, tabu_length=10, **kwargs)

A tabu-search algorithm to solve the max-p-region problem

The max-p-region problem is a special case of constrained clustering in which a finite number of geographical areas are aggregated into the maximum number of regions (max-p-regions), such that each region is geographically connected, the sum of the bounding variable in each region is greater than min_bound, and internal homogeneity is maximized.

Parameters

w (Weight) – an instance of Weight class
data (list or dataframe) – A list of numeric vectors of the selected variables, or a data frame of the selected variables, e.g. guerry[['Crm_prs', 'Literacy']]
bound_variable (tuple) – A numeric vector of the selected bounding variable
min_bound (float) – The minimum value that the sum of the bounding variable in each cluster must be greater than
tabu_length (int) – The length of the tabu list used by the tabu search heuristic. Defaults to 10.
conv_tabu (int, optional) – The number of non-improving moves. Defaults to 10.
iterations (int, optional) – The number of iterations of the greedy algorithm. Defaults to 99.
init_regions (tuple, optional) – The initial regions that the local search starts with. Defaults to empty, which means the local search starts with a random process to “grow” clusters.
scale_method (str, optional) – One of the scaling methods {‘raw’, ‘standardize’, ‘demean’, ‘mad’, ‘range_standardize’, ‘range_adjust’} to apply on input data. Default is ‘standardize’ (Z-score normalization).
distance_method (str, optional) – The distance method used to compute the distance between observations i and j. Options are “euclidean” and “manhattan”. Defaults to “euclidean”.
random_seed (int, optional) – The seed for the random number generator. Defaults to 123456789, the same default used by the GeoDa software.
cpu_threads (int, optional) – The number of cpu threads used for parallel computation
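
The parameters above can be put together roughly as follows. This is a minimal sketch, not an official example: the helper names `min_bound_from_share` and `run_maxp_tabu`, the dataset path, and the column names passed in are hypothetical. It assumes pygeoda is installed, that `pygeoda.open` and `pygeoda.queen_weights` are used to build the inputs, and that a single column can be read by name from the opened dataset. Choosing `min_bound` as a share of the bounding variable's total is only one possible convention.

```python
def min_bound_from_share(values, share=0.1):
    """Return `share` of the total of `values` (hypothetical helper)."""
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    return share * sum(values)


def run_maxp_tabu(path, data_columns, bound_column, share=0.1):
    """Sketch of a maxp_tabu call; needs pygeoda and a readable dataset."""
    import pygeoda  # imported lazily so the helper above works without it

    gda = pygeoda.open(path)
    w = pygeoda.queen_weights(gda)        # contiguity-based Weight instance
    data = gda[data_columns]              # list of column names, as in `data` above
    bound_variable = gda[bound_column]    # assumption: single column read by name
    min_bound = min_bound_from_share(bound_variable, share)
    return pygeoda.maxp_tabu(
        w, data, bound_variable, min_bound,
        tabu_length=10, conv_tabu=10,
    )
```

Calling `run_maxp_tabu` with a dataset path, a list of variable names, and the name of the bounding column would return the dict described under Returns.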

Returns

A dict with keys {“Clusters”, “TotalSS”, “Within-clusterSS”, “TotalWithin-clusterSS”, “Ratio”}

Return type

dict
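
One way to sanity-check a result is to sum the bounding variable per cluster and compare against min_bound. The sketch below assumes that the “Clusters” entry holds one cluster label per observation, in the same order as bound_variable. The helper names and the sample values are made up for illustration, and a sum exactly equal to min_bound is treated as acceptable here.

```python
from collections import defaultdict


def cluster_bound_sums(clusters, bound_variable):
    """Sum the bounding variable within each cluster label."""
    if len(clusters) != len(bound_variable):
        raise ValueError("clusters and bound_variable must have equal length")
    sums = defaultdict(float)
    for label, value in zip(clusters, bound_variable):
        sums[label] += value
    return dict(sums)


def clusters_below_bound(clusters, bound_variable, min_bound):
    """Return the sorted labels whose bound sum falls below min_bound."""
    sums = cluster_bound_sums(clusters, bound_variable)
    return sorted(label for label, total in sums.items() if total < min_bound)


# Made-up stand-in for the returned dict; only "Clusters" is used here.
result = {"Clusters": (1, 1, 2, 2, 3)}
bound = (40.0, 70.0, 55.0, 60.0, 30.0)

print(clusters_below_bound(result["Clusters"], bound, 100.0))  # [3]
```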