`R/clustering.R`

`redcap.Rd`

REDCAP (Regionalization with dynamically constrained agglomerative clustering and partitioning) was developed by D. Guo (2008). Like SKATER, REDCAP starts by building a spanning tree, using one of 4 linkage methods (single-linkage, average-linkage, Ward-linkage, and complete-linkage). Single-linkage yields a minimum spanning tree. REDCAP then provides 2 ways (first-order and full-order constraining) to prune the tree to find clusters. The first-order approach with a minimum spanning tree is exactly the same as SKATER. In GeoDa and pygeoda, the following methods are provided:

- First-order and Single-linkage
- Full-order and Complete-linkage
- Full-order and Average-linkage
- Full-order and Single-linkage
- Full-order and Ward-linkage
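To make the spanning-tree step more concrete, here is a minimal base R sketch of a minimum spanning tree built with Prim's algorithm on a toy contiguity graph. It is an illustration only, not rgeoda's implementation: the data `x`, the edge list `edges`, and the use of absolute attribute difference as the edge weight are all assumptions made for this example.

```r
# Toy example (hypothetical data): 5 observations with one attribute each,
# plus an edge list saying which observations are spatial neighbours.
x <- c(1.0, 1.2, 5.0, 5.1, 9.0)
edges <- data.frame(
  from = c(1, 1, 2, 3, 3, 4),
  to   = c(2, 3, 3, 4, 5, 5)
)

# Edge weight = attribute dissimilarity between the two neighbours.
edges$weight <- abs(x[edges$from] - x[edges$to])

# Prim's algorithm: start from observation 1 and repeatedly add the
# cheapest edge that links a node in the tree to a node outside it.
in_tree <- c(TRUE, rep(FALSE, length(x) - 1))
mst <- edges[0, ]
while (!all(in_tree)) {
  crossing <- xor(in_tree[edges$from], in_tree[edges$to])
  candidates <- edges[crossing, ]
  best <- candidates[which.min(candidates$weight), ]
  mst <- rbind(mst, best)
  in_tree[c(best$from, best$to)] <- TRUE
}

print(mst)  # a tree on n nodes always has n - 1 edges
```

Pruning such a tree, that is, removing edges so that it falls apart into connected subtrees, is what turns the tree into spatially contiguous clusters.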

```
redcap(
k,
w,
df,
method = "fullorder-averagelinkage",
bound_variable = data.frame(),
min_bound = 0,
scale_method = "standardize",
distance_method = "euclidean",
random_seed = 123456789,
cpu_threads = 6,
rdist = numeric()
)
```

- k
The number of clusters

- w
An instance of the Weight class

- df
A data frame with selected variables only. E.g. guerry[c("Crm_prs", "Crm_prp", "Litercy")]

- method
One of "firstorder-singlelinkage", "fullorder-completelinkage", "fullorder-averagelinkage", "fullorder-singlelinkage", "fullorder-wardlinkage". Defaults to "fullorder-averagelinkage".

- bound_variable
(optional) A data frame with selected bound variable

- min_bound
(optional) A minimum bound value that applies to all clusters

- scale_method
(optional) One of the scaling methods 'raw', 'standardize', 'demean', 'mad', 'range_standardize', 'range_adjust' to apply on input data. Default is 'standardize' (Z-score normalization).

- distance_method
(optional) The distance method used to compute the distance between observations i and j. Defaults to "euclidean". Options are "euclidean" and "manhattan".

- random_seed
(int, optional) The seed for the random number generator. Defaults to 123456789.

- cpu_threads
(optional) The number of CPU threads used for parallel computation. Defaults to 6.

- rdist
(optional) The distance matrix (lower triangular matrix, column-wise storage)
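The `scale_method`, `distance_method`, and `rdist` arguments can be illustrated with base R alone. The sketch below uses made-up data and is not taken from rgeoda: `scale()` divides by the sample standard deviation, which may differ from the package's own 'standardize' implementation, and whether `as.numeric(dist(...))` matches the exact vector layout that `rdist` expects is an assumption to verify before relying on it.

```r
# Toy data (hypothetical values): 4 observations, 2 variables.
df <- data.frame(a = c(1, 2, 3, 4), b = c(10, 20, 15, 5))

# Z-score standardization: each column gets mean 0 and
# (sample) standard deviation 1.
z <- scale(df)

# Pairwise distances between observations i and j.
d_euclid <- dist(z, method = "euclidean")
d_manhat <- dist(z, method = "manhattan")

# A "dist" object stores the lower triangle of the distance matrix
# column by column: d(2,1), d(3,1), d(4,1), d(3,2), d(4,2), d(4,3).
rdist_sketch <- as.numeric(d_euclid)
length(rdist_sketch)  # n * (n - 1) / 2 entries
```

For n observations the vector has n(n - 1)/2 entries, so a precomputed distance vector grows quadratically with the number of observations.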

A named list with names "Clusters", "Total sum of squares", "Within-cluster sum of squares", "Total within-cluster sum of squares", and "The ratio of between to total sum of squares".
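As a rough guide to how the sum-of-squares entries relate to one another, the following base R sketch computes them for a toy data matrix and a hand-picked cluster assignment. The data, the `clusters` vector, and the helper `ss()` are hypothetical; the sketch uses the standard definitions and does not reproduce rgeoda's internals (for example, which scaled version of the data the package measures).

```r
# Toy data (hypothetical): 4 observations, 2 variables, 2 clusters.
x <- cbind(c(1, 2, 9, 10), c(1, 2, 9, 10))
clusters <- c(1, 1, 2, 2)

# Sum of squared deviations from the column means.
ss <- function(m) sum(sweep(m, 2, colMeans(m))^2)

# Total sum of squares over all observations.
totss <- ss(x)

# Within-cluster sum of squares, one value per cluster.
withinss <- sapply(
  split(seq_len(nrow(x)), clusters),
  function(i) ss(x[i, , drop = FALSE])
)

# Total within-cluster sum of squares, and the share of the total
# that lies between clusters.
tot_withinss <- sum(withinss)
ratio <- (totss - tot_withinss) / totss
```

A ratio close to 1 means most of the variation lies between clusters rather than inside them.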

```
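if (FALSE) {
library(sf)
library(rgeoda)
# Load the Guerry sample data shipped with rgeoda
guerry_path <- system.file("extdata", "Guerry.shp", package = "rgeoda")
guerry <- st_read(guerry_path)
# Queen contiguity weights define which departments are neighbours
queen_w <- queen_weights(guerry)
# Keep only the variables used for clustering
data <- guerry[c('Crm_prs','Crm_prp','Litercy','Donatns','Infants','Suicids')]
# Find 4 clusters with full-order constraining and complete linkage
guerry_clusters <- redcap(4, queen_w, data, "fullorder-completelinkage")
guerry_clusters
}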
```