3 Load Spatial Data

pygeoda can read ESRI Shapefiles directly. For other data formats, load the spatial data with geopandas first, then pass the geopandas object to the function pygeoda.open().

For example, to load the ESRI Shapefile Guerry.shp, which can be downloaded from https://geodacenter.github.io/data-and-lab/Guerry/:

>>> import pygeoda
>>> guerry = pygeoda.open("./data/Guerry.shp")
>>> guerry
geoda object:
    Number of observations: 85
    Number of fields: 27
    Geometry type(s): Polygon
        field name:       field type (shapfile):
        CODE_DE                       string
        COUNT                         real
        AVE_ID_                         real
        ...

geopandas can also be used to load or manipulate the spatial data first. The resulting geopandas object can then be used to create a geoda object with the same function, pygeoda.open():

>>> import geopandas
>>> df = geopandas.read_file("./data/Guerry.shp")
>>>
>>> import pygeoda
>>> guerry = pygeoda.open(df)
>>> guerry
geoda object:
    Number of observations: 85
    Number of fields: 27
    Geometry type(s): ('MultiPolygon', 'Polygon')
        field name:       field type (numpy.dtype):
        CODE_DE                       object
        COUNT                         float64
        AVE_ID_                         float64
        ...

Note

The “Geometry type(s)” and “field type” reported by pygeoda.open() differ depending on whether it opens an ESRI Shapefile or a geopandas object. When opening an ESRI Shapefile, the “Geometry type” is one of {‘Polygon’, ‘Point’, ‘Line’} and the “field type” is one of {‘real’, ‘string’, ‘integer’}. When opening a geopandas object, the “Geometry type” is the one defined by the GeoSeries (e.g. MultiPolygon, Polygon, etc.) and the “field type” is defined by numpy.dtype (e.g. object, float64, int64, etc.).
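As a minimal sketch of the workflow described in this section, the helper below reads a non-Shapefile source with geopandas and hands the result to pygeoda.open(). The function name load_geoda is hypothetical and not part of pygeoda; only geopandas.read_file() and pygeoda.open() come from the text above. The imports are deferred, so defining the helper does not require either library to be installed.

```python
def load_geoda(path):
    """Read a spatial file with geopandas and wrap it as a geoda object.

    `load_geoda` is a hypothetical convenience helper, not part of pygeoda.
    """
    # Deferred imports: both libraries are only needed when the helper is called.
    import geopandas
    import pygeoda

    # geopandas.read_file() handles formats such as GeoJSON or GeoPackage.
    gdf = geopandas.read_file(path)

    # Per the text above, pygeoda.open() accepts the geopandas object directly.
    return pygeoda.open(gdf)
```

A call such as load_geoda("./data/Guerry.geojson") would then return a geoda object; that path is an assumption for illustration, since the text above only refers to Guerry.shp.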

3.1 Attributes of geoda object

  • num_cols

  • num_obs

  • field_names

  • field_types

To access the meta-data of the loaded Guerry dataset:

>>> print("number of columns:", guerry.num_cols)
number of columns: 26

>>> print("number of observations:", guerry.num_obs)
number of observations: 85

>>> print("field names:", guerry.field_names)
field names: ('CODE_DE', 'COUNT', 'AVE_ID_', 'dept', 'Region', 'Dprtmnt', 'Crm_prs', 'Crm_prp', 'Litercy', 'Donatns', 'Infants', 'Suicids', 'MainCty', 'Wealth', 'Commerc', 'Clergy', 'Crm_prn', 'Infntcd', 'Dntn_cl', 'Lottery', 'Desertn', 'Instrct', 'Prsttts', 'Distanc', 'Area', 'Pop1831')

>>> print("field types:", guerry.field_types)
field types: {'CODE_DE': 'string', 'COUNT':'real', 'AVE_ID_': 'real',...}

Note

If a geopandas object is passed to pygeoda.open(), the fields also include a geometry column named “geometry” with data type “geometry”.
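Because the type labels differ between the two ways of opening a dataset, code that inspects field_types may need to handle both vocabularies. The sketch below picks out numeric fields from a dict of field name to type label. The helper numeric_fields is hypothetical, not a pygeoda function, and it assumes that field_types for a geopandas-backed object is also a dict whose values print as numpy dtype names.

```python
# Type labels taken from the notes above: shapefile-backed objects report
# 'real' / 'integer' / 'string'; geopandas-backed objects report numpy dtype
# names such as 'float64', 'int64', 'object', plus 'geometry'.
SHAPEFILE_NUMERIC = {"real", "integer"}


def numeric_fields(field_types):
    """Return the names of fields whose type label looks numeric.

    `numeric_fields` is a hypothetical helper, not a pygeoda function.
    `field_types` is a dict mapping field name to type label.
    """
    selected = []
    for name, label in field_types.items():
        label = str(label)  # also covers numpy dtype objects
        if label in SHAPEFILE_NUMERIC or label.startswith(("float", "int")):
            selected.append(name)
    return selected


# Labels copied from the examples in this section (illustrative subsets).
from_shapefile = {"CODE_DE": "string", "COUNT": "real", "AVE_ID_": "real"}
from_geopandas = {"CODE_DE": "object", "COUNT": "float64", "geometry": "geometry"}

print(numeric_fields(from_shapefile))  # ['COUNT', 'AVE_ID_']
print(numeric_fields(from_geopandas))  # ['COUNT']
```

With a loaded dataset, the same helper could be applied as numeric_fields(guerry.field_types), under the assumption stated above.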

3.2 Access Table Data

One can use the bracket [ ] operator to access the table data:

>>> guerry = pygeoda.open('./data/Guerry.shp')
>>> guerry['Crm_prs']
(28870.0,  26226.0,  26747.0, 12935.0,...)
>>> guerry[['Crm_prs', 'Litercy']]
[(28870.0,  26226.0,  26747.0, 12935.0,...), (37.0,  51.0,  13.0,...)]

Note

If a geopandas object is passed to pygeoda.open(), the output of the [[ ]] operator is a dataframe object, which can be used as an input parameter in pygeoda functions such as skater(), neighbor_match_test(), etc.

>>> guerry = pygeoda.open(df)
>>> guerry['Crm_prs']
[28870.0,  26226.0,  26747.0, 12935.0,...]
>>> guerry[['Crm_prs', 'Litercy']]
  Crm_prs    Litercy
--------------------
0 28870      37
1 26226      51

Note

In pygeoda, the values of a single variable can be passed to a pygeoda function as either a tuple or a list; the values of multiple variables can be passed as either a list of tuples/lists or a dataframe.
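The accepted input shapes can be illustrated with plain Python containers. The sketch below bundles single-variable sequences into a list of tuples, one tuple per variable, matching the shape of the guerry[['Crm_prs', 'Litercy']] output for a Shapefile-backed object. The helper as_variable_list is hypothetical, not a pygeoda function; the length check reflects the assumption that every variable has one value per observation.

```python
def as_variable_list(*columns):
    """Bundle several single-variable sequences into a list of tuples.

    `as_variable_list` is a hypothetical helper, not a pygeoda function.
    Each argument holds the values of one variable (a tuple or a list).
    """
    lengths = {len(col) for col in columns}
    if len(lengths) > 1:
        raise ValueError(f"variables differ in length: {sorted(lengths)}")
    return [tuple(col) for col in columns]


# First three values of each variable, copied from the output above.
crm_prs = (28870.0, 26226.0, 26747.0)  # single variable as a tuple
litercy = [37.0, 51.0, 13.0]           # single variable as a list

data = as_variable_list(crm_prs, litercy)
print(data)  # [(28870.0, 26226.0, 26747.0), (37.0, 51.0, 13.0)]
```

Either crm_prs or litercy on its own has the single-variable shape, while data has the multi-variable shape.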