.. _load_data_ref:
.. currentmodule:: pygeoda
3 Load Spatial Data
===================
pygeoda reads ESRI Shapefiles directly. For other data formats,
use geopandas to load the spatial data first,
then pass the geopandas object to the function `pygeoda.open()`.
For example, the following loads the ESRI Shapefile **Guerry.shp**,
which can be downloaded from https://geodacenter.github.io/data-and-lab/Guerry/.
::

    >>> import pygeoda
    >>> guerry = pygeoda.open("./data/Guerry.shp")
    >>> guerry
    geoda object:
    Number of observations: 85
    Number of fields: 27
    Geometry type(s): Polygon
    field name: field type (shapfile):
    CODE_DE string
    COUNT real
    AVE_ID_ real
    ...

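A Shapefile is a set of files rather than a single file: besides the ``.shp`` geometry file, the ``.shx`` index and the ``.dbf`` attribute table must sit next to it. The sketch below checks for those companions before a path is handed to `pygeoda.open()`. The helper name ``missing_shapefile_parts`` is hypothetical and not part of pygeoda; it uses only the standard library.

::

    from pathlib import Path

    # The three files every Shapefile dataset needs (hypothetical helper,
    # not part of pygeoda).
    REQUIRED_SUFFIXES = (".shp", ".shx", ".dbf")

    def missing_shapefile_parts(shp_path):
        """Return the names of required Shapefile parts that do not exist."""
        base = Path(shp_path)
        return [
            base.with_suffix(suffix).name
            for suffix in REQUIRED_SUFFIXES
            if not base.with_suffix(suffix).exists()
        ]

    # An empty list means all three parts were found.
    print(missing_shapefile_parts("./data/Guerry.shp"))

If the returned list is not empty, the download is probably incomplete, and the missing files should be fetched before opening the dataset.
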
geopandas can also be used to load or manipulate the spatial data first.
The resulting geopandas object can then be turned into a geoda object with
the same function `pygeoda.open()`:
::

    >>> import geopandas
    >>> df = geopandas.read_file("./data/Guerry.shp")
    >>>
    >>> import pygeoda
    >>> guerry = pygeoda.open(df)
    >>> guerry
    geoda object:
    Number of observations: 85
    Number of fields: 27
    Geometry type(s): ('MultiPolygon', 'Polygon')
    field name: field type (numpy.dtype):
    CODE_DE object
    COUNT float64
    AVE_ID_ float64
    ...

.. note::
    The "Geometry type(s)" and "field type" values reported by
    `pygeoda.open()` differ between an ESRI Shapefile and a geopandas object.
    When opening an ESRI Shapefile, the "Geometry type" is one of
    {'Polygon', 'Point', 'Line'} and the "field type" is one of
    {'real', 'string', 'integer'}. When opening a geopandas object, the
    "Geometry type" is defined by the GeoSeries
    (e.g. MultiPolygon, Polygon, etc.) and the "field type" is defined by `numpy.dtype`
    (e.g. object, float64, int64, etc.).

3.1 Attributes of geoda object
------------------------------
* n_cols
* n_obs
* field_names
* field_types
To access the meta-data of the loaded Guerry dataset:
::

    >>> print("number of columns:", guerry.num_cols)
    number of columns: 26
    >>> print("number of observations:", guerry.num_obs)
    number of observations: 85
    >>> print("field names:", guerry.field_names)
    field names: ('CODE_DE', 'COUNT', 'AVE_ID_', 'dept', 'Region', 'Dprtmnt', 'Crm_prs', 'Crm_prp', 'Litercy', 'Donatns', 'Infants', 'Suicids', 'MainCty', 'Wealth', 'Commerc', 'Clergy', 'Crm_prn', 'Infntcd', 'Dntn_cl', 'Lottery', 'Desertn', 'Instrct', 'Prsttts', 'Distanc', 'Area', 'Pop1831')
    >>> print("field types:", guerry.field_types)
    field types: {'CODE_DE': 'string', 'COUNT':'real', 'AVE_ID_': 'real',...}

.. note::
    If a geopandas object is passed to `pygeoda.open()`, the geoda object contains a geometry column named "geometry" with data type "geometry".

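Because ``field_types`` is an ordinary dictionary that maps field names to type names, it can be filtered with plain Python. The sketch below picks out the numeric fields. The dictionary is a three-entry sample copied from the output above, and the helper name ``fields_of_type`` is hypothetical rather than a pygeoda function.

::

    # Sample of the mapping shown above; the real dict has one entry per field.
    field_types = {"CODE_DE": "string", "COUNT": "real", "AVE_ID_": "real"}

    def fields_of_type(field_types, wanted=("real", "integer")):
        """Return the field names whose type is listed in `wanted`."""
        return [name for name, ftype in field_types.items() if ftype in wanted]

    numeric_fields = fields_of_type(field_types)
    print(numeric_fields)  # ['COUNT', 'AVE_ID_']

For a geoda object created from a geopandas object, the type names are numpy dtype names, so ``wanted`` would hold values such as ``"float64"`` and ``"int64"`` instead.
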
3.2 Access Table Data
---------------------
One can use the bracket `[ ]` operator to access the table data:

::

    >>> guerry = pygeoda.open('./data/Guerry.shp')
    >>> guerry['Crm_prs']
    (28870.0, 26226.0, 26747.0, 12935.0,...)
    >>> guerry[['Crm_prs', 'Litercy']]
    [(28870.0, 26226.0, 26747.0, 12935.0,...), (37.0, 51.0, 13.0,...)]

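The list returned for several fields is organized by variable: one tuple per field, each holding one value per observation. When per-observation rows are more convenient, the columns can be transposed with ``zip``. The sketch below uses only the first three values of each field shown above, so no pygeoda call is involved.

::

    # First three values of each field, copied from the output above.
    crm_prs = (28870.0, 26226.0, 26747.0)
    litercy = (37.0, 51.0, 13.0)

    # Same layout as guerry[['Crm_prs', 'Litercy']]: one tuple per variable.
    columns = [crm_prs, litercy]

    # Transpose to one tuple per observation.
    rows = list(zip(*columns))
    print(rows[0])  # (28870.0, 37.0)
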
.. note::
    If a geopandas object is passed to `pygeoda.open()`, the output of the `[[ ]]` operator is a dataframe object, which can be
    used as an input parameter in pygeoda functions like `skater()`, `neighbor_match_test()`, etc.

::

    >>> guerry = pygeoda.open(df)
    >>> guerry['Crm_prs']
    [28870.0, 26226.0, 26747.0, 12935.0,...]
    >>> guerry[['Crm_prs', 'Litercy']]
       Crm_prs  Litercy
    --------------------
    0    28870       37
    1    26226       51

.. note::
    In pygeoda, to pass the values of a single variable to a pygeoda function, one can use either a tuple or a list;
    to pass the values of multiple variables, one can use either a list of tuples/lists or a dataframe.

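Code that prepares input for such functions sometimes has to accept both shapes. The sketch below normalizes a single variable or a list of variables into one list of columns and checks that all variables have the same number of observations. The helper ``as_columns`` is hypothetical, is not part of pygeoda, and does not handle dataframes.

::

    from numbers import Real

    def as_columns(data):
        """Return `data` as a list of columns, each a list of floats.

        `data` is either one variable (a tuple or list of numbers) or
        several variables (a list of tuples/lists of numbers).
        """
        if len(data) == 0:
            raise ValueError("data is empty")
        if isinstance(data[0], Real):
            # A single variable: wrap it so the result is always 2-D.
            return [[float(v) for v in data]]
        columns = [[float(v) for v in col] for col in data]
        if len({len(col) for col in columns}) != 1:
            raise ValueError("all variables need the same number of observations")
        return columns

    print(as_columns((28870.0, 26226.0)))  # [[28870.0, 26226.0]]
    print(as_columns([(28870.0, 26226.0), [37.0, 51.0]]))  # [[28870.0, 26226.0], [37.0, 51.0]]
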