
Data Processing & Preparation

This guide provides information about various tools and methods for data processing, as well as links to additional learning resources.

Methods

Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level built-in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development, as well as for use as a scripting or glue language to connect existing components. Python's simple, easy-to-learn syntax emphasizes readability and therefore reduces the cost of program maintenance. Python supports modules and packages, which encourage program modularity and code reuse. The Python interpreter and the extensive standard library are available in source or binary form without charge for all major platforms and can be freely distributed.
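As a small illustration of the built-in data structures and standard library mentioned above, the sketch below reads a few CSV records and computes a mean value per group using only modules that ship with Python (csv, io, collections, statistics). The site names and temperature values are hypothetical sample data, embedded as a string so the example runs on its own.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical sample data, embedded as a string so the example is self-contained
raw = """site,temperature
A,12.5
B,14.0
A,13.5
B,15.0
"""

# Group readings by site in a dict of lists (built-in data structures)
readings = defaultdict(list)
for row in csv.DictReader(io.StringIO(raw)):
    readings[row["site"]].append(float(row["temperature"]))

# Compute the mean temperature for each site with the standard library
site_means = {site: mean(values) for site, values in readings.items()}
print(site_means)  # {'A': 13.0, 'B': 14.5}
```

For real files you would typically pass an open file object to `csv.DictReader` instead of the `io.StringIO` wrapper.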

Libraries

 

The SciPy library is one of the core packages that make up the SciPy stack. It provides many user-friendly and efficient numerical routines, such as routines for numerical integration and optimization.
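A minimal sketch of the two kinds of routines named above, assuming SciPy and NumPy are installed: `scipy.integrate.quad` integrates sin(x) over [0, π] (exact value 2), and `scipy.optimize.minimize` finds the minimum of a simple quadratic chosen here purely for illustration.

```python
import numpy as np
from scipy import integrate, optimize

# Numerical integration: integral of sin(x) from 0 to pi (exact value is 2)
area, abs_err = integrate.quad(np.sin, 0, np.pi)

# Optimization: an illustrative quadratic with its minimum at x = 3
def f(x):
    return (x[0] - 3.0) ** 2 + 1.0

# minimize() starts from the initial guess x0 and returns an OptimizeResult
result = optimize.minimize(f, x0=[0.0])

print(f"Integral of sin on [0, pi]: {area:.6f} (error estimate {abs_err:.1e})")
print(f"Minimum found at x = {result.x[0]:.4f}, f(x) = {result.fun:.4f}")
```

Both functions accept many more options (integration limits at infinity, constraints, solver methods); see the SciPy documentation for details.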

 

NumPy is the fundamental package for scientific computing with Python. Among other things, it contains:

  • a powerful N-dimensional array object
  • sophisticated (broadcasting) functions
  • tools for integrating C/C++ and Fortran code
  • useful linear algebra, Fourier transform, and random number capabilities
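The capabilities listed above can be sketched in a few lines, assuming NumPy is installed. The arrays and values below are arbitrary illustrations: an N-dimensional array, a broadcasting operation, a linear system, a Fourier transform, and seeded random numbers.

```python
import numpy as np

# N-dimensional array: a 3x2 array of measurements
data = np.array([[1.0, 2.0],
                 [3.0, 4.0],
                 [5.0, 6.0]])

# Broadcasting: the column means (shape (2,)) are subtracted from every row (shape (3, 2))
centered = data - data.mean(axis=0)

# Linear algebra: solve the system A @ x = b
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)  # x is [2.0, 3.0]

# Fourier transform of a short discrete signal
spectrum = np.fft.fft(np.array([1.0, 0.0, -1.0, 0.0]))

# Random numbers from a seeded generator, so results are reproducible
rng = np.random.default_rng(seed=42)
samples = rng.normal(loc=0.0, scale=1.0, size=5)

print(centered)
print(x)
print(spectrum)
print(samples)
```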

 

Get started here with NumPy and SciPy.

Integrated development environments for Python

 

Packages

 

ArcPy: ArcPy is a Python site package that provides a useful and productive way to perform geographic data analysis, data conversion, data management, and map automation with Python in ArcGIS. ArcGIS applications written with ArcPy benefit from additional modules developed for numerous Python niches by GIS professionals and programmers from many different disciplines.


R is ‘GNU S’, a freely available language and environment for statistical computing and graphics. It provides a wide variety of statistical and graphical techniques, including linear and nonlinear modelling, statistical tests, time series analysis, classification, and clustering. It compiles and runs on a wide variety of UNIX platforms, Windows and macOS.

Packages

Packages are the fundamental units of reproducible R code. They include reusable R functions, the documentation that describes how to use them, and sample data. Different packages support different analyses, so choose the package that fits the analysis you want to perform.

A list of available packages can be found here.

Integrated development environments (IDE) for R

 

  • RStudio, which includes a console, a syntax-highlighting editor, and tools for plotting, history, debugging and workspace management.
  • Eclipse/StatET, an Eclipse-based IDE for R. It offers a set of tools for R coding and package building.
  • Visual Studio for R

There are many open-source GIS/RS software packages for processing geospatial data.

Learn about free GIS/RS software that lets you process your data without programming skills.

More information about software and how to get started can be found in the "Software" section.