libRDataFrame  0.815
A library with R style datatypes and associated utilities
Design Notes

A library containing a consolidated collection of routines that support R-style data structures in C++.

Objective:

The overall goal is the development of a set of data structures suited particularly to the use of C++ in data analysis. Core to this is the development of a data table structure like the R data.frame.

Perhaps one of the most attractive features of the R language (after the implicit vectorization of its structures and functions) is its data structures, particularly the data.frame.

This is emulated to a degree in Python by the pandas data analysis library. Other interpreted and proprietary statistical environments have approximations of, or substitutes for, dataframes (such as spreadsheets). Dataframe-like transformational structures have been an obvious part of R packages written in C++. A search of the open source C++ template libraries has not turned up, for me anyway, the obvious evolution of matrices or lists into a dataframe for general C++ programming.

Further, the inclusion of categorical data in analyses needs a factor data structure like that in R and an ability to construct dummy variables for linear model development. Currently the implementation is contained in,

See also the unit test function in main.cpp.
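The factor and dummy-variable requirement described above can be illustrated with a minimal sketch. The names `Factor`, `makeFactor` and `dummyColumns` are hypothetical and are not taken from the library. The sketch assumes alphabetically sorted levels, 0-based codes, and treatment coding in which the first level is the reference.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical factor type (not the library's API): sorted unique levels
// plus one integer code per observation. Codes are 0-based here, whereas
// R uses 1-based codes.
struct Factor {
    std::vector<std::string> levels;
    std::vector<std::size_t> codes;
};

// Build a factor from raw string labels.
Factor makeFactor(const std::vector<std::string>& values) {
    Factor f;
    f.levels = values;
    std::sort(f.levels.begin(), f.levels.end());
    f.levels.erase(std::unique(f.levels.begin(), f.levels.end()),
                   f.levels.end());
    f.codes.reserve(values.size());
    for (const std::string& v : values) {
        // Every value is present in levels, so lower_bound finds it exactly.
        auto it = std::lower_bound(f.levels.begin(), f.levels.end(), v);
        f.codes.push_back(static_cast<std::size_t>(it - f.levels.begin()));
    }
    return f;
}

// Treatment-coded dummy variables: one 0/1 column per level except the
// first, which serves as the reference level.
std::vector<std::vector<double>> dummyColumns(const Factor& f) {
    assert(!f.levels.empty() || f.codes.empty());
    const std::size_t nCols = f.levels.empty() ? 0 : f.levels.size() - 1;
    std::vector<std::vector<double>> cols(
        nCols, std::vector<double>(f.codes.size(), 0.0));
    for (std::size_t row = 0; row < f.codes.size(); ++row) {
        if (f.codes[row] > 0) {
            cols[f.codes[row] - 1][row] = 1.0;
        }
    }
    return cols;
}
```

Calling `makeFactor` on the labels b, a, c, a would give levels a, b, c, and `dummyColumns` would then return two indicator columns, one for b and one for c, suitable for the design matrix of a linear model.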

Principles and Conception:

Initially, a list of vectors was the basic plan, in keeping with the R structure. However, R lists and C++ STL lists are different enough to handicap any attempt to mimic R structures in C++. Hence the move to implementing the data.frame in C++ as a vector of vectors.
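As an illustration of the vector-of-vectors idea, a minimal sketch might look like the following. The class `DataFrame` and its members are hypothetical names chosen for this example, not the library's actual interface, and the sketch assumes numeric columns only, whereas an R data.frame can mix column types.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical numeric-only data frame (not the library's API):
// a vector of equally long column vectors, with a parallel vector of names.
class DataFrame {
public:
    // Append a named column; every column must have the same number of rows.
    void addColumn(std::string name, std::vector<double> values) {
        assert(columns_.empty() || values.size() == columns_.front().size());
        names_.push_back(std::move(name));
        columns_.push_back(std::move(values));
    }

    std::size_t ncol() const { return columns_.size(); }
    std::size_t nrow() const {
        return columns_.empty() ? 0 : columns_.front().size();
    }

    // Bounds-checked access to a whole column by position.
    const std::vector<double>& column(std::size_t j) const {
        return columns_.at(j);
    }

    // Bounds-checked access to a single cell (row i, column j).
    double at(std::size_t i, std::size_t j) const {
        return columns_.at(j).at(i);
    }

    const std::string& name(std::size_t j) const { return names_.at(j); }

private:
    std::vector<std::string> names_;
    std::vector<std::vector<double>> columns_;
};
```

Storing the data column by column keeps each variable contiguous in memory, which suits the column-wise operations typical of data analysis.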

The principal issue is the need for random access to the variables (columns) of the data.frame in the C++ implementation. The C++ list is implemented as a sequentially accessed container, which suits actions applied across all of its elements. Random access to individual list members is not provided by the list structure.
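The difference in column access can be sketched as follows. The alias `Column` and the two helper functions are hypothetical and exist only for this comparison. `std::list` has no `operator[]`, so reaching the j-th column means stepping through the list with `std::next`, while `std::vector` indexes directly.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

using Column = std::vector<double>;  // hypothetical alias for this example

// With std::list, reaching column j requires walking j links from the
// front, so the cost grows linearly with j.
const Column& columnFromList(const std::list<Column>& cols, std::size_t j) {
    assert(j < cols.size());
    auto it = std::next(cols.begin(), static_cast<std::ptrdiff_t>(j));
    return *it;
}

// With std::vector, column j is reached in constant time by indexing.
const Column& columnFromVector(const std::vector<Column>& cols,
                               std::size_t j) {
    assert(j < cols.size());
    return cols[j];
}
```

Both helpers return the same column for the same index; only the cost of getting there differs.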

Class Diagram

References and Acknowledgements:

For a notable CSV file reading project with the same general goal, see Jay Satiro, https://github.com/jay/CSV

Contacts

Author web page: http://crunches-data.appspot.com
Email: medmatix@gmail.com

Copyright 2016 D.A.York