Check out the Getting Started section to learn how to install and use the library. The library can also be used on an AWS elastic cloud cluster or on a single instance.
You can check a detailed walkthrough of the example codes here, starting from simple examples and moving on to real-world problems ranging from scientific simulations and supervised machine learning to everyday data-analytics tasks.
If you are new to functional or modern C++, the syntax may look unfamiliar at first, but there is only a minimal set of things to understand before you begin writing code with easyLambda. This guide tries to put these details together. In the process you will be introduced to some concepts in modern C++, functional programming and dataflow programming that are helpful beyond this library as well.
Suggestions and feedback are welcome. Feel free to get in touch via e-mail or issues for any queries.
Some possible directions for improvement:
Possible ideas for future extensions:
Check the blog and internals sections for a discussion of some exciting internals and implementation details.
Programming with pure functional dataflows is similar to the way we think in spreadsheet programs, SQL queries or declarative commands like awk, cut etc., but without their restrictions, since you can apply any C/C++ function to the data columns.
Let’s say we have data with ten columns, and we want to add the first two columns and find how many rows sum to more than a hundred. In a spreadsheet, one way of doing this is to fill a new column with the sum of the first two columns and then count the cells in that column that exceed a hundred.
Here is the easyLambda code for the same.
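The following is a minimal sketch of such a pipeline. To keep it self-contained it feeds rows with just the two columns of interest from memory, so the appended sum becomes column 3; the helpers fromMem, gt and count and the header paths follow the library's demo programs and may need adjusting for your version.

```cpp
#include <tuple>
#include <vector>
#include <functional>

#include <ezl.hpp>
#include <ezl/algorithms/io.hpp>       // fromMem (header path assumed)
#include <ezl/algorithms/filters.hpp>  // gt (header path assumed)
#include <ezl/algorithms/reduces.hpp>  // count (header path assumed)

int main(int argc, char* argv[]) {
  using namespace ezl;
  auto rows = std::vector<std::tuple<int, int>>{
      std::make_tuple(60, 50), std::make_tuple(10, 20), std::make_tuple(70, 80)};
  rise(fromMem(rows))               // source: an in-memory list of rows
    .map<1, 2>(std::plus<int>())    // sum of columns 1 & 2, appended as column 3
    .filter<3>(gt(100))             // keep rows whose sum exceeds a hundred
    .reduce(count(), 0).dump()      // count the surviving rows and print the result
    .run();
  return 0;
}
```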
The data flows to the functions wrapped inside the units (map, filter, reduce) one after another. The map step applies a C/C++ function to all rows and appends the resulting column(s) at the end of each row. The numbers in the angle brackets specify the column number(s) to work on. The data source can be a list variable, a file or anything else such as a network stream, since a source can also be just another C/C++ function.
The following program calculates the frequency of each word in the data files.
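A sketch of such a program, patterned after the library's word-count example, is given below; the rowSeparator / colSeparator settings that make fromFile emit one word per row, and the header paths, are assumptions to verify against the version of the library you are using.

```cpp
#include <string>

#include <ezl.hpp>
#include <ezl/algorithms/io.hpp>       // fromFile (header path assumed)
#include <ezl/algorithms/reduces.hpp>  // count (header path assumed)

int main(int argc, char* argv[]) {
  using namespace ezl;
  // read the file(s) matching the glob pattern given on the command line,
  // treating every whitespace-separated word as a row with a single string column
  rise(fromFile<std::string>(argv[1]).rowSeparator('s').colSeparator(""))
    .reduce<1>(count(), 0).dump()  // group by the word column and count occurrences
    .run();
  return 0;
}
```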
The dataflow starts with rise, and subsequent operations are added to it. In the above example, the dataflow starts by reading data from file(s). fromFile is a library function that takes the column types and a file glob pattern as input and reads the file(s) in parallel. It has a number of properties for controlling data format, parallelism, denormalization etc. (shown in demoFromFile).
In reduce, we pass the index of the key column to group by as a template parameter (inside < >), a library function for counting, and the initial value of the count.
Following is the dataflow for calculating pi using the Monte Carlo method.
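A sketch of the idea follows; the kick helper that triggers the next unit a fixed number of times, the colsTransform property and the header paths are taken from the library's demos and are assumptions here, while the random-point function is user-defined.

```cpp
#include <cstdlib>

#include <ezl.hpp>
#include <ezl/algorithms/io.hpp>       // kick (header path assumed)
#include <ezl/algorithms/filters.hpp>  // lt (header path assumed)
#include <ezl/algorithms/reduces.hpp>  // count (header path assumed)

// user-defined: squared distance from the origin of a random point in the unit square
double rndSqDist() {
  auto x = std::rand() / double(RAND_MAX);
  auto y = std::rand() / double(RAND_MAX);
  return x * x + y * y;
}

int main(int argc, char* argv[]) {
  using namespace ezl;
  const int trials = 10000;
  rise(kick(trials))                  // call the next unit `trials` times
    .map([] { return rndSqDist(); })  // one random point per trial
    .filter(lt(1.))                   // keep points inside the unit circle
    .reduce(count(), 0)               // count the points that fell inside
    .map([trials](int inside) {       // in-circle ratio gives the estimate of pi
      return 4. * inside / trials;
    }).colsTransform().dump()
    .run();
  return 0;
}
```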
The dataflow starts with rise, in which we pass a library function that calls the next unit a given number of times. The steps in the algorithm are expressed as a composition of small operations; some are common library functions such as count() and lt() (less-than), and some are user-defined functions specific to the problem.
Here is another example, from cods2016. The input data contains student profiles with scores, gender, job salary, city etc.
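A sketch along the lines of that example is given below; the column names and layout of the input file, and the helpers gtAr, corr and colsTransform together with the header paths, are assumptions to check against the actual cods2016 demo.

```cpp
#include <array>
#include <string>

#include <ezl.hpp>
#include <ezl/algorithms/io.hpp>          // fromFile (header path assumed)
#include <ezl/algorithms/filters.hpp>     // gtAr (header path assumed)
#include <ezl/algorithms/reduceAlls.hpp>  // corr (header path assumed)

int main(int argc, char* argv[]) {
  using namespace ezl;
  // select gender and the three scores from the tab-separated profile data;
  // the column names here are placeholders for the real headers in the file
  auto scores = fromFile<char, std::array<float, 3>>(argv[1])
                  .cols({"Gender", "English", "Logical", "Domain"})
                  .colSeparator("\t");

  rise(scores)
    .filter<2>(gtAr<3>(0.F))     // keep rows with positive scores (gtAr semantics assumed)
    .map<1>([](char gender) {    // encode gender as 0 / 1, replacing the column in place
      return float(gender == 'M');
    }).colsTransform()
    .reduceAll(corr<1>())        // correlation of the score columns w.r.t. column 1 (gender)
      .dump("", "correlation of gender with scores")
    .run();
  return 0;
}
```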
The above example prints the correlation of the English, logical and domain scores with respect to gender. We can see the similarity of the above code with the steps in a spreadsheet analysis or an SQL query. We select the columns to work with, viz. gender and the three scores. We filter the rows based on a column and a predicate. Next, we transform a selected column in place and then find an aggregate property (correlation) over all the rows.