What:
- A simple system to serialize lists of numbers.
Why:
- Programmers should use visualization as an everyday tool when developing algorithms.
- Most of the time, if you only look at the final results of non-trivial code through a few aggregate statistics, you end up missing important details that could lead to better solutions.
- Visualize often and early. Visualize the dynamic behaviour of your code!
- What I used to do, for the most part, was to printf() values from the C code in a simple CSV format, or directly as Mathematica arrays.
- Mathematica is great for visualization, and a one-liner expression is often enough to process and display the data I emitted. I often even paste that Mathematica code as a comment in the C source.
- Sometimes I peek directly in the process memory...
- This hack’n’slash approach is fine, but it becomes very inconvenient when you need to dump a lot of data, or when the data is generated by multiple threads or at different stages of the program.
- Importing the data can be very slow as well!
- Thus, I finally decided I needed better serialization code...
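For reference, the old printf()-style approach described above can be sketched roughly as follows. This is only an illustration: the function names (format_values, format_csv_row, format_mathematica_list) are hypothetical and are not part of the DataLog source. The fixed %f format is also an assumption, chosen because Mathematica input syntax does not read the C-style `1e-05` exponent form as a number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes n values into buf as  open v0 sep v1 ... close.
   Returns the number of characters written (excluding the terminating NUL),
   or 0 if the buffer is too small. */
size_t format_values(char *buf, size_t cap, const double *v, size_t n,
                     const char *open, const char *sep, const char *close)
{
    assert(buf != NULL && cap > 0);
    int w = snprintf(buf, cap, "%s", open);
    if (w < 0 || (size_t)w >= cap) return 0;
    size_t used = (size_t)w;
    for (size_t i = 0; i < n; ++i) {
        /* %f never produces exponent notation. */
        w = snprintf(buf + used, cap - used, "%s%f", i ? sep : "", v[i]);
        if (w < 0 || (size_t)w >= cap - used) return 0;
        used += (size_t)w;
    }
    w = snprintf(buf + used, cap - used, "%s", close);
    if (w < 0 || (size_t)w >= cap - used) return 0;
    return used + (size_t)w;
}

/* One CSV row: "1.000000,2.500000\n" */
size_t format_csv_row(char *buf, size_t cap, const double *v, size_t n)
{
    return format_values(buf, cap, v, n, "", ",", "\n");
}

/* One Mathematica list: "{1.000000, 2.500000}" */
size_t format_mathematica_list(char *buf, size_t cap, const double *v, size_t n)
{
    return format_values(buf, cap, v, n, "{", ", ", "}");
}
```

The resulting string can then be passed to printf() or fputs(). Formatting into a buffer first, rather than printing each value directly, keeps a row in one piece when several threads are writing, which is exactly where this approach starts to hurt.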
Features:
- Schema-less. Serializes arrays of numbers. Supports nested arrays, no need to know the array dimensions up-front. Can represent any structure.
- Compact. Internally stores each number in the smallest type that can contain it (from 8-bit integers to double-precision floating point). Always decodes as double, transparently.
- Sample import code for Processing.
- Can also serialize to CSV, Mathematica arrays and UBJSON (which Mathematica 11.x can import directly)
- Multi-thread safe.
- Automatically sorts and optionally collates together data streams coming from different threads.
- Not too slow. Usable. I would probably rewrite it from scratch now that I understand what I can do better - but the current implementation is good enough that I don't care, and the interface is ok.
- Absolutely NOT meant to be used as a "real" serialization format. Everything is meant to be easy to drop into an existing codebase, with zero dependencies, to get some data out quickly and then be removed...
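To make the "Compact" point above more concrete, here is one way the smallest-type selection could look. This is a minimal sketch under my own assumptions, not the actual DataLog code: the NumType enum, the set of types, and the smallest_type_for name are all hypothetical.

```c
#include <float.h>
#include <stdint.h>

/* Hypothetical type tags, ordered from smallest to largest storage. */
typedef enum { NUM_I8, NUM_I16, NUM_I32, NUM_F32, NUM_F64 } NumType;

/* Picks the smallest storage type that reproduces x exactly when the
   stored value is decoded back to double. */
NumType smallest_type_for(double x)
{
    /* Integral values inside the int32 range: use the narrowest integer.
       Note that -0.0 is treated as integer 0 here, so its sign is lost. */
    if (x >= INT32_MIN && x <= INT32_MAX && (double)(int32_t)x == x) {
        if (x >= INT8_MIN && x <= INT8_MAX) return NUM_I8;
        if (x >= INT16_MIN && x <= INT16_MAX) return NUM_I16;
        return NUM_I32;
    }
    /* Everything else: use float only if the round trip is lossless.
       The range check keeps the cast well defined; NaN and infinities
       fail it and fall through to double. */
    if (x >= -FLT_MAX && x <= FLT_MAX && (double)(float)x == x) return NUM_F32;
    return NUM_F64;
}
```

With this scheme 100.0 would take one byte, 0.5 four bytes, and 0.1 (which is not exactly representable as a float) the full eight bytes. Since the choice is made per value and the round trip is exact, the decoder can always hand back a double without the caller noticing.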
Bonus: "TableLog" (included in the same source)
- A system for statistical aggregation, for when you really have lots of data...
- ...or the problem is simple enough that you know what statistics to extract from the C code!
- Represents a data table (rows, columns).
- Each row should be an independent "item" or experiment.
- Each column is a quantity to be measured of the given item.
- Multiple samples (data values) can be "pushed" to given rows/columns.
- Columns automatically compute statistics over samples.
- Each column can aggregate a different number of samples.
- Each column can be configured to compute different statistics: average, minimum, maximum, histograms of different sizes.
- Multithread-safe.
- Multiple threads can write to different rows...
- ...or the same row can be "opened" globally across threads.
- Columns can be added incrementally (but will appear in all rows).
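The per-column aggregation described above could be sketched like this. Again, this is only an illustration under my own assumptions: ColumnStats, column_init, column_push and column_average are hypothetical names, the histogram range is assumed to be fixed up front, and the sketch leaves out the locking that a thread-safe version would need.

```c
#include <assert.h>
#include <stddef.h>

#define HIST_BINS 8

/* Hypothetical per-column accumulator: running statistics over all
   samples pushed so far, without storing the samples themselves. */
typedef struct {
    size_t count;
    double sum, min, max;
    double hist_lo, hist_hi;   /* histogram range, fixed at init time */
    size_t hist[HIST_BINS];
} ColumnStats;

void column_init(ColumnStats *c, double hist_lo, double hist_hi)
{
    assert(c != NULL && hist_hi > hist_lo);
    c->count = 0;
    c->sum = c->min = c->max = 0.0;
    c->hist_lo = hist_lo;
    c->hist_hi = hist_hi;
    for (size_t i = 0; i < HIST_BINS; ++i) c->hist[i] = 0;
}

void column_push(ColumnStats *c, double x)
{
    assert(c != NULL);
    if (c->count == 0 || x < c->min) c->min = x;
    if (c->count == 0 || x > c->max) c->max = x;
    c->sum += x;
    c->count++;

    /* Map x to [0,1) over the histogram range; out-of-range samples
       are clamped into the first or last bin. */
    double t = (x - c->hist_lo) / (c->hist_hi - c->hist_lo);
    size_t bin;
    if (!(t > 0.0))     bin = 0;               /* also catches NaN */
    else if (t >= 1.0)  bin = HIST_BINS - 1;
    else                bin = (size_t)(t * HIST_BINS);
    c->hist[bin]++;
}

double column_average(const ColumnStats *c)
{
    return c->count ? c->sum / (double)c->count : 0.0;
}
```

A table is then just a grid of such accumulators, one per row and column, so each cell can receive a different number of samples and only the aggregated statistics need to be written out at the end.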
DataLog: C code - computing & exporting data
DataLog: Processing code - importing & visualizing data
TableLog: C code
TableLog: Data imported in Excel