I can't speak for this project, but my own for CSV files ( https://github.com/dw/csvmonkey ) provides a high-level interface that allows the tokenized data to be manipulated in-place without full decoding. The interface exported to Python is that of a plain old dictionary with one added magical semantic (lazy decode on element access). The internal representation of the parse result is a simple fixed array of (ptr, size) pairs.
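To give a rough idea of the pattern (this is just an illustrative Python sketch, not csvmonkey's actual API or internals -- the real thing tokenizes in C++):

    class LazyRow:
        """Dict-like view over one raw CSV record. Fields are held as
        (offset, length) pairs into the raw buffer and only decoded
        when a key is accessed."""

        def __init__(self, buf, header):
            self.buf = buf                    # raw bytes of one record
            self.header = header              # column name -> field index
            self.spans = self._tokenize(buf)  # list of (offset, length)

        @staticmethod
        def _tokenize(buf):
            # Naive comma split for illustration; no quoting/escaping.
            spans, start = [], 0
            for i, ch in enumerate(buf):
                if ch == ord(b','):
                    spans.append((start, i - start))
                    start = i + 1
            spans.append((start, len(buf) - start))
            return spans

        def __getitem__(self, key):
            # Lazy decode: only the requested field is sliced and decoded.
            off, length = self.spans[self.header[key]]
            return self.buf[off:off + length].decode('utf-8')

    row = LazyRow(b'widget,3,9.99', {'name': 0, 'qty': 1, 'price': 2})
    print(row['price'])  # only the 'price' field is ever decoded -> '9.99'

The point is that the cost of decoding is deferred to element access, so fields you never touch are never materialized.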
Methods like this are used for batch search / summation where only a fraction of the parsed data is actually relevant during any particular run. You'll find similar approaches used in, e.g., the row format parsers of databases like MongoDB or Postgres.