
Re: [Sc-devel] [commit?] FileReader ability to skip rows (start offset, subsampling)



hi dan,

On 05.03.2008, at 14:14, Dan Stowell wrote:
> I'm working with some very large CSV data files at the moment. sclang
> ends up running out of memory if using CSVFileReader on a file that's
> too big.

just a note (sorry, no implementation): for scenarios like this it would be cool if FileReader returned a lazy list (stream) of rows instead of reading the whole file at once. this is especially feasible for flat, non-hierarchical data like CSV. rows can then be skipped, collected, filtered etc. without holding the whole file in memory (depending, of course, on how you consume the results). i'm currently processing large CSV data files in haskell, where much of this comes for free ;)
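
to make the idea concrete, here is a rough sketch of a lazy row stream with a start offset and subsampling. it is written in python purely for illustration, not sclang or haskell; the name `lazy_rows` and its `start` / `step` parameters are hypothetical and not part of FileReader or CSVFileReader.

```python
import csv
import io
from itertools import islice
from typing import Iterable, Iterator, List


def lazy_rows(lines: Iterable[str], start: int = 0, step: int = 1) -> Iterator[List[str]]:
    """Lazily yield parsed CSV rows from an iterable of text lines.

    Skips the first `start` rows, then keeps every `step`-th row.
    Rows are parsed one at a time as the caller consumes them,
    so the whole file is never held in memory.
    """
    reader = csv.reader(lines)              # parses rows on demand
    return islice(reader, start, None, step)


# usage: an in-memory stand-in for a large file
sample = io.StringIO("t,x\n0,1.5\n1,2.5\n2,3.5\n3,4.5\n")

# skip the header row, then take every second data row
rows = lazy_rows(sample, start=1, step=2)

# filtering and collecting happen row by row as the stream is consumed
selected = [float(x) for _t, x in rows if float(x) > 2.0]
```

with a real file you would pass an open file object (opened with `newline=""`) instead of the `StringIO`. memory use then depends only on what the consumer keeps, which is the point of returning a stream rather than an array of all rows.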

<sk>