[Hampshire] Analytics packages

Author: Rob Malpass
Date:  
To: Hampshire LUG Discussion List
Subject: [Hampshire] Analytics packages
Hi all



Does anyone know a good, user-friendly statistics / analytics package for
Linux? The catch is that it needs to be able to handle an absolutely massive
dataset: 13m rows.



For Uni, I have a dataset with no fewer than 13m records and I need to run a
regression on it. In fact I probably need to run about a thousand
regressions and compare the results afterwards.



In theory, something like LibreOffice could handle the individual regressions
once I've split the txt file up, but I don't want to get into faffing around
with awk, sed, cat, head etc. That takes ages and creates massive files, and
besides, the file needs splitting according to a rule based on a field within
it, and at present I can't guarantee the file is sorted on that field. I can't
afford the frankly ludicrous prices charged for SAS and SPSS. I just wondered
whether any of you know of something really good that people are using and
that I've missed.
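
To make the split-then-regress step concrete: a simple (one-predictor) least-squares fit per group does not strictly need sorted input or intermediate files, because the running sums for each group can be accumulated in a single pass. Below is a minimal sketch in Python, not a recommendation of any particular package. The tab-separated layout (group key, x, y), the sample values, and the names `grouped_ols` and `read_rows` are all assumptions made up for illustration; the real dataset's columns and splitting rule are not known.

```python
import csv
import io
from collections import defaultdict


def grouped_ols(rows):
    """Fit y = intercept + slope * x separately for each group key.

    `rows` yields (key, x, y) tuples in any order. Only five running
    sums per group are held in memory, so the input needs neither
    sorting nor splitting into separate files.
    """
    # sums[key] = [n, sum_x, sum_y, sum_xx, sum_xy]
    sums = defaultdict(lambda: [0, 0.0, 0.0, 0.0, 0.0])
    for key, x, y in rows:
        s = sums[key]
        s[0] += 1
        s[1] += x
        s[2] += y
        s[3] += x * x
        s[4] += x * y

    fits = {}
    for key, (n, sx, sy, sxx, sxy) in sums.items():
        denom = n * sxx - sx * sx
        if n < 2 or denom == 0:
            continue  # slope is undefined for this group
        slope = (n * sxy - sx * sy) / denom
        intercept = (sy - slope * sx) / n
        fits[key] = (intercept, slope)
    return fits


def read_rows(handle, delimiter="\t"):
    """Stream (key, x, y) from a delimited text file, one line at a time."""
    for key, x, y in csv.reader(handle, delimiter=delimiter):
        yield key, float(x), float(y)


# Tiny made-up sample, deliberately not sorted by group key.
sample = io.StringIO("a\t1\t3\nb\t0\t1\na\t2\t5\nb\t2\t0\na\t3\t7\n")
fits = grouped_ols(read_rows(sample))
for key in sorted(fits):
    print(key, fits[key])
```

For a real file, the `io.StringIO` sample would be replaced by something like `open("data.txt", newline="")`. Two caveats: raw sums of squares can lose precision when the values are large, and a regression with several predictors would need the cross-product matrix accumulated per group instead of these five scalars.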

I've tried:

"R" [1] - powerful but very clunky, with a dreadful GUI

"PSPP" [2] - still a work in progress, with truly awfully formatted output.
It'll get there one day but it's a mile off at the moment.

"DAP" - won't compile for me and I don't have time to investigate.

"gretl" [3] - seemingly for economists who seldom have to handle such big
datasets.

Various database packages - fine for handling the data, but they don't
run to linear regression.



Cheers

Rob

[1] http://www.r-project.org/

[2] http://www.gnu.org/software/pspp/

[3] http://gretl.sourceforge.net/



--
Please post to: Hampshire@???
Web Interface: https://mailman.lug.org.uk/mailman/listinfo/hampshire
LUG URL: http://www.hantslug.org.uk
--------------------------------------------------------------