Hi all
 
Does anyone know a good, user-friendly statistics/analytics package for
Linux? The catch is that it needs to handle an absolutely massive
dataset: 13m rows.
 
For Uni, I have a dataset of no fewer than 13m records and I need to run a
regression on it. In fact, I will probably need to run about a thousand
regressions and compare the results afterwards.
 
In theory, something like LibreOffice could handle the individual regressions
once I've split the txt file up, but I don't want to get into faffing around
with awk, sed, cat, head etc. It takes ages and creates massive files, and in
any case the file needs splitting according to a rule based on one of its own
fields, which I can't currently guarantee it is sorted on. I can't afford the
frankly ludicrous prices charged for SAS and SPSS, so I wondered whether any
of you knew of something really good that people are using and I've missed.
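For the single-predictor case, one way round the splitting problem is a single pass over the file that keeps running totals per group, so the rows can arrive in any order and nothing needs sorting or splitting. A minimal sketch in Python, assuming a tab-delimited text file with no header row, one numeric predictor per regression, and hypothetical column positions for the grouping field, x and y; it is a hand-rolled least-squares fit, not any of the packages mentioned here.

```python
import csv
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Sums:
    """Running totals needed for a simple (one-predictor) least-squares fit."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def add(self, x: float, y: float) -> None:
        # Update every total with one observation; memory stays constant per group.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def fit(self) -> tuple[float, float]:
        """Return (intercept, slope) for y = a + b*x."""
        denom = self.n * self.sxx - self.sx ** 2
        if denom == 0:
            raise ValueError("x has no variance in this group")
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope


def grouped_regressions(lines, key_col, x_col, y_col, delimiter="\t"):
    """Fit one regression per value of the key column in a single pass.

    key_col, x_col and y_col are hypothetical 0-based column positions;
    the input does not need to be sorted on the key.
    """
    groups = defaultdict(Sums)
    for row in csv.reader(lines, delimiter=delimiter):
        groups[row[key_col]].add(float(row[x_col]), float(row[y_col]))
    return {key: sums.fit() for key, sums in groups.items()}
```

Usage would be along the lines of opening the (hypothetical) `data.txt` with `open("data.txt", newline="")` and passing the file object straight in, so only one small set of totals per group is held in memory rather than 13m rows. The raw-sums formula can lose precision when x values are large relative to their spread; centring the data or using a Welford-style update would be more robust.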
I've tried:
"R" [1] - powerful, but very clunky, with a dreadful GUI.
"PSPP" [2] - still a work in progress, with truly awful output formatting.
It'll get there one day, but it's a mile off at the moment.
"DAP" - won't compile for me, and I don't have time to investigate.
"gretl" [3] - seemingly aimed at economists, who seldom have to handle such big
datasets.
Various database packages - fine for handling the data, but they don't
run to linear regression.
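For a single predictor, though, the least-squares fit depends only on five aggregates (n, the sums of x and y, the sum of x squared and the sum of xy), each of which a database can compute per group with COUNT and SUM in a GROUP BY query. The standard closed form, with n observations in a group, is:

```latex
\hat{\beta}_1 = \frac{n\sum_i x_i y_i - \sum_i x_i \sum_i y_i}{n\sum_i x_i^2 - \left(\sum_i x_i\right)^2},
\qquad
\hat{\beta}_0 = \frac{\sum_i y_i - \hat{\beta}_1 \sum_i x_i}{n}
```

With several predictors this generalises to solving the normal equations built from the per-group cross-product sums, which is harder to do in plain SQL.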
 
Cheers
Rob
[1] www.r-project.org
[2] http://www.gnu.org/software/pspp/
[3] http://gretl.sourceforge.net/
 
--
Please post to: Hampshire@???
Web Interface: https://mailman.lug.org.uk/mailman/listinfo/hampshire
LUG URL: http://www.hantslug.org.uk
--------------------------------------------------------------