Hi all
Does anyone know a good, user-friendly statistics / analytics package for
Linux? The catch is that it needs to handle an absolutely massive dataset:
13 million rows.
For Uni, I have a dataset with no fewer than 13 million records and I need to
run a regression on it. In fact, I probably need to run about a thousand
regressions and compare the results afterwards.
In theory, something like LibreOffice could handle the individual regressions
once I've split the text file up, but I don't want to get into faffing around
with awk, sed, cat, head, etc. It takes ages and creates massive files, and
besides, the file needs splitting according to a rule based on one of its
fields, and at present I can't guarantee the file is sorted on that field. I
can't afford the frankly ludicrous prices charged for SAS and SPSS. I just
wondered whether any of you know of something really good that people are
using and that I've missed.
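To make the splitting problem concrete: for a simple one-predictor
regression, the split may not need a sort or any intermediate files, because
the running sums for each group can be accumulated in a single pass. The
following is only a minimal sketch under assumptions that are not stated
above: a tab-separated file whose columns are a grouping field, x and y, and
a hypothetical function name grouped_ols. The raw-sums formula can also lose
precision on badly scaled data, so treat it as an illustration rather than a
replacement for a proper stats package.

```python
import csv
from collections import defaultdict


def grouped_ols(lines, key_col=0, x_col=1, y_col=2, delimiter="\t"):
    """Fit y = slope * x + intercept per group in one pass over unsorted rows.

    The column layout and delimiter are assumptions made for illustration.
    Returns {group: (slope, intercept, n)}.
    """
    # Per-group running sums: n, sum(x), sum(y), sum(x*x), sum(x*y)
    sums = defaultdict(lambda: [0, 0.0, 0.0, 0.0, 0.0])
    for row in csv.reader(lines, delimiter=delimiter):
        x, y = float(row[x_col]), float(row[y_col])
        s = sums[row[key_col]]
        s[0] += 1
        s[1] += x
        s[2] += y
        s[3] += x * x
        s[4] += x * y

    results = {}
    for key, (n, sx, sy, sxx, sxy) in sums.items():
        denom = n * sxx - sx * sx
        if n < 2 or denom == 0:
            continue  # slope is undefined for this group
        slope = (n * sxy - sx * sy) / denom
        intercept = (sy - slope * sx) / n
        results[key] = (slope, intercept, n)
    return results
```

Because a file object yields lines lazily, something like
`with open("data.txt") as f: fits = grouped_ols(f)` (data.txt being a
placeholder name) would keep only five numbers per group in memory, however
many rows there are.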
I've tried:
"R" [1] - powerful but very clunky and a dreadful GUI
"PSPP" [2] - still a work in progress, and the output formatting is truly
awful. It'll get there one day but it's a mile off at the moment.
"DAP" - won't compile for me and I don't have time to investigate.
"gretl" [3] - seemingly aimed at economists, who seldom have to handle such
big datasets.
Various database packages which are fine for handling the data - but don't
run to linear regression.
Cheers
Rob
[1] www.r-project.org
[2] http://www.gnu.org/software/pspp/
[3] http://gretl.sourceforge.net/