I'm surprised the authors of this article completely missed the many extended query languages specifically designed for processing large data sets, published at top computer science conferences by Google, Microsoft, and Yahoo.
It might just be my limited experience with Pig, but I really prefer hand-coding Hadoop MapReduce applications (in Ruby or Java, depending on the app). Tools like Cascading and Cascalog look nice, and I have experimented with them, but I still prefer bare-metal MapReduce. (BTW, I have been spending most of my time lately writing Hadoop apps.)
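For anyone who hasn't written MapReduce by hand, here is a toy, single-process sketch of what the map, shuffle, and reduce phases of a word-count job look like. It is plain Python, not Hadoop's API: `mapper`, `reducer`, and `run_job` are hypothetical names, and the in-memory grouping only stands in for the shuffle that Hadoop performs across machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    # Map phase (hypothetical helper): emit (word, 1) for every
    # whitespace-separated token in one input line.
    for word in line.lower().split():
        yield word, 1


def reducer(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    # Reduce phase (hypothetical helper): sum all counts emitted for one key.
    return word, sum(counts)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    # Shuffle phase: group mapper output by key. A real cluster does this
    # across the network; here it is just an in-memory dict of lists.
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    # Hadoop delivers keys to each reducer in sorted order; mimic that here.
    return dict(reducer(key, groups[key]) for key in sorted(groups))


if __name__ == "__main__":
    print(run_job(["the quick fox", "the lazy dog"]))
```

The point of doing it "bare metal" is that every phase is explicit code you control; tools like Pig or Cascading generate this plumbing for you.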
The only reason to want a different programming language for this sort of thing is that the language itself (its syntax, semantics, and so on) gives it some advantage, usually in abstraction, over any other language. Without such syntax or semantics, a new language is pointless; any old language would do.
Absolutely yes. The answer is a succinct vector language that feels close enough to SQL to be learnable by database guys, but has high-level primitives and unboxed basic types that enable automatic optimization and parallelization. The closest thing is K/Q from kx.com but they're clearly too bone headed to lead the game on this one.
Why do you say "bone headed"? To me they're a stand-out success: http://kx.com/Customers/end-user-customers.php. Admittedly they haven't grown, but given Arthur Whitney's minimalism, who could expect them to?
The main problem is that Arthur's work is not open source. It would be called impossible if he hadn't done it, and I think it's a tragedy for the computing world that it has been hidden under a bushel.
That's exactly what I meant by bone headed: their model is to keep their software very expensive and secretive, rather than letting their products form the skeleton of everything else, as I feel they should. They talk about their free version as if they were selling shareware.
I guess if you're making $125,000 per seat you don't have to worry about innovating your business model too much. :)
I'm surprised no one has released an open source clone, though. It's fairly well documented, and I think the Q verbs would be a really smart way to manipulate data in an open framework. Plus, there are some performance guys out there in the open source world who could probably beat even Arthur.
Microsoft has SCOPE: http://portal.acm.org/citation.cfm?id=1454159.1454166
Yahoo has Pig: http://portal.acm.org/citation.cfm?id=1376726
Google has Sawzall: http://iospress.metapress.com/content/99vjkgkae3jkvu9t/