Big Data Round-Up: Unbounded Data, Cloudera Gets Competition and More
This week the GigaOM event Structure Big Data took place in New York. We’ve already told you about the announcement of DataStax’s Hadoop distribution, Brisk, and about the launch of our own Pete Warden’s Data Science Toolkit. Here are a few more big data stories that you may have missed this week.
Should We Replace the Term Big Data with “Unbounded Data”?
This is actually from a couple of weeks ago, but I think it’s worthy of inclusion. Clive Longbottom of Quocirca makes the case that “Big Data” is the wrong way to talk about the changes in the ways we store, manage and process data. The term certainly gets thrown around a lot, often to describe data sets much smaller than the petabytes that arguably define big data. Longbottom suggests the term “unbounded data”:
Indeed, in some cases, this is far more of a “little data” issue than a “big data” one. For example, some information may be so esoteric that there are only a hundred or so references that can be trawled. Once these instances have been found, analysing them and reporting on them does not require much in the way of computer power; creating the right terms of reference to find them may well be the biggest issue.
Hadapt and MapR Take on Cloudera
MapR is a new Hadoop vendor and Cloudera competitor co-founded by ex-Googler M.C. Srivas. MapR announced that it is releasing its own enterprise Hadoop distribution, which uses a proprietary replacement for the HDFS file system. In addition to Cloudera, MapR will compete with DataStax and Appistry.
Hadapt is a new company attempting to bring SQL-like relational database capabilities to Hadoop. It leaves the HDFS file system intact and uses HBase.
For more on the heating-up Hadoop market, don’t miss Derrick Harris’ coverage at GigaOM.
Tokutek Updates Its MySQL-based Big Database
Don’t count MySQL out of the big data game quite yet. Tools like HandlerSocket (our coverage) and Memcached help the venerable DB scale. So does TokuDB from Tokutek, a storage engine used by companies like Kayak to scale up MySQL and MariaDB while maintaining ACID compliance.
The new version adds hot indexing, for building indexes on the fly, and hot column addition and deletion, for managing columns without restarting the database.
The Dark Side of Big Data
Computerworld covers the relationship between surveillance and big data at the conference. “It will change our existing notions of privacy. A surveillance society is not only inevitable, it’s worse. It’s irresistible,” Jeff Jonas, chief scientist of Entity Analytic Solutions at IBM, told Computerworld.
We covered this issue last year and asked what developers would do with access to the massive data sets that location-aware services enable. It’s still an open question.
For more background on Jonas’ analytics work, check out this InfoWorld piece.
Lead image by nasa1fan/MSFC
Disclosure: IBM is a ReadWriteWeb sponsor.