What is IOE? I stands for IBM, O for Oracle, and E for EMC. Together they
represent the typical high-end database and data warehouse architecture: high-end
servers from vendors such as HP, IBM, and Fujitsu; high-end database software such as Teradata, Oracle, and Greenplum;
and high-end storage from EMC, Violin, and Fusion-io.
In the
past, this high-performance database architecture was the preferred choice of
large and medium-sized organizations. Such systems run stably with superior
performance, and they became popular when the degree of informatization was low and enterprise applications were simple. With
explosive data growth and today's diverse, complex enterprise
applications, many enterprises have gradually realized that they should
replace IOE, and quite a few of them have carried out road
maps to eliminate high-end databases entirely, including Intel, Alibaba, Amazon,
eBay, Yahoo, and Facebook.
The data explosion has sharply increased demand for storage
capacity, and diverse, complex applications must cope with fast-growing
computational load and parallel access
requests. Within the IOE architecture, the only remedy is to upgrade ever more frequently, and more and more enterprise managers feel the
pressure of the high cost of those upgrades. More often than not, enterprises
still suffer from slow response times and heavy workloads even after investing
heavily. That is why these enterprises are determined to replace IOE.
Hadoop is one of the IOE-replacement solutions on which enterprise managers
have pinned great hopes.
It supports cheap desktop hard disks as a replacement for the high-end
storage media of IOE.
Its HDFS file system can replace IOE's disk arrays while keeping data
safe through redundant copies.
It runs on cheap PCs in place of high-end database servers.
It is open-source software, so it incurs no additional license fees for
CPUs, storage capacity, or users.
With its support for parallel computing, Hadoop enables inexpensive scale-out:
the storage and computing load can be spread across many low-cost
PCs at lower acquisition and management cost, yielding greater storage
capacity, higher computing performance, and far more parallel processes
than IOE. That is why Hadoop is so highly anticipated.
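The scale-out idea above can be sketched in a few lines of Python, under stated assumptions: the helper names (`partition`, `local_aggregate`, `parallel_mean`) are hypothetical, and threads inside one process stand in for the separate inexpensive PCs of a cluster. The sketch shows only the split, compute locally, and merge pattern; it is not Hadoop's API and does not demonstrate a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical helpers; threads stand in for cluster nodes (an assumption for illustration).

def partition(data, n):
    """Split data into n contiguous chunks whose sizes differ by at most one."""
    size, rem = divmod(len(data), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < rem else 0)
        chunks.append(data[start:end])
        start = end
    return chunks

def local_aggregate(chunk):
    """Work done on one node: a partial (sum, count) over its own chunk."""
    return sum(chunk), len(chunk)

def parallel_mean(data, nodes=4):
    """Scatter chunks to workers, then merge the partial results into one mean."""
    chunks = partition(data, nodes)
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = list(pool.map(local_aggregate, chunks))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

print(parallel_mean(list(range(1, 101)), nodes=4))  # 50.5
```

Merging (sum, count) pairs rather than averaging per-node means keeps the result exact even when chunks differ in size.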
However, Hadoop's structured data computing still
falls short of what IOE offers, especially the relational databases led by Oracle. Data computing is the most important software function of a
modern enterprise data center, and it is now common for that
computing to involve complex business logic, particularly in applications
for enterprise decision-making, process optimization, performance benchmarking,
time control, and cost management. Consequently, Hadoop alone cannot replace IOE. In
fact, even the enterprises that most prominently champion replacing
IOE have had to keep part of it. Because its computing
capability is insufficient, Hadoop is used mainly for simple ETL, data storage, and
data lookup, and it struggles with truly large-scale business data computation.
To replace IOE, we need computing capability no weaker
than that of an enterprise-level database, seamlessly integrated
into Hadoop so that Hadoop's own strengths can be fully exploited. esProc can meet
this demand.
esProc is parallel computing middleware
written in pure Java and focused on powering Hadoop. It can access Hive via
JDBC or read and write HDFS directly. With its complete data computing
system, esProc
can take over most of the data computing capability of IOE in a simpler way. It is especially good at computations that involve complex business
logic and would otherwise be written as stored procedures.
esProc provides a professional data scripting language with a
true set data type, which makes it easy to design algorithms from the business user's
perspective and to implement complex business logic. In
addition, esProc supports
ordered sets, so members of a set can be accessed by position and
computations that depend on sequence numbers (such as comparing a record with the previous one) are straightforward. Sets of sets make
complex grouping easy to express, for example equivalence grouping, align grouping,
and enum grouping. esProc
also provides complete code editing
and debugging functions. It can be
regarded as a dynamic, set-oriented language that has something in common with the R
language, while offering native support for distributed parallel computing at
its core. Programmers can benefit from esProc's efficient parallel
computing while keeping a simple, R-like syntax. It is built
for data computing and optimized for data processing. For complex
analytical business tasks, both its development efficiency and its computing performance
exceed those of existing Hadoop solutions for structured data computing.
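esProc's own script syntax for ordered sets and grouping is not shown in this post, so the following Python sketch only illustrates the concepts. The function names (`equal_group`, `align_group`, `enum_group`, `deltas`), the sample data, the reading of "equal grouping" as grouping by equal key values, and the choice to drop members that match no group are all assumptions for illustration, not esProc's API or documented semantics. Each grouping returns a list of lists, that is, a set of sets whose member sets can be aggregated further.

```python
# Hypothetical helpers illustrating the grouping concepts; not esProc's API.

def equal_group(items, key):
    """Equal (equivalence) grouping: members with equal key values share a group.
    Groups appear in the order their key is first seen."""
    groups = {}
    for item in items:
        groups.setdefault(key(item), []).append(item)
    return list(groups.values())

def align_group(items, key, base):
    """Align grouping: exactly one group per value in `base`, in `base` order.
    A base value with no matching member yields an empty group; members whose
    key is not in `base` are dropped (an assumed default)."""
    position = {k: i for i, k in enumerate(base)}
    groups = [[] for _ in base]
    for item in items:
        i = position.get(key(item))
        if i is not None:
            groups[i].append(item)
    return groups

def enum_group(items, conditions):
    """Enum grouping: one group per predicate; each member joins the first group
    whose predicate it satisfies; unmatched members are dropped (an assumption)."""
    groups = [[] for _ in conditions]
    for item in items:
        for i, cond in enumerate(conditions):
            if cond(item):
                groups[i].append(item)
                break
    return groups

def deltas(values):
    """Ordered-set computation: difference between each member and the one before
    it; the first member has no predecessor, so it gets None."""
    if not values:
        return []
    return [None] + [values[i] - values[i - 1] for i in range(1, len(values))]

# Hypothetical sample data: (region, month, amount), already ordered by month.
sales = [("East", 1, 120), ("West", 1, 80), ("East", 2, 150),
         ("North", 2, 60), ("East", 3, 90)]

by_region = equal_group(sales, key=lambda r: r[0])
aligned = align_group(sales, key=lambda r: r[0], base=["North", "South", "East"])
tiers = enum_group(sales, [lambda r: r[2] >= 100, lambda r: r[2] >= 70])

# Each group is itself a set, so it can be aggregated further.
print([sum(r[2] for r in g) for g in aligned])  # [60, 0, 360]
print([len(g) for g in tiers])                  # [2, 2]
print(deltas([r[2] for r in by_region[0]]))     # [None, 30, -60]
```

Align grouping keeps an empty group for "South" so that results line up with the base list, which is what distinguishes it from ordinary equal grouping.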
Used together, Hadoop and esProc can fully remedy Hadoop's weakness in
computing, enabling Hadoop to replace the vast majority of IOE's functions and
dramatically improving its computing capability.
About esProc: http://www.raqsoft.com/product-esproc