Post by simranratry20244 on Feb 12, 2024 4:24:54 GMT -5
Hadoop, the open source technology designed to store, index and analyze huge volumes of data that are difficult to manage (because of their variety, their complexity, or their compromised or dubious quality), is often presented to us as an alternative to traditional databases structured according to the relational model. Certainly, the limitations of relational databases that Hadoop is able to overcome are notable and important, especially given today's needs for data storage, analysis and management. However, this does not mean that Hadoop can replace the relational model. Quite the opposite: Hadoop has limits of its own, of a different kind, and once these are taken into account it is easy to conclude, as we will see shortly, that the relationship between the two is in no way exclusive, but rather complementary.
Hadoop and the relational model: limitations and convergences
To consider Hadoop (both in its open source version and in the various distributions of advanced data management software that include it) as an optimal complement to relational databases, we must first understand that the two are perfectly compatible with each other. Using Hadoop therefore does not mean giving up a data warehousing strategy based on relational databases. This convergence becomes evident once the limitations of each approach are made visible.
For the relational model, the disadvantages can be summarized as follows, all of them linked to the exponential increase in the volume, variety and complexity of the data that a company needs to store and manage for later analysis. First, the management cost grows in direct proportion to the amount of new data loaded into corporate data warehouses, and becomes unsustainable for a good number of organizations. Second, the growing number of data sources (and their flow) that constantly feed the data warehouses forces us to establish hierarchies: data is ordered according to a priority that in many cases is highly hypothetical, and that relegates to the background data likely to be highly relevant in future queries.
Hadoop addresses both problems: it treats all data equally, and it can sharply reduce acquisition, management and maintenance costs. At the same time, however, Hadoop is not advisable for certain management needs, such as (among others) modifying the data held in data warehouses through updates, new insertions or deletions of previously stored records. In those cases a relational system is much more appropriate. We must not forget that Hadoop is the foundation for a set of tools developed to manage large volumes of highly varied and complex data, but that it in no way represents a substitute for the relational model as a way of structuring databases.
What's more, there are numerous tools that facilitate the increasingly necessary convergence of the two. One such tool is IBM BigInsights, an analysis tool based on Hadoop that improves on it by making it ready for the business user ("Enterprise Ready"). It presents its analysis in spreadsheet format, so it is very simple to use and the learning curve is minimal. The guide Optimizing analytical environments with Big Data, a completely free resource, lets you delve deeper into the issue and understand to what extent the relational model is compatible with Hadoop technology.
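To make the division of labor described above a little more concrete, here is a minimal sketch in plain Python. It does not use Hadoop itself: the map and reduce functions only imitate the batch style of processing that Hadoop tools are built around, and SQLite stands in for a relational database. The sample records, the table name and every function name are hypothetical, chosen only for illustration.

```python
import sqlite3
from collections import Counter

# Hypothetical raw, append-only event records of the kind that would
# typically be kept as-is and analyzed in bulk.
RAW_EVENTS = [
    "2024-02-01 user=ana action=view",
    "2024-02-01 user=luis action=purchase",
    "2024-02-02 user=ana action=purchase",
    "malformed line with no fields",
]


def map_event(line):
    """Map step: emit an (action, 1) pair for each action field found."""
    for token in line.split():
        if token.startswith("action="):
            yield token[len("action="):], 1


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def count_actions(lines):
    """Batch analysis over every record, with no prior prioritization."""
    pairs = (pair for line in lines for pair in map_event(line))
    return reduce_counts(pairs)


def relational_updates():
    """Record-level changes, the kind of work better left to a relational system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
        [(1, "Ana", "Madrid"), (2, "Luis", "Lima")],
    )
    # Update one existing row in place.
    conn.execute("UPDATE customers SET city = ? WHERE id = ?", ("Lisbon", 1))
    # Delete another row.
    conn.execute("DELETE FROM customers WHERE id = ?", (2,))
    conn.commit()
    rows = conn.execute(
        "SELECT id, name, city FROM customers ORDER BY id"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(count_actions(RAW_EVENTS))
    print(relational_updates())
```

The first half scans every raw record, including the malformed one, and simply aggregates what it finds. The second half changes and removes individual rows, which is where the relational side remains the more appropriate choice.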