[Hive] overview: distributed data warehouse

📚 데이터베이스/빅데이터

써니(>_<) 2022. 8. 1. 22:24

Apache Hive is a fault-tolerant distributed data warehouse that allows for massive-scale analytics.

- Hive is built on top of Apache Hadoop, an open-source platform for storing and processing large amounts of data.

-As a result, Hive is inextricably linked to Hadoop and is designed to process petabytes of data quickly.

- Using SQL, Hive allows users to read, write, and manage petabytes of data.

-Hive is distinguished by its ability to query large datasets with a SQL-like interface utilizing Apache Tez or MapReduce.