HBase with Java

Last modified: March 7, 2017

1. Overview

In this article, we’ll be looking at the HBase database Java Client library. HBase is a distributed database that uses the Hadoop file system for storing data.

We’ll create a Java example client and a table to which we will add some simple records.

2. HBase Data Structure

In HBase, data is grouped into column families. All column members of a column family have the same prefix.

For example, the columns family1:qualifier1 and family1:qualifier2 are both members of the family1 column family. All column family members are stored together on the filesystem.

Inside a column family, we can store a value for a row under a specified qualifier. We can think of a qualifier as a kind of column name.

Let’s see an example record from HBase:

Family1:{  
   'Qualifier1':'row1:cell_data',
   'Qualifier2':'row2:cell_data',
   'Qualifier3':'row3:cell_data'
}
Family2:{  
   'Qualifier1':'row1:cell_data',
   'Qualifier2':'row2:cell_data',
   'Qualifier3':'row3:cell_data'
}

We have two column families, each with three qualifiers that hold some cell data. Each row has a row key, which is a unique row identifier. We will use the row key to insert, retrieve, and delete data.
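
One way to picture this layout is as nested sorted maps: row key, then column family, then qualifier, then cell value. The following is a minimal conceptual sketch using only the Java standard library. It is not the HBase API; the class name DataModelSketch and its put() and get() methods are hypothetical, and it ignores details such as cell timestamps.

```java
import java.util.Map;
import java.util.TreeMap;

// Conceptual model only: this is NOT the HBase API.
// row key -> column family -> qualifier -> cell value, all kept sorted by key.
class DataModelSketch {
    private final Map<String, Map<String, Map<String, String>>> rows = new TreeMap<>();

    // Store a value under (row key, family, qualifier), creating levels as needed.
    void put(String rowKey, String family, String qualifier, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(family, k -> new TreeMap<>())
            .put(qualifier, value);
    }

    // Returns null when the row, family, or qualifier is absent.
    String get(String rowKey, String family, String qualifier) {
        Map<String, Map<String, String>> families = rows.get(rowKey);
        if (families == null) {
            return null;
        }
        Map<String, String> columns = families.get(family);
        return columns == null ? null : columns.get(qualifier);
    }

    public static void main(String[] args) {
        DataModelSketch table = new DataModelSketch();
        table.put("row1", "Family1", "Qualifier1", "cell_data");
        table.put("row1", "Family2", "Qualifier1", "other_data");
        System.out.println(table.get("row1", "Family1", "Qualifier1"));
    }
}
```

The row key is the outermost lookup, which mirrors why every insert, read, and delete in the rest of this article starts from a row key.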

3. HBase Client Maven Dependency

Before we connect to the HBase, we need to add hbase-client and hbase dependencies:

<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>${hbase.version}</version>
</dependency>
<dependency>
     <groupId>org.apache.hbase</groupId>
     <artifactId>hbase</artifactId>
     <version>${hbase.version}</version>
</dependency>

4. HBase Setup

We need to set up HBase so that we can connect to it from the Java client library. Installation is outside the scope of this article, but you can check out some of the HBase installation guides online.

Next, we need to start an HBase master locally by executing:

hbase master start

5. Connecting to HBase from Java

To connect programmatically from Java to HBase, we need to define an XML configuration file. We started our HBase instance on localhost so we need to enter that into a configuration file:

<configuration>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>localhost</value>
    </property>
    <property>
        <name>hbase.zookeeper.property.clientPort</name>
        <value>2181</value>
    </property>
</configuration>

Now we need to point an HBase client to that configuration file:

Configuration config = HBaseConfiguration.create();

String path = this.getClass()
  .getClassLoader()
  .getResource("hbase-site.xml")
  .getPath();
config.addResource(new Path(path));

Next, we check whether the connection to HBase was successful. If it fails, a MasterNotRunningException is thrown:

HBaseAdmin.checkHBaseAvailable(config);

6. Creating a Database Structure

Before we start adding data to HBase, we need to create the data structure for inserting rows. We will create one table with two column families:

private TableName table1 = TableName.valueOf("Table1");
private String family1 = "Family1";
private String family2 = "Family2";

First, we need to create a connection to the database and get an Admin object, which we will use to manipulate the database structure:

Connection connection = ConnectionFactory.createConnection(config);
Admin admin = connection.getAdmin();

Then, we can create a table by passing an instance of the HTableDescriptor class to a createTable() method on the admin object:

HTableDescriptor desc = new HTableDescriptor(table1);
desc.addFamily(new HColumnDescriptor(family1));
desc.addFamily(new HColumnDescriptor(family2));
admin.createTable(desc);

7. Adding and Retrieving Elements

With the table created, we can add new data to it by creating a Put object and calling the put() method on a Table object obtained from the connection:

// table1 is only a TableName, so we first obtain a Table handle
Table table = connection.getTable(table1);
byte[] qualifier1 = Bytes.toBytes("Qualifier1");
byte[] row1 = Bytes.toBytes("row1");

Put p = new Put(row1);
p.addImmutable(family1.getBytes(), qualifier1, Bytes.toBytes("cell_data"));
table.put(p);

We can retrieve a previously created row by using the Get class:

Get g = new Get(row1);
Result r = table.get(g);
byte[] value = r.getValue(family1.getBytes(), qualifier1);

Here, row1 is the row identifier; we can use it to retrieve a specific row from the database. When calling:

Bytes.toString(value)

the returned result will be the previously inserted cell_data.
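
HBase treats row keys, family names, qualifiers, and values as raw byte arrays, so every string is encoded on the way in and decoded on the way out. The following is a minimal sketch of that round trip using only the Java standard library. The class ByteRoundTrip and its encode() and decode() helpers are hypothetical stand-ins for the conversion, and the sketch assumes UTF-8 encoding.

```java
import java.nio.charset.StandardCharsets;

// Stand-alone illustration of the String <-> byte[] round trip.
// encode() and decode() are hypothetical helpers, not HBase methods.
class ByteRoundTrip {
    // Encode with an explicit charset so the result does not depend on JVM defaults.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with the same charset that was used for encoding.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] stored = encode("cell_data");
        System.out.println(stored.length);   // number of bytes that would be stored
        System.out.println(decode(stored));  // the original string again
    }
}
```

Naming the charset explicitly is a deliberate choice here: a bare String.getBytes() call, as used for family1 in the snippets above, depends on the JVM's default charset, which can differ between machines.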

8. Scanning and Filtering

We can scan the table, retrieving all elements stored under a given qualifier, by using a Scan object. Note that ResultScanner extends Closeable, so be sure to close it when you’re done; here, a try-with-resources block does that for us:

Scan scan = new Scan();
scan.addColumn(family1.getBytes(), qualifier1);

// try-with-resources closes the scanner once the loop finishes
try (ResultScanner scanner = table.getScanner(scan)) {
    for (Result result : scanner) {
        System.out.println("Found row: " + result);
    }
}

This operation prints all rows that have a value under qualifier1, along with some additional information such as the timestamp:

Found row: keyvalues={Row1/Family1:Qualifier1/1488202127489/Put/vlen=9/seqid=0}
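
When a scan returns more than one row, HBase delivers them sorted by row key in lexicographic byte order, which is not always the order a reader expects from numbered keys. The following is a minimal sketch of that ordering using only the Java standard library. The class RowKeyOrderSketch and its methods are hypothetical, and the sample keys are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrates lexicographic byte ordering of row keys; not the HBase API.
class RowKeyOrderSketch {
    // Compare two keys byte by byte, treating each byte as an unsigned value.
    static int compareKeys(byte[] a, byte[] b) {
        int shared = Math.min(a.length, b.length);
        for (int i = 0; i < shared; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        // When one key is a prefix of the other, the shorter key sorts first.
        return a.length - b.length;
    }

    // Return a sorted copy of the given keys without modifying the input list.
    static List<String> sortKeys(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort((x, y) -> compareKeys(
            x.getBytes(StandardCharsets.UTF_8),
            y.getBytes(StandardCharsets.UTF_8)));
        return sorted;
    }

    public static void main(String[] args) {
        // Prints [row1, row10, row2]: "row10" sorts before "row2".
        System.out.println(sortKeys(Arrays.asList("row2", "row10", "row1")));
    }
}
```

A common way to get numeric order out of this scheme is to zero-pad the numeric part of the key, for example row01, row02, row10.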

We can retrieve specific records by using filters.

First, we create two filters. filter1 restricts the scan to rows whose key starts with row1, and filter2 keeps only columns whose qualifier is greater than or equal to qualifier1:

Filter filter1 = new PrefixFilter(row1);
Filter filter2 = new QualifierFilter(
  CompareOp.GREATER_OR_EQUAL, 
  new BinaryComparator(qualifier1));
List<Filter> filters = Arrays.asList(filter1, filter2);

Then we can get a result set from a Scan query:

Scan scan = new Scan();
scan.setFilter(new FilterList(Operator.MUST_PASS_ALL, filters));

try (ResultScanner scanner = table.getScanner(scan)) {
    for (Result result : scanner) {
        System.out.println("Found row: " + result);
    }
}

When creating the FilterList, we passed Operator.MUST_PASS_ALL, which means that all filters must be satisfied. We can choose Operator.MUST_PASS_ONE if only one filter needs to be satisfied. The result set contains only the rows that match the specified filters.
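
The difference between the two operators can be sketched with plain Java predicates. This is a conceptual illustration only, not the HBase filter API: the class FilterListSketch, its mustPassAll() and mustPassOne() helpers, and the sample row keys are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Models MUST_PASS_ALL and MUST_PASS_ONE with predicates; not the HBase API.
class FilterListSketch {
    // Every filter must accept the item (logical AND).
    static <T> Predicate<T> mustPassAll(List<Predicate<T>> filters) {
        return item -> filters.stream().allMatch(f -> f.test(item));
    }

    // At least one filter must accept the item (logical OR).
    static <T> Predicate<T> mustPassOne(List<Predicate<T>> filters) {
        return item -> filters.stream().anyMatch(f -> f.test(item));
    }

    // Keep only the keys accepted by the combined filter, preserving order.
    static List<String> select(List<String> keys, Predicate<String> combined) {
        return keys.stream().filter(combined).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> rowKeys = Arrays.asList("row1", "row10", "row2", "user1");
        Predicate<String> startsWithRow1 = key -> key.startsWith("row1");
        Predicate<String> endsWithOne = key -> key.endsWith("1");
        List<Predicate<String>> filters = Arrays.asList(startsWithRow1, endsWithOne);

        System.out.println(select(rowKeys, mustPassAll(filters))); // [row1]
        System.out.println(select(rowKeys, mustPassOne(filters))); // [row1, row10, user1]
    }
}
```

Only row1 satisfies both conditions, while row10 and user1 each satisfy exactly one, so they appear only under the MUST_PASS_ONE style of combination.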

9. Deleting Rows

Finally, to delete a row, we can use a Delete class:

Delete delete = new Delete(row1);
delete.addColumn(family1.getBytes(), qualifier1);
table.delete(delete);

We’re deleting the qualifier1 column of row1 in the family1 column family. Note that addColumn() removes only the latest version of that cell.

10. Conclusion

In this quick tutorial, we focused on communicating with an HBase database. We saw how to connect to HBase from the Java client library and how to run various basic operations.

The implementation of all these examples and code snippets can be found in the GitHub project; this is a Maven project, so it should be easy to import and run as it is.
