A Guide to Cassandra with Java – 使用Java的Cassandra指南

最后修改: 2016年 10月 20日

中文/混合/英文(键盘快捷键:t)

1. Overview

1.概述

This tutorial is an introductory guide to the Apache Cassandra database using Java.

本教程是使用Java对Apache Cassandra数据库的介绍性指南。

You will find key concepts explained, along with a working example that covers the basic steps to connect to and start working with this NoSQL database from Java.

你会发现关键概念得到了解释,同时还有一个工作实例,涵盖了从Java连接到这个NoSQL数据库并开始工作的基本步骤。

2. Cassandra

2.卡桑德拉

Cassandra is a scalable NoSQL database that provides continuous availability with no single point of failure and gives the ability to handle large amounts of data with exceptional performance.

Cassandra是一个可扩展的NoSQL数据库,它提供连续的可用性,没有单点故障,并赋予处理大量数据的能力,性能卓越。

This database uses a ring design instead of using a master-slave architecture. In the ring design, there is no master node – all participating nodes are identical and communicate with each other as peers.

这个数据库使用环形设计,而不是使用主从结构。在环形设计中,没有主节点–所有参与的节点都是相同的,并作为对等体相互通信。

This makes Cassandra a horizontally scalable system by allowing for the incremental addition of nodes without needing reconfiguration.

这使得Cassandra成为一个水平可扩展的系统,允许增量增加节点而不需要重新配置。

2.1. Key Concepts

2.1.关键概念

Let’s start with a short survey of some of the key concepts of Cassandra:

让我们先对Cassandra的一些关键概念做一个简短的调查。

  • Cluster – a collection of nodes or Data Centers arranged in a ring architecture. A name must be assigned to every cluster, which will subsequently be used by the participating nodes
  • Keyspace – If you are coming from a relational database, then the schema is the respective keyspace in Cassandra. The keyspace is the outermost container for data in Cassandra. The main attributes to set per keyspace are the Replication Factor, the Replica Placement Strategy and the Column Families
  • Column Family – Column Families in Cassandra are like tables in Relational Databases. Each Column Family contains a collection of rows which are represented by a Map<RowKey, SortedMap<ColumnKey, ColumnValue>>. The key gives the ability to access related data together
  • Column – A column in Cassandra is a data structure which contains a column name, a value and a timestamp. The columns and the number of columns in each row may vary in contrast with a relational database where data are well structured

3. Using the Java Client

3.使用Java客户端

3.1. Maven Dependency

3.1.Maven的依赖性

We need to define the following Cassandra dependency in the pom.xml, the latest version of which can be found here:

我们需要在pom.xml中定义以下Cassandra依赖,其最新版本可以在这里找到。

<dependency>
    <groupId>com.datastax.cassandra</groupId>
    <artifactId>cassandra-driver-core</artifactId>
    <version>3.1.0</version>
</dependency>

In order to test the code with an embedded database server we should also add the cassandra-unit dependency, the latest version of which can be found here:

为了用嵌入式数据库服务器测试代码,我们还应该添加cassandra-unit依赖项,其最新版本可以在这里找到。

<dependency>
    <groupId>org.cassandraunit</groupId>
    <artifactId>cassandra-unit</artifactId>
    <version>3.0.0.1</version>
</dependency>

3.2. Connecting to Cassandra

3.2.连接到Cassandra

In order to connect to Cassandra from Java, we need to build a Cluster object.

为了从Java连接到Cassandra,我们需要建立一个Cluster对象。

An address of a node needs to be provided as a contact point. If we don’t provide a port number, the default port (9042) will be used.

需要提供一个节点的地址作为联系点。如果我们不提供端口号,将使用默认端口(9042)。

These settings allow the driver to discover the current topology of a cluster.

这些设置允许驱动程序发现集群的当前拓扑结构。

public class CassandraConnector {

    private Cluster cluster;

    private Session session;

    public void connect(String node, Integer port) {
        Builder b = Cluster.builder().addContactPoint(node);
        if (port != null) {
            b.withPort(port);
        }
        cluster = b.build();

        session = cluster.connect();
    }

    public Session getSession() {
        return this.session;
    }

    public void close() {
        session.close();
        cluster.close();
    }
}

3.3. Creating the Keyspace

3.3.创建钥匙空间

Let’s create our “library” keyspace:

让我们创建我们的”library“钥匙空间。

public void createKeyspace(
  String keyspaceName, String replicationStrategy, int replicationFactor) {
  StringBuilder sb = 
    new StringBuilder("CREATE KEYSPACE IF NOT EXISTS ")
      .append(keyspaceName).append(" WITH replication = {")
      .append("'class':'").append(replicationStrategy)
      .append("','replication_factor':").append(replicationFactor)
      .append("};");
        
    String query = sb.toString();
    session.execute(query);
}

Except from the keyspaceName we need to define two more parameters, the replicationFactor and the replicationStrategy. These parameters determine the number of replicas and how the replicas will be distributed across the ring, respectively.

除了keyspaceName,我们还需要定义两个参数,replicationFactorreplicationStrategy。这些参数分别决定了复制的数量和复制在环上的分配方式。

With replication Cassandra ensures reliability and fault tolerance by storing copies of data in multiple nodes.

通过复制,Cassandra通过在多个节点上存储数据的副本来确保可靠性和容错性。

At this point we may test that our keyspace has successfully been created:

在这一点上,我们可以测试一下,我们的钥匙空间已经成功创建。

private KeyspaceRepository schemaRepository;
private Session session;

@Before
public void connect() {
    CassandraConnector client = new CassandraConnector();
    client.connect("127.0.0.1", 9142);
    this.session = client.getSession();
    schemaRepository = new KeyspaceRepository(session);
}
@Test
public void whenCreatingAKeyspace_thenCreated() {
    String keyspaceName = "library";
    schemaRepository.createKeyspace(keyspaceName, "SimpleStrategy", 1);

    ResultSet result = 
      session.execute("SELECT * FROM system_schema.keyspaces;");

    List<String> matchedKeyspaces = result.all()
      .stream()
      .filter(r -> r.getString(0).equals(keyspaceName.toLowerCase()))
      .map(r -> r.getString(0))
      .collect(Collectors.toList());

    assertEquals(matchedKeyspaces.size(), 1);
    assertTrue(matchedKeyspaces.get(0).equals(keyspaceName.toLowerCase()));
}

3.4. Creating a Column Family

3.4.创建一个列族

Now, we can add the first Column Family “books” to the existing keyspace:

现在,我们可以将第一个列族 “书籍 “添加到现有的钥匙空间。

private static final String TABLE_NAME = "books";
private Session session;

public void createTable() {
    StringBuilder sb = new StringBuilder("CREATE TABLE IF NOT EXISTS ")
      .append(TABLE_NAME).append("(")
      .append("id uuid PRIMARY KEY, ")
      .append("title text,")
      .append("subject text);");

    String query = sb.toString();
    session.execute(query);
}

The code to test that the Column Family has been created, is provided below:

下面提供了测试列族已被创建的代码。

private BookRepository bookRepository;
private Session session;

@Before
public void connect() {
    CassandraConnector client = new CassandraConnector();
    client.connect("127.0.0.1", 9142);
    this.session = client.getSession();
    bookRepository = new BookRepository(session);
}
@Test
public void whenCreatingATable_thenCreatedCorrectly() {
    bookRepository.createTable();

    ResultSet result = session.execute(
      "SELECT * FROM " + KEYSPACE_NAME + ".books;");

    List<String> columnNames = 
      result.getColumnDefinitions().asList().stream()
      .map(cl -> cl.getName())
      .collect(Collectors.toList());
        
    assertEquals(columnNames.size(), 3);
    assertTrue(columnNames.contains("id"));
    assertTrue(columnNames.contains("title"));
    assertTrue(columnNames.contains("subject"));
}

3.5. Altering the Column Family

3.5.改变列族

A book has also a publisher, but no such column can be found in the created table. We can use the following code to alter the table and add a new column:

一本书也有一个出版商,但在创建的表中找不到这样的列。我们可以使用下面的代码来改变该表并添加一个新的列。

public void alterTablebooks(String columnName, String columnType) {
    StringBuilder sb = new StringBuilder("ALTER TABLE ")
      .append(TABLE_NAME).append(" ADD ")
      .append(columnName).append(" ")
      .append(columnType).append(";");

    String query = sb.toString();
    session.execute(query);
}

Let’s make sure that the new column publisher has been added:

让我们确保新的列publisher已经被添加。

@Test
public void whenAlteringTable_thenAddedColumnExists() {
    bookRepository.createTable();

    bookRepository.alterTablebooks("publisher", "text");

    ResultSet result = session.execute(
      "SELECT * FROM " + KEYSPACE_NAME + "." + "books" + ";");

    boolean columnExists = result.getColumnDefinitions().asList().stream()
      .anyMatch(cl -> cl.getName().equals("publisher"));
        
    assertTrue(columnExists);
}

3.6. Inserting Data in the Column Family

3.6.在列族中插入数据

Now that the books table has been created, we are ready to start adding data to the table:

现在,books表已经被创建,我们准备开始向该表添加数据。

public void insertbookByTitle(Book book) {
    StringBuilder sb = new StringBuilder("INSERT INTO ")
      .append(TABLE_NAME_BY_TITLE).append("(id, title) ")
      .append("VALUES (").append(book.getId())
      .append(", '").append(book.getTitle()).append("');");

    String query = sb.toString();
    session.execute(query);
}

A new row has been added in the ‘books’ table, so we can test if the row exists:

在’书籍’表中增加了一条新行,所以我们可以测试该行是否存在。

@Test
public void whenAddingANewBook_thenBookExists() {
    bookRepository.createTableBooksByTitle();

    String title = "Effective Java";
    Book book = new Book(UUIDs.timeBased(), title, "Programming");
    bookRepository.insertbookByTitle(book);
        
    Book savedBook = bookRepository.selectByTitle(title);
    assertEquals(book.getTitle(), savedBook.getTitle());
}

In the test code above we have used a different method to create a table named booksByTitle:

在上面的测试代码中,我们使用了不同的方法来创建一个名为booksByTitle的表:

public void createTableBooksByTitle() {
    StringBuilder sb = new StringBuilder("CREATE TABLE IF NOT EXISTS ")
      .append("booksByTitle").append("(")
      .append("id uuid, ")
      .append("title text,")
      .append("PRIMARY KEY (title, id));");

    String query = sb.toString();
    session.execute(query);
}

In Cassandra one of the best practices is to use one-table-per-query pattern. This means, for a different query a different table is needed.

在Cassandra中,最好的做法之一是使用每个查询一个表的模式。这意味着,对于不同的查询,需要不同的表。

In our example, we have chosen to select a book by its title. In order to satisfy the selectByTitle query, we have created a table with a compound PRIMARY KEY using the columns, title and id. The column title is the partitioning key while the id column is the clustering key.

在我们的例子中,我们选择通过书名来选择一本书。为了满足selectByTitle查询,我们用titleid列创建了一个带有复合PRIMARY KEY的表。列title是分区键,而id列是聚类键。

This way, many of the tables in your data model contain duplicate data. This is not a downside of this database. On the contrary, this practice optimizes the performance of the reads.

这样一来,你的数据模型中的许多表都包含重复的数据。这并不是这个数据库的缺点。相反,这种做法优化了读取的性能。

Let’s see the data that are currently saved in our table:

让我们看看目前保存在我们表中的数据。

public List<Book> selectAll() {
    StringBuilder sb = 
      new StringBuilder("SELECT * FROM ").append(TABLE_NAME);

    String query = sb.toString();
    ResultSet rs = session.execute(query);

    List<Book> books = new ArrayList<Book>();

    rs.forEach(r -> {
        books.add(new Book(
          r.getUUID("id"), 
          r.getString("title"),  
          r.getString("subject")));
    });
    return books;
}

A test for query returning expected results:

对返回预期结果的查询进行测试。

@Test
public void whenSelectingAll_thenReturnAllRecords() {
    bookRepository.createTable();
        
    Book book = new Book(
      UUIDs.timeBased(), "Effective Java", "Programming");
    bookRepository.insertbook(book);
      
    book = new Book(
      UUIDs.timeBased(), "Clean Code", "Programming");
    bookRepository.insertbook(book);
        
    List<Book> books = bookRepository.selectAll(); 
        
    assertEquals(2, books.size());
    assertTrue(books.stream().anyMatch(b -> b.getTitle()
      .equals("Effective Java")));
    assertTrue(books.stream().anyMatch(b -> b.getTitle()
      .equals("Clean Code")));
}

Everything is fine till now, but one thing has to be realized. We started working with table books, but in the meantime, in order to satisfy the select query by title column, we had to create another table named booksByTitle.

到目前为止,一切都很好,但有一件事必须要意识到。我们开始使用表books,,但与此同时,为了满足按title列进行的select查询,我们不得不创建另一个名为booksByTitle的表。

The two tables are identical containing duplicated columns, but we have only inserted data in the booksByTitle table. As a consequence, data in two tables is currently inconsistent.

这两个表是相同的,包含重复的列,但我们只在booksByTitle表中插入了数据。因此,目前两个表中的数据是不一致的。

We can solve this using a batch query, which comprises two insert statements, one for each table. A batch query executes multiple DML statements as a single operation.

我们可以使用batch查询来解决这个问题,它包括两个插入语句,每个表一个。一个batch查询将多个DML语句作为一个单一操作来执行。

An example of such query is provided:

这里提供了一个此类查询的例子。

public void insertBookBatch(Book book) {
    StringBuilder sb = new StringBuilder("BEGIN BATCH ")
      .append("INSERT INTO ").append(TABLE_NAME)
      .append("(id, title, subject) ")
      .append("VALUES (").append(book.getId()).append(", '")
      .append(book.getTitle()).append("', '")
      .append(book.getSubject()).append("');")
      .append("INSERT INTO ")
      .append(TABLE_NAME_BY_TITLE).append("(id, title) ")
      .append("VALUES (").append(book.getId()).append(", '")
      .append(book.getTitle()).append("');")
      .append("APPLY BATCH;");

    String query = sb.toString();
    session.execute(query);
}

Again we test the batch query results like so:

我们再一次这样测试批量查询的结果。

@Test
public void whenAddingANewBookBatch_ThenBookAddedInAllTables() {
    bookRepository.createTable();
        
    bookRepository.createTableBooksByTitle();
    
    String title = "Effective Java";
    Book book = new Book(UUIDs.timeBased(), title, "Programming");
    bookRepository.insertBookBatch(book);
    
    List<Book> books = bookRepository.selectAll();
    
    assertEquals(1, books.size());
    assertTrue(
      books.stream().anyMatch(
        b -> b.getTitle().equals("Effective Java")));
        
    List<Book> booksByTitle = bookRepository.selectAllBookByTitle();
    
    assertEquals(1, booksByTitle.size());
    assertTrue(
      booksByTitle.stream().anyMatch(
        b -> b.getTitle().equals("Effective Java")));
}

Note: As of version 3.0, a new feature called “Materialized Views” is available , which we may use instead of batch queries. A well-documented example for “Materialized Views” is available here.

注意。从3.0版本开始,一个名为 “物化视图 “的新功能可用,我们可以用它来代替batch查询。关于 “物化视图 “的一个有据可查的例子是这里

3.7. Deleting the Column Family

3.7.删除列族

The code below shows how to delete a table:

下面的代码显示了如何删除一个表。

public void deleteTable() {
    StringBuilder sb = 
      new StringBuilder("DROP TABLE IF EXISTS ").append(TABLE_NAME);

    String query = sb.toString();
    session.execute(query);
}

Selecting a table that does not exist in the keyspace results in an InvalidQueryException: unconfigured table books:

选择一个在钥匙空间中不存在的表会导致InvalidQueryException: unconfigured table books

@Test(expected = InvalidQueryException.class)
public void whenDeletingATable_thenUnconfiguredTable() {
    bookRepository.createTable();
    bookRepository.deleteTable("books");
       
    session.execute("SELECT * FROM " + KEYSPACE_NAME + ".books;");
}

3.8. Deleting the Keyspace

3.8.删除钥匙空间

Finally, let’s delete the keyspace:

最后,让我们删除钥匙空间。

public void deleteKeyspace(String keyspaceName) {
    StringBuilder sb = 
      new StringBuilder("DROP KEYSPACE ").append(keyspaceName);

    String query = sb.toString();
    session.execute(query);
}

And test that the keyspace has been deleted:

并测试该钥匙空间是否已被删除。

@Test
public void whenDeletingAKeyspace_thenDoesNotExist() {
    String keyspaceName = "library";
    schemaRepository.deleteKeyspace(keyspaceName);

    ResultSet result = 
      session.execute("SELECT * FROM system_schema.keyspaces;");
    boolean isKeyspaceCreated = result.all().stream()
      .anyMatch(r -> r.getString(0).equals(keyspaceName.toLowerCase()));
        
    assertFalse(isKeyspaceCreated);
}

4. Conclusion

4.结论

This tutorial covered the basic steps of connecting to and using the Cassandra database with Java. Some of the key concepts of this database have also been discussed in order to help you kick start.

本教程涵盖了用Java连接和使用Cassandra数据库的基本步骤。还讨论了这个数据库的一些关键概念,以帮助你启动。

The full implementation of this tutorial can be found in the Github project.

本教程的完整实现可以在Github项目中找到。