Guide to Simple Binary Encoding

Last modified: October 22, 2022

1. Introduction

Efficiency and performance are two important aspects of modern data services, especially when we stream large amounts of data. Reducing the message size with a performant encoding is key to achieving both.

However, in-house encoding/decoding algorithms could be cumbersome and fragile, which makes them hard to maintain in the long run.

Luckily, Simple Binary Encoding can help us implement and maintain a tailor-made encoding/decoding system in a practical way.

In this tutorial, we’ll discuss what Simple Binary Encoding (SBE) is for and how to use it, with code samples.

2. What Is SBE?

SBE is a binary representation for encoding/decoding messages to support low-latency streaming. It’s also the reference implementation of the FIX SBE standard, which is a standard for the encoding of financial data.

2.1. The Message Structure

In order to preserve streaming semantics, a message must be capable of being read or written sequentially, with no backtracking. This eliminates extra operations (such as dereferencing, handling location pointers, and managing additional state) and makes better use of hardware support to maximize performance and efficiency.

Let’s have a peek at how the message is structured in SBE:

  • Header: It contains mandatory fields like the version of the message. It can also contain more fields when necessary.
  • Root Fields: Static fields of the message. Their block size is predefined and cannot be changed. They can also be defined as optional.
  • Repeating Groups: These represent collection-type data. Groups can contain fields and also nested groups, so they can represent more complex structures.
  • Variable Data Fields: These are fields whose sizes we can’t determine ahead of time. String and Blob data types are two examples. They’re placed at the end of the message.

Next, we’ll see why this message structure is important.

2.2. When Is SBE (Not) Useful?

The power of SBE originates from its message structure. It’s optimized for sequential access to data. Hence, SBE is well suited for fixed-size data like numbers, bitsets, enums, and arrays.

A common use case for SBE is financial data streaming, which mostly contains numbers and enums and is what SBE is specifically designed for.

On the other hand, SBE isn’t well suited for variable-length data types like string and blob, because we most likely don’t know the exact data size ahead of time. As a result, extra calculations are needed at streaming time to detect the boundaries of the data in a message. Not surprisingly, this can hurt our business if we’re talking about millisecond latency.

Although SBE still supports String and Blob data types, they’re always placed at the end of the message to keep the impact of variable length calculations at a minimum.
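
To make the cost of variable-length data more concrete, here is a minimal hand-rolled sketch that uses only java.nio, not the SBE API. The class name VariableLengthSketch, the uint16 length prefix, and the little-endian byte order are assumptions chosen for illustration. The fixed-size field sits at a known offset, while the reader has to fetch a length prefix before it knows where the variable part ends:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; this is not code generated by SBE.
class VariableLengthSketch {

    // Writes one fixed-size field, then a length-prefixed string at the end of the message.
    static ByteBuffer encode(int amount, String note) {
        byte[] noteBytes = note.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocate(2 + 2 + noteBytes.length)
            .order(ByteOrder.LITTLE_ENDIAN);
        buffer.putShort((short) amount);           // fixed part: always at offset 0
        buffer.putShort((short) noteBytes.length); // length prefix of the variable part
        buffer.put(noteBytes);                     // variable part: size known only at runtime
        buffer.flip();
        return buffer;
    }

    // The length prefix must be read first to find the boundary of the data.
    static String decodeNote(ByteBuffer buffer) {
        int length = Short.toUnsignedInt(buffer.getShort(2));
        return new String(buffer.array(), 4, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer message = encode(2, "hello");
        System.out.println(decodeNote(message) + " (" + message.limit() + " bytes)");
    }
}
```

Because everything after the variable part would shift with its length, keeping such fields last leaves the offsets of all fixed-size fields constant.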

3. Setting Up the Library

To use the SBE library, let’s add the following Maven dependency to our pom.xml file:

<dependency>
    <groupId>uk.co.real-logic</groupId>
    <artifactId>sbe-all</artifactId>
    <version>1.27.0</version>
</dependency>

4. Generating Java Stubs

Before we generate our Java stubs, we need to define our message schema. SBE lets us define schemas in XML.

Next, we’ll see how to define a schema for our message, which carries sample market trade data.

4.1. Creating the Message Schema

Our schema will be an XML file based on a special XSD of the FIX protocol. It will define our message format.

So, let’s create our schema file:

<?xml version="1.0" encoding="UTF-8"?>
<sbe:messageSchema xmlns:sbe="http://fixprotocol.io/2016/sbe"
  package="com.baeldung.sbe.stub" id="1" version="0" semanticVersion="5.2"
  description="A schema represents stock market data.">
    <types>
        <composite name="messageHeader" 
          description="Message identifiers and length of message root.">
            <type name="blockLength" primitiveType="uint16"/>
            <type name="templateId" primitiveType="uint16"/>
            <type name="schemaId" primitiveType="uint16"/>
            <type name="version" primitiveType="uint16"/>
        </composite>
        <enum name="Market" encodingType="uint8">
            <validValue name="NYSE" description="New York Stock Exchange">0</validValue>
            <validValue name="NASDAQ" 
              description="National Association of Securities Dealers Automated Quotations">1</validValue>
        </enum>
        <type name="Symbol" primitiveType="char" length="4" characterEncoding="ASCII" 
          description="Stock symbol"/>
        <composite name="Decimal">
            <type name="mantissa" primitiveType="uint64" minValue="0"/>
            <type name="exponent" primitiveType="int8"/>
        </composite>
        <enum name="Currency" encodingType="uint8">
            <validValue name="USD" description="US Dollar">0</validValue>
            <validValue name="EUR" description="Euro">1</validValue>
        </enum>
        <composite name="Quote" 
          description="A quote represents the price of a stock in a market">
            <ref name="market" type="Market"/>
            <ref name="symbol" type="Symbol"/>
            <ref name="price" type="Decimal"/>
            <ref name="currency" type="Currency"/>
        </composite>
    </types>
    <sbe:message name="TradeData" id="1" description="Represents a quote and amount of trade">
        <field name="quote" id="1" type="Quote"/>
        <field name="amount" id="2" type="uint16"/>
    </sbe:message>
</sbe:messageSchema>

If we look at the schema in detail, we’ll notice that it has two main parts, <types> and <sbe:message>. We’ll go through <types> first.

As our first type, we create the messageHeader. It’s mandatory and also has four mandatory fields:

<composite name="messageHeader" description="Message identifiers and length of message root.">
    <type name="blockLength" primitiveType="uint16"/>
    <type name="templateId" primitiveType="uint16"/>
    <type name="schemaId" primitiveType="uint16"/>
    <type name="version" primitiveType="uint16"/>
</composite>
  • blockLength: represents total space reserved for the root fields in a message. It doesn’t count repeated fields or variable-length fields, like string and blob.
  • templateId: an identifier for the message template.
  • schemaId: an identifier for the message schema. A schema always contains a template.
  • version: the version of the message schema when we define the message.

Next, we define an enumeration, Market:

<enum name="Market" encodingType="uint8">
    <validValue name="NYSE" description="New York Stock Exchange">0</validValue>
    <validValue name="NASDAQ" 
      description="National Association of Securities Dealers Automated Quotations">1</validValue>
</enum>

We aim to hold some well-known exchange names, which we can hard-code in the schema file. They don’t change or increase often. Therefore, type <enum> is a good fit here.

By setting encodingType="uint8", we reserve 8 bits of space for storing the market name in a single message. An unsigned 8-bit integer can hold 2^8 = 256 different values (0 to 255), which is the upper bound on the number of markets we can support. In practice, SBE reserves 255 as the null value for uint8, so one of those values isn’t available for a market.
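
The mapping between enum constants and a single byte can be sketched in plain Java as follows. MarketCodeSketch and MarketCode are hypothetical names that stand in for the generated Market class, whose actual API may differ:

```java
// Hypothetical stand-in for the generated enum; not SBE's own code.
class MarketCodeSketch {

    // Each constant maps to one uint8 code, mirroring the <validValue> entries.
    enum MarketCode {
        NYSE(0),
        NASDAQ(1);

        private final int code;

        MarketCode(int code) {
            this.code = code;
        }

        byte toByte() {
            return (byte) code;
        }

        static MarketCode fromByte(byte raw) {
            // Java bytes are signed, so we convert to the unsigned range 0..255 first.
            int unsigned = Byte.toUnsignedInt(raw);
            for (MarketCode market : values()) {
                if (market.code == unsigned) {
                    return market;
                }
            }
            throw new IllegalArgumentException("Unknown market code: " + unsigned);
        }
    }

    public static void main(String[] args) {
        byte encoded = MarketCode.NASDAQ.toByte();
        System.out.println(MarketCode.fromByte(encoded)); // prints NASDAQ
    }
}
```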

Right after, we define another type, Symbol. This will be a 3 or 4-character string that identifies a financial instrument like AAPL (Apple), MSFT (Microsoft), etc.:

<type name="Symbol" primitiveType="char" length="4" characterEncoding="ASCII" description="Stock symbol"/>

As we see, we limit the characters with characterEncoding="ASCII" (7 bits, 128 characters maximum), and we set a cap with length="4" to not allow more than 4 characters. Thus, we can reduce the size as much as possible.
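
A fixed-length character field of this kind can be pictured as a 4-byte slot. The sketch below is a hand-written approximation, not the generated codec: SymbolFieldSketch is a hypothetical name, and padding shorter symbols with zero bytes is an assumption made for this example:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration of a fixed 4-byte ASCII field.
class SymbolFieldSketch {

    static final int LENGTH = 4;

    // Encodes the symbol into exactly LENGTH bytes, padding with zero bytes.
    static byte[] encode(String symbol) {
        byte[] ascii = symbol.getBytes(StandardCharsets.US_ASCII);
        if (ascii.length > LENGTH) {
            throw new IllegalArgumentException("Symbol longer than " + LENGTH + " characters");
        }
        return Arrays.copyOf(ascii, LENGTH);
    }

    // Reads characters up to the first zero byte or the end of the field.
    static String decode(byte[] field) {
        int end = 0;
        while (end < field.length && field[end] != 0) {
            end++;
        }
        return new String(field, 0, end, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] field = encode("IBM");
        System.out.println(Arrays.toString(field)); // prints [73, 66, 77, 0]
        System.out.println(decode(field));          // prints IBM
    }
}
```

Since the slot always takes 4 bytes, the fields that follow it keep a fixed offset regardless of whether the symbol has 3 or 4 characters.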

After that, we need a composite type for price data. So, we create the type Decimal:

<composite name="Decimal">
    <type name="mantissa" primitiveType="uint64" minValue="0"/>
    <type name="exponent" primitiveType="int8"/>
</composite>

Decimal is composed of two fields:

  • mantissa: the significant digits of a decimal number
  • exponent: the scale of a decimal number

For example, the values mantissa=98765 and exponent=-3 represent the number 98.765.
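
The arithmetic behind this example is value = mantissa × 10^exponent. A small sketch with java.math.BigDecimal reproduces it. DecimalSketch and toBigDecimal are hypothetical names introduced only for this illustration:

```java
import java.math.BigDecimal;

// Hypothetical helper that combines a mantissa and an exponent into a decimal value.
class DecimalSketch {

    // Computes mantissa * 10^exponent without going through floating point.
    static BigDecimal toBigDecimal(long mantissa, byte exponent) {
        return BigDecimal.valueOf(mantissa).scaleByPowerOfTen(exponent);
    }

    public static void main(String[] args) {
        System.out.println(toBigDecimal(98765L, (byte) -3)); // prints 98.765
    }
}
```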

Next, very similar to Market, we create another <enum> to represent Currency whose values are mapped as uint8:

<enum name="Currency" encodingType="uint8">
    <validValue name="USD" description="US Dollar">0</validValue>
    <validValue name="EUR" description="Euro">1</validValue>
</enum>

Lastly, we define Quote by composing the other types we created before:

<composite name="Quote" description="A quote represents the price of a stock in a market">
    <ref name="market" type="Market"/>
    <ref name="symbol" type="Symbol"/>
    <ref name="price" type="Decimal"/>
    <ref name="currency" type="Currency"/>
</composite>

With that, we’ve completed the type definitions.

However, we still need to define a message. So, let’s define our message, TradeData:

<sbe:message name="TradeData" id="1" description="Represents a quote and amount of trade">
    <field name="quote" id="1" type="Quote"/>
    <field name="amount" id="2" type="uint16"/>
</sbe:message>
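
To get a feel for how compact this message is, we can add up the sizes of its fixed fields. The sketch below assumes the fields are packed back to back with no padding, which is worth verifying against the generated BLOCK_LENGTH constant. TradeDataLayoutSketch and its constants are hypothetical names:

```java
// Hypothetical size calculation, assuming no alignment padding between fields.
class TradeDataLayoutSketch {

    static final int MARKET = 1;   // uint8 enum
    static final int SYMBOL = 4;   // char, length 4
    static final int MANTISSA = 8; // uint64
    static final int EXPONENT = 1; // int8
    static final int CURRENCY = 1; // uint8 enum
    static final int AMOUNT = 2;   // uint16

    static int quoteLength() {
        return MARKET + SYMBOL + MANTISSA + EXPONENT + CURRENCY;
    }

    static int blockLength() {
        return quoteLength() + AMOUNT;
    }

    public static void main(String[] args) {
        System.out.println("Quote: " + quoteLength() + " bytes");     // prints Quote: 15 bytes
        System.out.println("TradeData: " + blockLength() + " bytes"); // prints TradeData: 17 bytes
    }
}
```

Under this assumption, the root block takes 17 bytes, and the 8-byte message header brings a full message to 25 bytes.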

We can find more details about the available types in the specification.

In the next two sections, we’ll discuss how to use our schema to generate the Java code that we eventually use to encode/decode our messages.

4.2. Using SbeTool

A straightforward way to generate Java stubs is to use the SBE jar file. Running it invokes the utility class SbeTool automatically:

java -Dsbe.output.dir=target/generated-sources/java \
  -jar <local-maven-directory>/repository/uk/co/real-logic/sbe-all/1.27.0/sbe-all-1.27.0.jar \
  src/main/resources/schema.xml

Note that we must replace the placeholder <local-maven-directory> with our local Maven path to run the command.

After successful generation, we’ll see the generated Java code in the folder target/generated-sources/java.

4.3. Using SbeTool With Maven

Using SbeTool is easy enough, but we can make it even more practical by integrating it into Maven.

So, let’s add the following Maven plugins to our pom.xml:

<build>
    <plugins>
        <plugin>
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>exec-maven-plugin</artifactId>
            <version>1.6.0</version>
            <executions>
                <execution>
                    <phase>generate-sources</phase>
                    <goals>
                        <goal>java</goal>
                    </goals>
                </execution>
            </executions>
            <configuration>
                <includeProjectDependencies>false</includeProjectDependencies>
                <includePluginDependencies>true</includePluginDependencies>
                <mainClass>uk.co.real_logic.sbe.SbeTool</mainClass>
                <systemProperties>
                    <systemProperty>
                        <key>sbe.output.dir</key>
                        <value>${project.build.directory}/generated-sources/java</value>
                    </systemProperty>
                </systemProperties>
                <arguments>
                    <argument>${project.basedir}/src/main/resources/schema.xml</argument>
                </arguments>
                <workingDirectory>${project.build.directory}/generated-sources/java</workingDirectory>
            </configuration>
            <dependencies>
                <dependency>
                    <groupId>uk.co.real-logic</groupId>
                    <artifactId>sbe-tool</artifactId>
                    <version>1.27.0</version>
                </dependency>
            </dependencies>
        </plugin>
        <plugin>
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>build-helper-maven-plugin</artifactId>
            <version>3.0.0</version>
            <executions>
                <execution>
                    <id>add-source</id>
                    <phase>generate-sources</phase>
                    <goals>
                        <goal>add-source</goal>
                    </goals>
                    <configuration>
                        <sources>
                            <source>${project.build.directory}/generated-sources/java/</source>
                        </sources>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

As a result, a typical mvn clean install command generates our Java stubs automatically.

Additionally, we can always have a look at the SBE’s Maven documentation for more configuration options.

5. Basic Messaging

Now that our Java stubs are ready, let’s have a look at how we use them.

First of all, we need some data for testing. Thus, we create a class, MarketData:

public class MarketData {

    private int amount;
    private double price;
    private Market market;
    private Currency currency;
    private String symbol;

    // Constructor, getters and setters
}

Notice that our MarketData uses the Market and Currency classes that SBE generated for us.

Next, let’s define a MarketData object to use in our unit test later on:

private MarketData marketData;

@BeforeEach
public void setup() {
    marketData = new MarketData(2, 128.99, Market.NYSE, Currency.USD, "IBM");
}

Now that we have a MarketData ready, we’ll see how to write it into our TradeData and read it back in the next sections.

5.1. Writing a Message

Typically, we’d like to write our data into a ByteBuffer. So, we create a ByteBuffer with an initial capacity, wrap it in an UnsafeBuffer, and instantiate our generated encoders, MessageHeaderEncoder and TradeDataEncoder:

@Test
public void givenMarketData_whenEncode_thenDecodedValuesMatch() {
    // our buffer to write encoded data, initial cap. 128 bytes
    UnsafeBuffer buffer = new UnsafeBuffer(ByteBuffer.allocate(128));
    MessageHeaderEncoder headerEncoder = new MessageHeaderEncoder();
    TradeDataEncoder dataEncoder = new TradeDataEncoder();
    
    // we'll write the rest of the code here
}

Before writing the data, we need to split our price data into two parts, mantissa and exponent:

BigDecimal priceDecimal = BigDecimal.valueOf(marketData.getPrice());
int priceMantissa = priceDecimal.scaleByPowerOfTen(priceDecimal.scale()).intValue();
int priceExponent = priceDecimal.scale() * -1;

We should notice that we used BigDecimal for this conversion. It’s always a good practice to use BigDecimal when dealing with monetary values because we don’t want to lose precision.
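
As a variation on the conversion above, the same split can be read directly from BigDecimal’s unscaled value and scale. This is an alternative sketch rather than the approach used in this tutorial, and PriceSplitSketch, mantissaOf, and exponentOf are hypothetical names:

```java
import java.math.BigDecimal;

// Hypothetical helper that splits a price into mantissa and exponent.
class PriceSplitSketch {

    // The unscaled value holds the significant digits; longValueExact() throws
    // an ArithmeticException instead of silently truncating if they don't fit.
    static long mantissaOf(BigDecimal price) {
        return price.unscaledValue().longValueExact();
    }

    // BigDecimal's scale counts digits after the decimal point, so the exponent is its negation.
    static byte exponentOf(BigDecimal price) {
        return (byte) (-price.scale());
    }

    public static void main(String[] args) {
        BigDecimal price = new BigDecimal("128.99");
        System.out.println(mantissaOf(price) + " x 10^" + exponentOf(price)); // prints 12899 x 10^-2
    }
}
```

Using a long for the mantissa also matches the uint64 type declared in the schema more closely than an int does.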

Finally, let’s encode and write our TradeData:

TradeDataEncoder encoder = dataEncoder.wrapAndApplyHeader(buffer, 0, headerEncoder);
encoder.amount(marketData.getAmount());
encoder.quote()
  .market(marketData.getMarket())
  .currency(marketData.getCurrency())
  .symbol(marketData.getSymbol())
  .price()
    .mantissa(priceMantissa)
    .exponent((byte) priceExponent);

5.2. Reading a Message

To read a message, we’ll use the same buffer instance that we wrote the data into. However, this time we need the decoders, MessageHeaderDecoder and TradeDataDecoder:

MessageHeaderDecoder headerDecoder = new MessageHeaderDecoder();
TradeDataDecoder dataDecoder = new TradeDataDecoder();

Next, we decode our TradeData:

dataDecoder.wrapAndApplyHeader(buffer, 0, headerDecoder);

Similarly, we need to combine the two parts of our price data, mantissa and exponent, to get the price back as a double value. Once again, we make use of BigDecimal:

double price = BigDecimal.valueOf(dataDecoder.quote().price().mantissa())
  .scaleByPowerOfTen(dataDecoder.quote().price().exponent())
  .doubleValue();

Finally, let’s ensure our decoded values match the original ones:

Assertions.assertEquals(2, dataDecoder.amount());
Assertions.assertEquals("IBM", dataDecoder.quote().symbol());
Assertions.assertEquals(Market.NYSE, dataDecoder.quote().market());
Assertions.assertEquals(Currency.USD, dataDecoder.quote().currency());
Assertions.assertEquals(128.99, price);

6. Conclusion

In this article, we learned how to set up SBE, define the message structure via XML and use it to encode/decode our messages in Java.

As always, we can find all the code samples and more over on GitHub.
