1. Introduction

Despite being one of the best-known vulnerabilities, SQL Injection has consistently ranked near the top of the infamous OWASP Top 10 list, where it is now part of the more general Injection class.

In this tutorial, we’ll explore common coding mistakes in Java that lead to a vulnerable application and how to avoid them using the APIs available in the JVM’s standard runtime library. We’ll also cover what protections we can get out of ORMs like JPA, Hibernate and others and which blind spots we’ll still have to worry about.

2. How Do Applications Become Vulnerable to SQL Injection?

Injection attacks work because, for many applications, the only way to execute a given computation is to dynamically generate code that is in turn run by another system or component. If in the process of generating this code we use untrusted data without proper sanitization, we leave an open door for hackers to exploit.

This statement may sound a bit abstract, so let’s take a look at how this happens in practice with a textbook example:

public List<AccountDTO>
  unsafeFindAccountsByCustomerId(String customerId)
  throws SQLException {
    // UNSAFE !!! DON'T DO THIS !!!
    String sql = "select "
      + "customer_id,acc_number,branch_id,balance "
      + "from Accounts where customer_id = '"
      + customerId 
      + "'";
    Connection c = dataSource.getConnection();
    ResultSet rs = c.createStatement().executeQuery(sql);
    // ...
}

The problem with this code is obvious: we’ve put the customerId’s value into the query with no validation at all. Nothing bad will happen if we’re sure that this value will only come from trusted sources, but can we be sure of that?

Let’s imagine that this function is used in a REST API implementation for an account resource. Exploiting this code is trivial: all we have to do is send a value that, when concatenated with the fixed part of the query, changes its intended behavior:

curl -X GET \
  'http://localhost:8080/accounts?customerId=abc%27%20or%20%271%27=%271'

Assuming the customerId parameter value goes unchecked until it reaches our function, here’s what we’d receive:

abc' or '1'='1

When we join this value with the fixed part, we get the final SQL statement that will be executed:

select customer_id, acc_number, branch_id, balance
  from Accounts where customer_id = 'abc' or '1'='1'

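Since '1'='1' is always true, the where clause matches every row, so this query returns all accounts in the table instead of only those belonging to the given customer. Probably not what we wanted…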
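
The concatenation above is easy to reproduce without a database. The sketch below uses a hypothetical UnsafeQueryDemo class that decodes the customerId value sent by the curl command and builds the SQL string the same way unsafeFindAccountsByCustomerId does. It assumes Java 10 or later, which is where the URLDecoder.decode(String, Charset) overload was added:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: reproduces the unsafe concatenation without a database
class UnsafeQueryDemo {

    // Same string building as unsafeFindAccountsByCustomerId (UNSAFE, demo only)
    public static String buildUnsafeQuery(String customerId) {
        return "select "
          + "customer_id,acc_number,branch_id,balance "
          + "from Accounts where customer_id = '"
          + customerId
          + "'";
    }

    // Decodes the raw query-string value, as a web framework would before
    // passing it to our code (this overload requires Java 10+)
    public static String decodeParam(String raw) {
        return URLDecoder.decode(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String customerId = decodeParam("abc%27%20or%20%271%27=%271");
        System.out.println(customerId);
        System.out.println(buildUnsafeQuery(customerId));
    }
}
```

Running main() prints the decoded value, abc' or '1'='1, followed by the final SQL statement with the injected condition sitting inside the where clause.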

A smart developer (aren’t we all?) would now be thinking: “That’s silly! I’d never use string concatenation to build a query like this”.

Not so fast… This canonical example is silly indeed, but there are situations where we might still need to do it:

Complex queries with dynamic search criteria: adding UNION clauses depending on user-supplied criteria
Dynamic grouping or ordering: REST APIs used as a backend to a GUI data table

2.1. I’m Using JPA. I’m Safe, Right?

This is a common misconception. JPA and other ORMs relieve us from creating hand-coded SQL statements, but they won’t prevent us from writing vulnerable code.

Let’s see how the JPA version of the previous example looks:

public List<AccountDTO> unsafeJpaFindAccountsByCustomerId(String customerId) {    
    String jql = "from Account where customerId = '" + customerId + "'";        
    TypedQuery<Account> q = em.createQuery(jql, Account.class);        
    return q.getResultList()
      .stream()
      .map(this::toAccountDTO)
      .collect(Collectors.toList());        
}

The same issue we pointed out before is also present here: we’re using unvalidated input to create a JPA query, so we’re exposed to the same kind of exploit.

3. Prevention Techniques

Now that we know what a SQL injection is, let’s see how we can protect our code from this kind of attack. Here we’re focusing on a couple of very effective techniques available in Java and other JVM languages, but similar concepts apply to other environments, such as PHP, .NET, Ruby and so forth.

For those looking for a complete list of available techniques, including database-specific ones, the OWASP Project maintains a SQL Injection Prevention Cheat Sheet, which is a good place to learn more about the subject.

3.1. Parameterized Queries

This technique consists of using prepared statements with the question mark placeholder (“?”) in our queries whenever we need to insert a user-supplied value. This is very effective and, unless there’s a bug in the JDBC driver’s implementation, immune to exploits.

Let’s rewrite our example function to use this technique:

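public List<AccountDTO> safeFindAccountsByCustomerId(String customerId)
  throws SQLException {

    String sql = "select "
      + "customer_id, acc_number, branch_id, balance from Accounts "
      + "where customer_id = ?";

    // try-with-resources closes the statement and releases the connection
    try (Connection c = dataSource.getConnection();
         PreparedStatement p = c.prepareStatement(sql)) {
        // The value is bound to the "?" placeholder and treated as data, never as SQL
        p.setString(1, customerId);
        // Use the no-argument executeQuery(): a PreparedStatement
        // doesn't accept the SQL string again
        try (ResultSet rs = p.executeQuery()) {
            // omitted - process rows and return an account list
        }
    }
}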

Here we’ve used the prepareStatement() method available in the Connection instance to get a PreparedStatement. This interface extends the regular Statement interface with several methods that allow us to safely insert user-supplied values in a query before executing it.
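
Placeholders keep working when the shape of the query is dynamic, as in the search-criteria case mentioned earlier: we can still concatenate fixed SQL fragments, as long as every user-supplied value goes through a placeholder. Here’s a minimal sketch of that idea; the AccountSearchQuery class and its optional branch and minimum-balance filters are hypothetical, not part of the article’s code:

```java
import java.math.BigDecimal;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: builds the account search SQL from optional criteria
class AccountSearchQuery {

    private final StringBuilder sql = new StringBuilder(
        "select customer_id, acc_number, branch_id, balance from Accounts where customer_id = ?");
    private final List<Object> params = new ArrayList<>();

    public AccountSearchQuery(String customerId) {
        params.add(customerId);
    }

    // Optional filter: appends a fixed fragment and records the value separately
    public AccountSearchQuery branchId(String branchId) {
        if (branchId != null) {
            sql.append(" and branch_id = ?");
            params.add(branchId);
        }
        return this;
    }

    // Another optional filter following the same pattern, with a numeric value
    public AccountSearchQuery minBalance(BigDecimal minBalance) {
        if (minBalance != null) {
            sql.append(" and balance >= ?");
            params.add(minBalance);
        }
        return this;
    }

    public String sql() {
        return sql.toString();
    }

    public List<Object> params() {
        return Collections.unmodifiableList(params);
    }

    // Binds each value to its placeholder in order; JDBC parameter indexes start at 1
    public void bind(PreparedStatement ps) throws SQLException {
        for (int i = 0; i < params.size(); i++) {
            ps.setObject(i + 1, params.get(i));
        }
    }
}
```

A caller would pass q.sql() to c.prepareStatement() and then call q.bind(p) before executing. Whatever arrives as customerId ends up in the parameter list, never in the SQL text.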

For JPA, we have a similar feature:

String jql = "from Account where customerId = :customerId";
TypedQuery<Account> q = em.createQuery(jql, Account.class)
  .setParameter("customerId", customerId);
// Execute query and return mapped results (omitted)

When running this code under Spring Boot, we can set the property logging.level.sql to DEBUG and see what query is actually built in order to execute this operation:

// Note: Output formatted to fit screen
[DEBUG][SQL] select
  account0_.id as id1_0_,
  account0_.acc_number as acc_numb2_0_,
  account0_.balance as balance3_0_,
  account0_.branch_id as branch_i4_0_,
  account0_.customer_id as customer5_0_ 
from accounts account0_ 
where account0_.customer_id=?

As expected, the ORM layer creates a prepared statement using a placeholder for the customerId parameter. This is the same thing we did in the plain JDBC case, but with fewer statements, which is nice.

As a bonus, this approach usually results in a better-performing query, since most databases can cache the query plan associated with a prepared statement.

Please note that this approach only works for placeholders used as values. For instance, we can’t use placeholders to dynamically change the name of a table:

// This WILL NOT WORK !!!
PreparedStatement p = c.prepareStatement("select count(*) from ?");
p.setString(1, tableName);

Here, JPA won’t help either:

// This WILL NOT WORK EITHER !!!
String jql = "select count(*) from :tableName";
TypedQuery<Long> q = em.createQuery(jql, Long.class)
  .setParameter("tableName", tableName);
return q.getSingleResult();

In both cases, we’ll get a runtime error.

The main reason behind this is the very nature of a prepared statement: database servers use them to cache the query plan required to pull the result set, which is usually the same for any possible value. This isn’t true for table names and other constructs available in the SQL language, such as the columns used in an order by clause, since changing them changes the query itself.
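
Since a placeholder can’t stand for a table name, the name has to come from a fixed set that the application controls. Here’s a minimal sketch of that idea using an enum; the TableCounts class and the Branches table are hypothetical, used only for illustration. The user’s input merely selects an enum constant, and only the constant’s hard-coded name is concatenated into the SQL:

```java
import java.util.Locale;

// Hypothetical example: counting rows in one of a fixed set of tables
class TableCounts {

    // Whitelisted tables ("Branches" is an assumed table for illustration).
    // The SQL identifiers are hard-coded, so user input can only pick among them.
    private enum CountableTable {
        ACCOUNTS("Accounts"),
        BRANCHES("Branches");

        private final String sqlName;

        CountableTable(String sqlName) {
            this.sqlName = sqlName;
        }
    }

    // Maps raw input to an enum constant and rejects anything else
    public static String countSql(String requestedTable) {
        if (requestedTable == null) {
            throw new IllegalArgumentException("Table name is required");
        }
        CountableTable table;
        try {
            table = CountableTable.valueOf(requestedTable.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException("Unsupported table", e);
        }
        // Safe to concatenate: the name comes from our own constant, not from the input
        return "select count(*) from " + table.sqlName;
    }
}
```

The returned string can then be passed to prepareStatement() as usual, with placeholders for any values the query still needs.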

3.2. JPA Criteria API

Since explicitly building JPQL query strings is the main source of SQL Injection in JPA code, we should favor JPA’s Criteria API whenever possible.

For a quick primer on this API, please refer to the article on Hibernate Criteria queries. Also worth reading is our article about JPA Metamodel, which shows how to generate metamodel classes that will help us to get rid of string constants used for column names – and the runtime bugs that arise when they change.

Let’s rewrite our JPA query method to use the Criteria API:

CriteriaBuilder cb = em.getCriteriaBuilder();
CriteriaQuery<Account> cq = cb.createQuery(Account.class);
Root<Account> root = cq.from(Account.class);
cq.select(root).where(cb.equal(root.get(Account_.customerId), customerId));

TypedQuery<Account> q = em.createQuery(cq);
// Execute query and return mapped results (omitted)

Here, we’ve used more lines of code to get the same result, but the upside is that we no longer concatenate user input into a query string, so we don’t have to worry about JPQL syntax either.

Another important point: despite its verbosity, the Criteria API makes creating complex query services more straightforward and safer. For a complete example that shows how to do it in practice, please take a look at the approach used by JHipster-generated applications.

3.3. User Data Sanitization

Data sanitization is a technique that applies a filter to user-supplied data so that it can be safely used by other parts of our application. A filter’s implementation may vary a lot, but we can generally classify filters into two types: whitelists and blacklists.

Blacklists, which consist of filters that try to identify an invalid pattern, are usually of little value for SQL Injection prevention, but they are useful for detection! More on this later.

Whitelists, on the other hand, work particularly well when we can define exactly what is a valid input.
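
To make the difference concrete, here’s a minimal sketch contrasting the two kinds of filter for the customerId parameter. The CustomerIdFilters class is hypothetical, and the whitelist assumes, purely for illustration, that valid customer IDs consist of 1 to 20 ASCII letters or digits:

```java
import java.util.regex.Pattern;

// Hypothetical class contrasting a naive blacklist with a whitelist for customerId
class CustomerIdFilters {

    // Naive blacklist: rejects input containing one known attack fragment.
    // Shown only to illustrate why blacklists make a weak prevention layer.
    public static boolean passesBlacklist(String input) {
        return input != null && !input.contains("' or '");
    }

    // Whitelist, assuming (hypothetically) that customer IDs are
    // 1 to 20 ASCII letters or digits
    private static final Pattern CUSTOMER_ID = Pattern.compile("[A-Za-z0-9]{1,20}");

    public static boolean passesWhitelist(String input) {
        return input != null && CUSTOMER_ID.matcher(input).matches();
    }
}
```

Both filters accept abc123 and reject abc' or '1'='1, but passesBlacklist() accepts abc' OR '1'='1 simply because the case changed, while the whitelist rejects it along with anything else outside the expected format.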

Let’s enhance our safeFindAccountsByCustomerId method so that the caller can also specify the column used to sort the result set. Since we know the set of possible columns, we can implement a whitelist using a simple set and use it to sanitize the received parameter:

private static final Set<String> VALID_COLUMNS_FOR_ORDER_BY
  = Collections.unmodifiableSet(Stream
      .of("acc_number","branch_id","balance")
      .collect(Collectors.toCollection(HashSet::new)));

public List<AccountDTO> safeFindAccountsByCustomerId(
  String customerId,
  String orderBy) throws Exception { 
    String sql = "select "
      + "customer_id,acc_number,branch_id,balance from Accounts "
      + "where customer_id = ? ";
    if (VALID_COLUMNS_FOR_ORDER_BY.contains(orderBy)) {
        sql = sql + " order by " + orderBy;
    } else {
        throw new IllegalArgumentException("Nice try!");
    }
    Connection c = dataSource.getConnection();
    PreparedStatement p = c.prepareStatement(sql);
    p.setString(1,customerId);
    // ... result set processing omitted
}

Here, we’re combining the prepared statement approach with a whitelist used to sanitize the orderBy argument. The result is a safe string containing the final SQL statement. In this simple example, we’re using a static set, but we could also have used database metadata functions to build it.

We can use the same approach for JPA, also taking advantage of the Criteria API and the JPA metamodel to avoid using String constants in our code:

// Map of valid JPA columns for sorting
final Map<String,SingularAttribute<Account,?>> VALID_JPA_COLUMNS_FOR_ORDER_BY = Stream.of(
  new AbstractMap.SimpleEntry<>(Account_.ACC_NUMBER, Account_.accNumber),
  new AbstractMap.SimpleEntry<>(Account_.BRANCH_ID, Account_.branchId),
  new AbstractMap.SimpleEntry<>(Account_.BALANCE, Account_.balance))
  .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));

SingularAttribute<Account,?> orderByAttribute = VALID_JPA_COLUMNS_FOR_ORDER_BY.get(orderBy);
if (orderByAttribute == null) {
    throw new IllegalArgumentException("Nice try!");
}

CriteriaBuilder cb = em.getCriteriaBuilder();
CriteriaQuery<Account> cq = cb.createQuery(Account.class);
Root<Account> root = cq.from(Account.class);
cq.select(root)
  .where(cb.equal(root.get(Account_.customerId), customerId))
  .orderBy(cb.asc(root.get(orderByAttribute)));

TypedQuery<Account> q = em.createQuery(cq);
// Execute query and return mapped results (omitted)

This code has the same basic structure as the plain JDBC version. First, we use a whitelist to sanitize the column name, then we create a CriteriaQuery to fetch the records from the database.

3.4. Are We Safe Now?

Let’s assume that we’ve used parameterized queries and/or whitelists everywhere. Can we now go to our manager and guarantee we’re safe?

Well… not so fast. Without even considering Turing’s halting problem, there are other aspects we must consider:

Stored Procedures: These are also prone to SQL Injection issues when they build dynamic SQL from their parameters; whenever possible, apply sanitization even to values that will be sent to the database via prepared statements
Triggers: Same issue as with procedure calls, but even more insidious because sometimes we have no idea they’re there…
Insecure Direct Object References: Even if our application is free of SQL Injection, there’s still a risk associated with this vulnerability category. The main point here is that an attacker can trick the application in different ways so that it returns records he or she was not supposed to have access to. There’s a good cheat sheet on this topic available at OWASP’s GitHub repository

In short, our best option here is caution. Many organizations nowadays use a “red team” for exactly this purpose. Let them do their job, which is to find any remaining vulnerabilities.

4. Damage Control Techniques

As a good security practice, we should always implement multiple defense layers – a concept known as defense in depth. The main idea is that even if we’re unable to find all possible vulnerabilities in our code – a common scenario when dealing with legacy systems – we should at least try to limit the damage an attack would inflict.

Of course, this topic deserves a whole article or even a book, but let’s name a few measures:

Apply the principle of least privilege: Restrict as much as possible the privileges of the account used to access the database
Use database-specific features to add an extra protection layer; for example, the H2 Database has an ALLOW_LITERALS setting that, when set to NONE, rejects SQL queries containing literal values
Use short-lived credentials: Make the application rotate database credentials often; a good way to implement this is by using Spring Cloud Vault
Log everything: If the application stores customer data, this is a must; there are many solutions available that integrate directly with the database or work as a proxy, so in case of an attack we can at least assess the damage
Use WAFs or similar intrusion detection solutions: these are the typical blacklist examples; they usually come with a sizeable database of known attack signatures and trigger a programmable action upon detection. Some also include in-JVM agents that detect intrusions through instrumentation. The main advantage of this approach is that any vulnerability found becomes much easier to fix, since we’ll have a full stack trace available.
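
On the detection side, the idea behind signature-based tools can be sketched in a few lines. The SqlInjectionSignatureDetector class below is hypothetical, and its four patterns are illustrative only, far smaller than the maintained signature databases real WAFs ship with. It logs matches instead of rejecting requests, since a blacklist is better suited to detection than to prevention:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;
import java.util.regex.Pattern;

// Hypothetical detector: flags suspicious input and logs it, but never blocks
class SqlInjectionSignatureDetector {

    private static final Logger LOG =
        Logger.getLogger(SqlInjectionSignatureDetector.class.getName());

    // A tiny, illustrative signature set, checked in insertion order
    private static final Map<String, Pattern> SIGNATURES = new LinkedHashMap<>();
    static {
        SIGNATURES.put("tautology",
            Pattern.compile("'\\s*or\\s*'?\\d+'?\\s*=\\s*'?\\d+", Pattern.CASE_INSENSITIVE));
        SIGNATURES.put("comment", Pattern.compile("(--|/\\*)"));
        SIGNATURES.put("union-select",
            Pattern.compile("\\bunion\\b.*\\bselect\\b", Pattern.CASE_INSENSITIVE));
        SIGNATURES.put("stacked-query",
            Pattern.compile(";\\s*(drop|delete|insert|update)\\b", Pattern.CASE_INSENSITIVE));
    }

    // Returns the names of all matching signatures and logs a warning if any matched
    public static List<String> inspect(String paramName, String value) {
        List<String> hits = new ArrayList<>();
        if (value == null) {
            return hits;
        }
        for (Map.Entry<String, Pattern> e : SIGNATURES.entrySet()) {
            if (e.getValue().matcher(value).find()) {
                hits.add(e.getKey());
            }
        }
        if (!hits.isEmpty()) {
            // Logs only the parameter name and signature names, not the raw value
            LOG.warning("Possible SQL injection in parameter '" + paramName + "': " + hits);
        }
        return hits;
    }
}
```

For example, inspect("customerId", "abc' OR '1'='1") returns [tautology] and logs a warning, while inspect("customerId", "abc123") returns an empty list. The warning deliberately leaves out the raw value, so the attacker’s input isn’t copied verbatim into the logs.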

5. Conclusion

In this article, we’ve covered SQL Injection vulnerabilities in Java applications – a very serious threat to any organization that depends on data for their business – and how to prevent them using simple techniques.

As usual, the full code for this article is available on GitHub.
