Designing a User Friendly Java Library

Last modified: December 16, 2016

1. Overview

Java is one of the pillars of the open-source world. Almost every Java project uses other open-source projects, since no one wants to reinvent the wheel. However, we often need a library for its functionality yet have no clue how to use it. We run into questions like:

  • What is it with all these “*Service” classes?
  • How do I instantiate this? It takes too many dependencies. What is a “latch”?
  • Oh, I put it together, but now it starts throwing IllegalStateException. What am I doing wrong?

The trouble is that not all library designers think about their users. Most think only about functionality and features, but few consider how the API is going to be used in practice, and how the users’ code will look and be tested.

This article comes with a few pieces of advice on how to save our users some of these struggles – and no, it’s not through writing documentation. Of course, an entire book could be written on this subject (and a few have been); these are some of the key points I learned while working on several libraries myself.

I will illustrate the ideas here using two libraries: charles and jcabi-github.

2. Boundaries

This should be obvious, but many times it isn’t. Before writing a single line of code, we need clear answers to a few questions: What inputs are needed? What is the first class the user will see? Do we need any implementations from the user? What is the output? Once these questions are answered, everything becomes easier, since the library already has an outline, a shape.

2.1. Input

This is perhaps the most important topic. We have to make clear what the user needs to provide to the library in order for it to do its work. In some cases this is trivial: it could be just a String representing the auth token for an API. In others it might be an implementation of an interface or of an abstract class.

A very good practice is to take all dependencies through constructors and to keep these short, with only a few parameters. A constructor with more than three or four parameters is a sign that the code should be refactored. And if methods are used to inject mandatory dependencies, users will most likely end up with the third frustration described in the overview: an IllegalStateException, thrown because something required was never set.
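
The difference between the two injection styles can be sketched as follows. This is a minimal illustration, not code from either library: Storage, SetterCrawler and ConstructorCrawler are hypothetical names invented for the example.

```java
import java.util.Objects;

// Hypothetical dependency, for illustration only.
interface Storage {
    void save(String page);
}

// Anti-pattern: a mandatory dependency injected through a method.
final class SetterCrawler {
    private Storage storage;

    void setStorage(Storage storage) {
        this.storage = storage;
    }

    void crawl() {
        if (this.storage == null) {
            // The user only finds out at runtime that a call was missing.
            throw new IllegalStateException("setStorage() must be called first");
        }
        this.storage.save("index.html");
    }
}

// Preferred: the object cannot be built without its dependency.
final class ConstructorCrawler {
    private final Storage storage;

    ConstructorCrawler(Storage storage) {
        this.storage = Objects.requireNonNull(storage, "storage");
    }

    void crawl() {
        this.storage.save("index.html");
    }
}
```

With the constructor version, forgetting the dependency is a compile-time error (or an immediate NullPointerException if null is passed) instead of an IllegalStateException at some later call.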

Also, we should always offer more than one constructor, to give users alternatives. Let them work with both String and Integer, and don’t restrict them to a FileInputStream: accept an InputStream, so that they can pass a ByteArrayInputStream when unit testing, for example.
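
As a rough sketch of this advice, assume a hypothetical Config class (not taken from any real library) whose primary constructor accepts any InputStream and whose secondary constructor is a convenience for text that is already in memory:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical class, for illustration only.
final class Config {
    private final InputStream source;

    // Convenience constructor: the content is already available as text.
    Config(String content) {
        this(new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8)));
    }

    // Primary constructor: accepts the widest useful type, not FileInputStream.
    Config(InputStream source) {
        this.source = source;
    }

    // Reads the whole stream as UTF-8 text and closes it.
    String text() {
        try (InputStream in = this.source) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

In production a caller can pass a FileInputStream, while a unit test can pass a ByteArrayInputStream or a plain String without touching the disk.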

For example, here are a few ways to instantiate a GitHub API entry point using jcabi-github:

Github noauth = new RtGithub();
Github basicauth = new RtGithub("username", "password");
Github oauth = new RtGithub("token");

Simple: no hassle, no shady configuration objects to initialize. And it makes sense to have these three constructors, because you can use the GitHub website while logged out or logged in, or an app can authenticate on your behalf. Naturally, some functionality won’t work if you are not authenticated, but you know this from the start.

As a second example, here is how we would work with charles, a web crawling library:

WebDriver driver = new FirefoxDriver();
Repository repo = new InMemoryRepository();
String indexPage = "http://www.amihaiemil.com/index.html";
WebCrawl graph = new GraphCrawl(
  indexPage, driver, new IgnoredPatterns(), repo
);
graph.crawl();

It’s also quite self-explanatory, I believe. However, while writing this, I realized that the current version contains a mistake: all the constructors require the user to supply an instance of IgnoredPatterns. By default, no patterns should be ignored, and the user should not have to say so explicitly. I decided to leave it like this here, so you can see a counterexample. I assume you would try to instantiate a WebCrawl and wonder, “What is it with that IgnoredPatterns?!”
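
One possible fix, sketched here with hypothetical stand-in classes (Patterns and Crawl are invented for illustration and do not match the real charles signatures), is a secondary constructor that supplies the default:

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a set of URL patterns to skip.
final class Patterns {
    private final List<String> ignored;

    // Default: ignore nothing.
    Patterns() {
        this(Collections.emptyList());
    }

    Patterns(List<String> ignored) {
        this.ignored = ignored;
    }

    // True if the URL contains any of the ignored fragments.
    boolean ignores(String url) {
        return this.ignored.stream().anyMatch(url::contains);
    }
}

// Hypothetical stand-in for a crawl entry point.
final class Crawl {
    private final String index;
    private final Patterns patterns;

    // Most users stop here and never learn about Patterns.
    Crawl(String index) {
        this(index, new Patterns());
    }

    Crawl(String index, Patterns patterns) {
        this.index = index;
        this.patterns = patterns;
    }

    boolean visits(String url) {
        return !this.patterns.ignores(url);
    }
}
```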

The variable indexPage is the URL where the crawl should start, and driver is the browser to use (it cannot default to anything, since we do not know which browser is installed on the running machine). The repo variable is explained in the next section.

So, as the examples show, try to keep it simple, intuitive and self-explanatory. Encapsulate logic and dependencies in such a way that users don’t scratch their heads when looking at your constructors.

If you still have doubts, try to make HTTP requests to AWS using aws-sdk-java: you will have to deal with a so-called AmazonHttpClient, which uses a ClientConfiguration somewhere, then needs to take an ExecutionContext somewhere in between. Finally, you might get to execute your request and get a response but still have no clue what an ExecutionContext is, for instance.

2.2. Output

This is mostly for libraries that communicate with the outside world. Here we should answer the question “how will the output be handled?”. Again, the question may sound odd, but it is easy to get wrong.

Look again at the code above. Why do we have to provide a Repository implementation? Why doesn’t the method WebCrawl.crawl() just return a list of WebPage elements? One might argue that it is not the library’s job to handle the crawled pages; how should it even know what we would like to do with them? In that case the API would look something like this:

WebCrawl graph = new GraphCrawl(...);
List<WebPage> pages = graph.crawl();

Nothing could be worse. An OutOfMemoryError could come out of nowhere if the crawled site happens to have, let’s say, 1000 pages, because the library loads them all into memory. There are two solutions to this:

  • Keep returning the pages, but implement some paging mechanism in which the user has to supply the start and end numbers; or
  • Ask the user to implement an interface with a method called export(List<WebPage>), which the algorithm calls every time a maximum number of pages is reached.

The second option is by far the better one; it keeps things simpler on both sides and is more testable. Think how much logic would have to be implemented on the user’s side if we went with the first. With the second, a Repository for pages is specified (perhaps to send them to a DB or write them to disk), and nothing else has to be done after calling crawl().
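
The second option can be sketched roughly as below. WebPage, Repository and BatchingCrawl are simplified, hypothetical types written for this example; they mirror the idea, not the actual charles API, and the batchSize parameter is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified page type.
final class WebPage {
    private final String url;

    WebPage(String url) {
        this.url = url;
    }

    String url() {
        return this.url;
    }
}

// Implemented by the user: decides what happens to crawled pages.
interface Repository {
    void export(List<WebPage> pages);
}

// Hypothetical crawler that hands pages over in bounded batches.
final class BatchingCrawl {
    private final List<String> urls;
    private final Repository repo;
    private final int batchSize;

    BatchingCrawl(List<String> urls, Repository repo, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.urls = urls;
        this.repo = repo;
        this.batchSize = batchSize;
    }

    void crawl() {
        List<WebPage> batch = new ArrayList<>();
        for (String url : this.urls) {
            batch.add(new WebPage(url)); // a real crawler would fetch the page here
            if (batch.size() == this.batchSize) {
                this.repo.export(batch);
                batch = new ArrayList<>(); // never buffer more than batchSize pages
            }
        }
        if (!batch.isEmpty()) {
            this.repo.export(batch); // flush the remainder
        }
    }
}
```

The crawler never holds more than batchSize pages at once, and the user’s only job is to implement export().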

By the way, the code from the Input section above is everything that we have to write in order to get the contents of the website fetched (still in memory, as the repo implementation says, but it is our choice – we provided that implementation so we take the risk).

To summarize this section: we should never completely separate our job from the client’s job. We should always think about what happens to the output we create, much like a truck driver should help with unpacking the goods rather than simply throwing them out upon arrival at the destination.

3. Interfaces

Always use interfaces. The user should interact with our code only through strict contracts.

For example, in the jcabi-github library the class RtGithub is the only one the user actually sees:

Repo repo = new RtGithub("oauth_token").repos().get(
  new Coordinates.Simple("eugenp/tutorials"));
Issue issue = repo.issues()
  .create("Example issue", "Created with jcabi-github");

The above snippet creates a ticket in the eugenp/tutorials repo. Instances of Repo and Issue are used, but the actual types are never revealed. We cannot do something like this:

Repo repo = new RtRepo(...)

The above is not possible for a logical reason: we cannot directly create an issue in a GitHub repo, can we? First we have to log in, then find the repo, and only then can we create an issue. Of course, the scenario above could be allowed, but then the user’s code would become polluted with boilerplate: that RtRepo would probably have to take some kind of authorization object through its constructor, authorize the client, get to the right repo, and so on.
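
The same idea in miniature, using hypothetical types (Hub, RtHub and the nested RtRepo are invented here and are much simpler than the real jcabi-github classes): the user can construct only the entry point, and everything else is reached through interfaces.

```java
// Contracts the user programs against (all hypothetical).
interface Issue {
    String title();
}

interface Issues {
    Issue create(String title);
}

interface Repo {
    String name();
    Issues issues();
}

interface Hub {
    Repo repo(String name);
}

// The only class the user instantiates directly.
final class RtHub implements Hub {
    private final String token;

    RtHub(String token) {
        this.token = token;
    }

    @Override
    public Repo repo(String name) {
        return new RtRepo(this.token, name);
    }

    // Hidden implementation: user code cannot write "new RtRepo(...)".
    private static final class RtRepo implements Repo {
        private final String token;
        private final String name;

        RtRepo(String token, String name) {
            this.token = token;
            this.name = name;
        }

        @Override
        public String name() {
            return this.name;
        }

        @Override
        public Issues issues() {
            // A real implementation would call the remote API with the token.
            return title -> {
                Issue created = () -> title;
                return created;
            };
        }
    }
}
```

In a real library the implementations would more likely be package-private classes; a private nested class is used here only to keep the sketch in one file.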

Interfaces also provide ease of extensibility and backward compatibility. On one hand, we as developers are bound to respect the already released contracts; on the other, users can extend the interfaces we offer: they might decorate them or write alternative implementations.
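
A decorator written on the user’s side might look like the sketch below. PageStore, InMemoryStore and CountingStore are hypothetical names; the point is only that the user wraps the library’s implementation through its interface.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical contract published by a library.
interface PageStore {
    void export(List<String> pages);
}

// Hypothetical implementation shipped with the library.
final class InMemoryStore implements PageStore {
    private final List<String> pages = new ArrayList<>();

    @Override
    public void export(List<String> batch) {
        this.pages.addAll(batch);
    }

    int size() {
        return this.pages.size();
    }
}

// User-written decorator: adds behaviour without touching library code.
final class CountingStore implements PageStore {
    private final PageStore origin;
    private int calls;

    CountingStore(PageStore origin) {
        this.origin = origin;
    }

    @Override
    public void export(List<String> batch) {
        this.calls++;              // the added behaviour
        this.origin.export(batch); // then delegate to the wrapped store
    }

    int calls() {
        return this.calls;
    }
}
```

Because both classes honour the same contract, the library can accept either one wherever a PageStore is expected.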

In other words, abstract and encapsulate as much as possible. By using interfaces we can do this in an elegant and non-restrictive manner – we enforce architectural rules while giving the programmer freedom to enhance or change the behaviour we expose.

To end this section, just keep in mind: our library, our rules. We should know exactly what the client’s code is going to look like and how they are going to unit test it. If we do not know that, no one will, and our library will simply contribute to code that is hard to understand and maintain.

4. Third Parties

Keep in mind that a good library is a lightweight library. Your code might solve an issue and be functional, but if the jar adds 10 MB to my build, then it’s clear that you lost the blueprints of your project a long time ago. If you need a lot of dependencies, you are probably trying to cover too much functionality and should break the project into multiple smaller projects.

Be as transparent as possible: whenever you can, do not bind to actual implementations. The best example that comes to mind is logging: use SLF4J, which is only an API for logging, and do not use log4j directly, since the user may prefer a different logger.

Document libraries that come through your project transitively and make sure you don’t include dangerous dependencies such as xalan or xml-apis (why they are dangerous is not for this article to elaborate).

The bottom line here is: keep your build light and transparent, and always know what you are working with. It could save your users more hassle than you might imagine.

5. Conclusion

The article outlines a few simple ideas that can help a project stay on track with regard to usability. A library, being a component that should find its place in a bigger context, should be powerful in functionality yet offer a smooth and well-crafted interface.

It is easy to step over the line and make a mess of the design. The contributors will always know how to use the library, but someone who lays eyes on it for the first time might not. Productivity matters most, and following this principle, users should be able to start using a library in a matter of minutes.
