Systems are becoming more distributed these days. One way to provide communication between all components or services of a system is a service bus. A service bus uses messages and queues to get information from component/service A to component/service B. But what do we need a service bus for? And what do we need the underlying queues for?
- Communication to integrate services? No, communication between services can be achieved more easily with less overhead and much faster by using web services (WCF, SOAP, HTTP, …).
- Service discovery? No, service discovery would only be necessary if we didn’t know the services of the system. And if we don’t know them, there are better and more specialized solutions available, like Jini or UPnP.
- Reliability and failure recovery? Yes, that is the main advantage a service bus gives us. No message may get lost, even if services fail, are under heavy load or are down. That is what we need queues for.
Using reliable messaging has many advantages. But it also comes with a few pitfalls. In this post I am going to address two problematic uses of a service bus.
Queries, or request/response-style communication, are highly ephemeral and synchronous. All queries are initially triggered by a user, and the queried service may in turn query other services. A user will wait for a response, but rarely for more than about 7 seconds. He or she is likely to cancel the request if it takes longer [1]. If the user gets an error, he or she will simply request again. So first of all, there is no need for a reliable query and its overhead. A query does not need to be queued until the serving endpoint gets the chance to serve it. Instead, queries have to be fast, regardless of how stressed the rest of the system is.
Messaging is asynchronous. If part of the system or the service bus itself is under pressure, query messages reliably queue up, even if the user has canceled the request. If the user requests again, he or she introduces new messages into the service bus and thereby increases the pressure on the services and the bus. So second of all, queries get slower in a stressed system and in turn slow the system down even more.
In the worst case, the queried service fails or cannot respond at all anymore. Then not only will the user not get a response, but all querying services stop working as well. The queried service becomes a single point of failure for all querying services, and reliability is in jeopardy. This is especially dramatic if the queried service is something central, like user management.
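To make the queue-up effect tangible, here is a minimal sketch of a toy simulation. It is not a model of any real service bus: the function name, the notion of a "tick" and all parameters are my own assumptions for illustration, and user retries are left out (they would only raise the arrival rate further).

```python
from collections import deque


def simulate_backlog(ticks, arrivals_per_tick, served_per_tick, patience_ticks):
    """Toy model of queries sent as queued messages.

    Returns (answered_in_time, wasted_work, final_queue_length).
    """
    queue = deque()        # each entry is the tick at which the query was enqueued
    answered_in_time = 0
    wasted_work = 0        # queries served after the user had already given up

    for now in range(ticks):
        # New queries arrive as messages and are reliably queued.
        for _ in range(arrivals_per_tick):
            queue.append(now)

        # The queried service works through the queue in FIFO order.
        for _ in range(min(served_per_tick, len(queue))):
            enqueued_at = queue.popleft()
            if now - enqueued_at <= patience_ticks:
                answered_in_time += 1
            else:
                # Nobody is waiting anymore, but the message was still
                # delivered and processed.
                wasted_work += 1

    return answered_in_time, wasted_work, len(queue)


if __name__ == "__main__":
    # Slightly overloaded service: 3 queries arrive per tick, 2 are served.
    print(simulate_backlog(ticks=20, arrivals_per_tick=3,
                           served_per_tick=2, patience_ticks=2))
```

As soon as the arrival rate exceeds the service rate even slightly, the waiting time of every later query grows, and the service ends up spending its capacity on answers that nobody reads.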
A better way to do this is to query a cache directly and synchronously, without messaging [2, 3, 4]. Ideally, all services store and keep the data they are working with for themselves. Then they would not need to query other services for data. They would notify other services about changes in their data with events over the service bus, and those services would handle these events and refresh their own data [5, 6]. A cache or a backend for frontend would provide a read model for synchronous queries and update itself via events coming in as messages. In a stressed system those events will reliably queue up but will get handled eventually. That guarantees fast queries and high availability, although the queried data might not be up to date to the second. But requested and displayed data is never truly current anyway, since it could be changed a second later while the user is still looking at it.
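The idea of a read model that is refreshed by events and queried synchronously can be sketched roughly like this. Everything here is assumed for illustration: the class `UserReadModel`, the event names `UserRenamed` and `UserDeleted`, and the in-process `deque` that stands in for the service's input queue on a real bus.

```python
from collections import deque


class UserReadModel:
    """Local cache: answers queries synchronously, is refreshed by events."""

    def __init__(self):
        self._names = {}       # user_id -> display name (the cached read model)
        self._inbox = deque()  # stand-in for this service's input queue on the bus

    def enqueue(self, event):
        """Called when an event message arrives; it is only queued here."""
        self._inbox.append(event)

    def handle_pending(self):
        """Work through queued events. Under load this may simply run later."""
        while self._inbox:
            event = self._inbox.popleft()
            if event["type"] == "UserRenamed":
                self._names[event["user_id"]] = event["name"]
            elif event["type"] == "UserDeleted":
                self._names.pop(event["user_id"], None)

    def get_name(self, user_id):
        """Synchronous query: a dictionary lookup, no message, no waiting."""
        return self._names.get(user_id)


if __name__ == "__main__":
    read_model = UserReadModel()
    read_model.enqueue({"type": "UserRenamed", "user_id": 1, "name": "Ada"})
    print(read_model.get_name(1))  # event not handled yet, cache is stale
    read_model.handle_pending()
    print(read_model.get_name(1))  # cache has caught up
```

The query path never touches the queue, so it stays fast no matter how many events are waiting. The price is that a lookup may briefly return stale data until the pending events have been handled.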
Use the service bus for events (1 publisher -> N subscribers) and commands (N publishers -> 1 subscriber) but not for queries or request/response scenarios.
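The two shapes, events with many subscribers and commands with exactly one handler, can be illustrated with a tiny in-memory stand-in. This is only a sketch: `InMemoryBus` and its method names are hypothetical and not the API of MassTransit, NServiceBus or any other product, and it dispatches directly instead of going through durable queues as a real bus would.

```python
from collections import defaultdict


class InMemoryBus:
    """Hypothetical stand-in for a bus: no queues, no persistence, no retries."""

    def __init__(self):
        self._event_subscribers = defaultdict(list)  # event type -> many handlers
        self._command_handlers = {}                  # command type -> one handler

    def subscribe(self, event_type, handler):
        """Any number of services may subscribe to the same event."""
        self._event_subscribers[event_type].append(handler)

    def register_command_handler(self, command_type, handler):
        """A command has exactly one owner; a second registration is an error."""
        if command_type in self._command_handlers:
            raise ValueError(f"{command_type} already has a handler")
        self._command_handlers[command_type] = handler

    def publish(self, event_type, payload):
        """1 publisher -> N subscribers. Nothing is returned to the publisher."""
        for handler in self._event_subscribers[event_type]:
            handler(payload)

    def send(self, command_type, payload):
        """N senders -> 1 handler. Nothing is returned to the sender."""
        self._command_handlers[command_type](payload)


if __name__ == "__main__":
    bus = InMemoryBus()
    bus.subscribe("UserRenamed", lambda e: print("billing saw", e))
    bus.subscribe("UserRenamed", lambda e: print("search saw", e))
    bus.register_command_handler("RenameUser", lambda c: print("users handles", c))

    bus.send("RenameUser", {"user_id": 1, "name": "Ada"})
    bus.publish("UserRenamed", {"user_id": 1, "name": "Ada"})
```

Note that neither `publish` nor `send` returns a result. That is deliberate: there is no response to wait for, which is exactly what separates events and commands from queries.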
There are messaging protocols for querying, like HTTP, but they are not optimized for reliability and they don’t use queues. HTTP merely takes advantage of the fault tolerance of TCP/IP’s routing.
Sending much data
There is something worse than querying over a service bus: querying huge amounts of data over the service bus. The same could also be attempted with event or command messages. Many messaging transports limit the size of a message, so the services would have to exchange lots of messages, which leads to coordination overhead and transaction handling spanning multiple messages.
Large amounts of data should be queried synchronously from a cache [7]. If large amounts of data have to be transported with events or commands, as with image uploads, the services should put the data in a common location and just send a reference [8]. Much data flowing between services indicates unfavorable service boundaries [5].
Since services should be autonomous, loosely coupled and highly cohesive, the only data that is transported between services should be small events and commands, ideally containing only IDs and references.
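Putting the data in a common location and sending only a reference could look roughly like the following sketch. The names `BlobStore`, `build_image_uploaded_event` and `handle_image_uploaded` are made up for this example, and the dictionary merely stands in for a real shared location such as a file share or object storage.

```python
import hashlib
import json


class BlobStore:
    """Stand-in for a common location that both services can reach."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # Use the content hash as the reference. This is an assumption of the
        # sketch; any unique key would do.
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def build_image_uploaded_event(store: BlobStore, image_id: str, image_bytes: bytes) -> str:
    """Sender side: store the payload, then build a small message with a reference."""
    blob_ref = store.put(image_bytes)
    return json.dumps({"type": "ImageUploaded", "image_id": image_id, "blob_ref": blob_ref})


def handle_image_uploaded(store: BlobStore, message: str) -> bytes:
    """Receiver side: read the reference from the message and fetch the payload."""
    event = json.loads(message)
    return store.get(event["blob_ref"])


if __name__ == "__main__":
    store = BlobStore()
    image = b"\x00" * 1_000_000                   # pretend this is a 1 MB image
    message = build_image_uploaded_event(store, "img-1", image)
    print(len(message), "bytes travel over the bus")
    print(len(handle_image_uploaded(store, message)), "bytes fetched from the store")
```

The message stays the same small size no matter how large the image is, so the transport's size limit never comes into play and no payload has to be split across several messages.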
There are buses for sending huge amounts of data, like USB, but they are not optimized for reliability and they don’t use queues. And even USB uses messages only for short and simple commands and events. The actual data is streamed over a separate pipe in the bus.
Querying and sending much data over a service bus can have drastic consequences when the services or the service bus itself are under pressure. Sure, we could expensively scale for that as well, but in my opinion it is just not necessary. These problems can be solved much more easily if a service bus is not used like a hammer that treats every kind of communication as a nail. Use messaging when the communication is inherently asynchronous, and consider sending only IDs and references. Don’t try to use messaging for synchronous, ephemeral communication.
Using a service bus for every kind of communication just because it is already there is like writing all the code in the Main() method just because it is already there.
[1] 5 Reasons Visitors Leave Your Website http://www.websitemagazine.com/content/blogs/posts/archive/2014/03/21/5-reasons-visitors-leave-your-website.aspx
[2] Chris Patterson (the man behind MassTransit): “it is best to avoid request/response use in distributed applications” http://docs.masstransit-project.com/en/latest/usage/request_response.html
[3] Udi Dahan (the man behind NServiceBus): Clarified CQRS: query caches synchronously, no messaging http://www.udidahan.com/2009/12/09/clarified-cqrs/
[4] Udi Dahan (the man behind NServiceBus): querying “should be avoided” & “messaging is not for that” https://twitter.com/halllo/status/687701659653390336
[5] Udi Dahan (the man behind NServiceBus): Finding Service Boundaries
[6] Chris Patterson (the man behind MassTransit): Messages for updates and asynchronous background processing https://www.dotnetrocks.com/default.aspx?showNum=798
[7] Udi Dahan (the man behind NServiceBus): Getting lots of data over the bus? “you usually shouldn’t – instead, query a cache.” https://twitter.com/UdiDahan/status/707960622726512641
[8] Particular (the company behind NServiceBus): “Messages are intended to be small.” http://docs.particular.net/nservicebus/messaging/databus