Debunking myths: proxies impact performance

Alef Arendsen

In a recent blog entry Marc Logemann touches on the subject of proxy performance. In his entry he asks for a white paper by 'the Spring guys'. I don't want to spend (p)ages and (p)ages on discussing the differences up to the nanosecond between proxies and byte code weaving mechanisms, but I do think it's valuable to re-iterate once again what the differences are and whether or not this discussion matters at all.

What are proxies and why do we use them?

Let's first briefly revisit what proxies are used for (in general, and in Spring). According to the Gang of Four (GoF) book on Design Patterns, a proxy is a surrogate object or placeholder for another object to control access to it. Because the proxy sits in between the caller of an object and the real object itself, it can decide to prevent the real (or target) object from being invoked, or do something before the target object is invoked.

[Figure prox.jpg: a proxy sitting between the caller and the target object]

In other words, proxies can be used as stand-ins for real objects to apply extra behavior to those objects–be it security-related behavior, caching or maybe performance measurements.
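To make the pattern concrete, here is a minimal hand-written sketch of a proxy that applies a security rule. The names Report, RealReport and SecuredReport are hypothetical and exist only for this illustration; they are not taken from Spring or from the GoF book.

```java
// Hypothetical example types, invented for this illustration only.
interface Report {
    String render();
}

// The real (target) object.
class RealReport implements Report {
    public String render() {
        return "quarterly numbers";
    }
}

// The proxy: same interface as the target, sits in front of it and controls access.
class SecuredReport implements Report {
    private final Report target;
    private final boolean callerIsAdmin;

    SecuredReport(Report target, boolean callerIsAdmin) {
        this.target = target;
        this.callerIsAdmin = callerIsAdmin;
    }

    public String render() {
        // Behavior applied before the target is invoked: the proxy may refuse to delegate.
        if (!callerIsAdmin) {
            throw new SecurityException("access denied");
        }
        return target.render();
    }
}

class ProxyPatternDemo {
    public static void main(String[] args) {
        Report report = new SecuredReport(new RealReport(), true);
        System.out.println(report.render());
    }
}
```

The caller only ever sees the Report interface, so it cannot tell whether it is talking to the target or to the stand-in.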

Many modern frameworks use proxies to realize functionality that would not have been possible otherwise. Many object-relational mappers use proxies to implement behavior that prevents data from being loaded until it is actually really needed (this is sometimes called lazy loading). Spring also uses proxies to realize some of its functionality such as its remoting facilities, its transaction management facilities and the AOP framework.

An alternative to proxies is byte code weaving. When using byte code weaving mechanisms, there will never be a second object (aka the proxy). Instead, if behavior (such as transaction management or security) needs to be applied, it is woven 'into' the existing code, instead of 'around it'. One way to do the weaving is by using the Java5 -javaagent flag. Other ways are available too.

In other words: with proxies you end up with a proxy object that sits in front of the target object, whereas with a byte code weaving approach, there will not be a proxy that has to delegate calls.

The cold hard truth

Okay, let's get it over with: proxies add overhead to a plain method invocation… significant overhead. To my mind, that's absolutely not surprising. The fact that putting a proxy in between costs something is totally natural. Generally one could say that an intermediary always adds overhead. Now the question is: what do we get in return for the overhead a proxy adds?

Note that I'm not going to bother with providing numbers here. As Stefan Hansel correctly points out in his comment on Marc's blog, micro benchmarks measuring the difference between plain target invocation versus having a proxy in between (or any micro benchmark for that matter) don't really make sense because of a whole bunch of other factors that you have to take into account as well.

Okay, but you do want numbers?

Okay, let's get to it then. Let's consider the following piece of code where we have two objects, one which is proxied, one which is not. Let's assume the target object itself (the doIt() method) does not do anything in particular. Let's also assume the proxy does not do anything in particular either (it just delegates to the target object).

If I run this code on my laptop (MacBook) with a plain JDK dynamic proxy (more about those later), then one method invocation to myRealObject takes 9 nanoseconds (a nanosecond is 10^-9 seconds). One invocation to the proxied object takes 500 nanoseconds (about 50 times as slow).

// real object
MyInterface myRealObject;
myRealObject.doIt();

// proxied object
MyInterface myProxiedObject;
myProxiedObject.doIt();

In contrast, if I use a byte code weaving approach (in this case I'm using AspectJ to simulate the same setup), I only end up with about 2 nanoseconds added to my invocation.

So concluding, I can't make it any better than it is: proxies add significant overhead to a plain method invocation.

Before we go on, let's first realize that the overhead that's being added here is fixed. It is definitely not the case that if the doIt() method itself took 5 seconds, the proxied invocation would take 50 times as long. No, instead, the invocation would take 5 seconds + ~500 nanoseconds.
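For readers who want to try this themselves, a quick measurement along these lines could look like the sketch below. It is not the benchmark behind the numbers above. MyInterface comes from the snippet earlier, while MyRealObject, ProxyOverheadSketch, the call counter and the iteration count are assumptions made for illustration. The counter gives the target a small side effect so the JIT compiler is less likely to optimize the empty call away.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

interface MyInterface {
    void doIt();
}

// Target with a trivial side effect (assumption: added so the call is not eliminated).
class MyRealObject implements MyInterface {
    long calls;

    public void doIt() {
        calls++;
    }
}

class ProxyOverheadSketch {

    // Wraps the target in a JDK dynamic proxy that does nothing but delegate.
    static MyInterface proxyFor(MyInterface target) {
        InvocationHandler handler = (proxy, method, args) -> method.invoke(target, args);
        return (MyInterface) Proxy.newProxyInstance(
                MyInterface.class.getClassLoader(),
                new Class<?>[] { MyInterface.class },
                handler);
    }

    // Returns the average wall-clock time per call, in nanoseconds.
    static double averageNanos(MyInterface subject, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            subject.doIt();
        }
        return (System.nanoTime() - start) / (double) iterations;
    }

    public static void main(String[] args) {
        int iterations = 10_000_000;
        MyInterface real = new MyRealObject();
        MyInterface proxied = proxyFor(new MyRealObject());

        // Warm-up passes so both paths have a chance to be compiled before measuring.
        averageNanos(real, iterations);
        averageNanos(proxied, iterations);

        System.out.printf("direct:  %.1f ns/call%n", averageNanos(real, iterations));
        System.out.printf("proxied: %.1f ns/call%n", averageNanos(proxied, iterations));
    }
}
```

The absolute numbers will differ per machine and JVM, which is exactly why I treat them as indicative only.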

Putting things in context (or: should you care?)

Okay, so now we know proxies aren't some kind of super-fast objects that work their magic without causing side-effects, the question is: "do we need to worry about the overhead?". The answer is pretty simple: "no you don't" ;-) . I'll explain why.

We're using proxies to transparently add behavior to an object. Maybe to decorate an object with security rules (administrators can access it, but normal users can't) or maybe it's because we want to enable lazy loading, only loading data from a database on first access. Another reason would be to enable transparent transaction management for our objects.

Transaction management

Let's look at the transaction management example. The following sequence diagram roughly depicts (a simplified view of) what happens in a situation where a service is called whereby beforehand a transaction is started and after successful completion the transaction is committed.

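[Figure seq.jpg: sequence diagram of a proxy starting a transaction, invoking the service and committing the transaction]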

The invocation of the service itself now definitely involves a certain overhead (the overhead we already discussed before). The question however is: what do we get in exchange for the overhead?
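The sequence described above can be sketched with a JDK dynamic proxy. This is not how Spring implements its transaction support; RecordingTxManager, OrderService, SimpleOrderService and TxHandler are hypothetical names, and the "transaction manager" merely records which steps ran so the order of events is visible.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a transaction manager; it only records what happened.
class RecordingTxManager {
    final List<String> events = new ArrayList<>();

    void begin()    { events.add("begin"); }
    void commit()   { events.add("commit"); }
    void rollback() { events.add("rollback"); }
}

interface OrderService {
    void placeOrder(boolean fail);
}

class SimpleOrderService implements OrderService {
    public void placeOrder(boolean fail) {
        if (fail) {
            throw new IllegalStateException("order rejected");
        }
    }
}

// Surrounds every call with begin/commit, or begin/rollback when the service throws.
class TxHandler implements InvocationHandler {
    private final Object target;
    private final RecordingTxManager tx;

    TxHandler(Object target, RecordingTxManager tx) {
        this.target = target;
        this.tx = tx;
    }

    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        tx.begin();
        try {
            Object result = method.invoke(target, args);
            tx.commit();
            return result;
        } catch (InvocationTargetException e) {
            tx.rollback();
            throw e.getCause(); // rethrow the exception the service itself raised
        }
    }

    static OrderService wrap(OrderService target, RecordingTxManager tx) {
        return (OrderService) Proxy.newProxyInstance(
                OrderService.class.getClassLoader(),
                new Class<?>[] { OrderService.class },
                new TxHandler(target, tx));
    }
}
```

Neither the caller nor SimpleOrderService contains a single line of transaction code; all of it lives in the handler.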

Benefits realized

If we continue to look at the above example, we've realized a couple of benefits.

Code simplification
We've greatly simplified our code by putting a proxy in between. If we use the @Transactional annotation Spring provides, all we need to do is the following:

public class Service {

  @Transactional
  public void executeService() { }

}

and

<tx:annotation-driven/>

<bean class="com.mycompany.Service"/>

An alternative (programmatic) approach would involve significantly modifying either the client (caller) or the service class itself.

Centralized transaction management
Transaction management is now taken care of by a central facility, allowing for much more optimization and a very consistent approach to doing transaction management. This would not have been possible if we had implemented the transaction management code in our service or in the caller itself.

And what does it matter anyway?

If that's not enough, we can always start looking at the actual performance degradation that we get from the proxying mechanism and compare it to the actual time it takes to start and/or commit a transaction. I don't have any numbers available, but I can assure you, committing a JDBC transaction definitely takes more time than 491 nanoseconds.

But what if it's very fine-grained operations the proxy executes

Ahh! That's an entirely different story. There are different classes of behavior you can transparently add of course (either using proxies or using a byte code weaving approach). I usually distinguish between fine-grained and coarse-grained behavior. Coarse-grained behavior to my mind is applied at a service level or only to a certain and limited set of operations in our application. A more fine-grained set of behavior would for example include logging of every method in our system. I would definitely not choose to use a proxy-based approach for such fine-grained approaches.

Rules of thumb

Concluding we can say the following:

  • First of all, proxies add overhead, but this overhead is negligible if the behavior applied to the proxied objects has something to do with longer running operations (such as database or file access or transaction management).
  • We can also say that if you need very fine-grained behavior and want to apply that to a large set of objects, it's probably safer to go for a byte code weaving approach, such as AspectJ.
  • If that's not enough, it's probably still safe to say that proxying (unless applied to thousands or more objects in your system) would never be the first place you should look at in a system that suffers from degraded performance.
  • Another rule of thumb might be that any request in your system should probably not involve (calls to) more than 10 (or so) proxied methods. 10 proxy operations * 500 ns per proxy operation = 5 microseconds (which is still negligible I would say), but 100,000 proxy operations * 500 ns per proxy operation = 50 milliseconds (which to my mind is no longer negligible).
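The arithmetic in the last rule of thumb can be written down as a tiny helper. The 500 ns figure is the rough number from my laptop quoted earlier, not a constant of nature, and the class and method names are hypothetical.

```java
class ProxyOverheadBudget {
    // Assumed fixed cost per proxied call, taken from the rough measurement in this article.
    static final long NANOS_PER_PROXY_CALL = 500;

    // Total added latency, in nanoseconds, for a request making the given number of proxied calls.
    static long addedNanos(long proxiedCalls) {
        return proxiedCalls * NANOS_PER_PROXY_CALL;
    }

    public static void main(String[] args) {
        // 10 proxied calls per request
        System.out.println(addedNanos(10) / 1_000.0 + " microseconds");
        // 100,000 proxied calls per request
        System.out.println(addedNanos(100_000) / 1_000_000.0 + " milliseconds");
    }
}
```

Plugging in your own per-call figure and the number of proxied calls on a typical request path tells you quickly whether you are in "negligible" territory.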

Different types of proxies

Apart from the discussion about whether or not proxies add overhead at all, it's also relevant to briefly discuss different types of proxies. There are several distinct types of proxies. In my little benchmark, I've used the JDK dynamic proxying infrastructure (from the java.lang.reflect package) that is only capable of creating proxies for interfaces. Another proxying mechanism is CGLIB, which uses a slightly different approach to proxying. The last time I did a small performance benchmark between the two, I didn't really find a significant difference and frankly, I don't care that much. What is important is the inner workings of the proxy that has been created. There are a lot of things that can go wrong if you start implementing proxies yourself. If you compare the following two pieces of code for example, you might not expect there to be a huge difference in performance between the two. And when I'm saying huge, I'm saying a factor of 10 or so…

// Variant 1: the target method is looked up once and then cached.
// (target and cachedMethodMap are fields of the handler; the map is
// assumed to be a Map<Method, Method>.)
public Object invoke(Object proxy, Method proxyMethod, Object[] args)
throws Throwable {
        Method targetMethod = null;
        if (!cachedMethodMap.containsKey(proxyMethod)) {
                targetMethod = target.getClass().getMethod(proxyMethod.getName(),
                        proxyMethod.getParameterTypes());
                cachedMethodMap.put(proxyMethod, targetMethod);
        } else {
                targetMethod = cachedMethodMap.get(proxyMethod);
        }
        Object retVal = targetMethod.invoke(target, args);
        return retVal;
}

// Variant 2: the target method is looked up reflectively on every single call.
public Object invoke(Object proxy, Method proxyMethod, Object[] args)
throws Throwable {
        Method targetMethod = target.getClass().getMethod(proxyMethod.getName(),
                        proxyMethod.getParameterTypes());
        Object retVal = targetMethod.invoke(target, args);
        return retVal;
}

In other words, leave generating or creating proxies to people or frameworks that know what they're doing. Fortunately for you, I wasn't involved in the proxy design and Rob, Juergen, Rod et al. are way better at that than I am, so no worries there ;-) .
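As a side note to the two handlers above: when the target actually implements the proxied interface, the Method object handed to the handler can be invoked on the target directly, so no lookup is needed at all. The sketch below shows that variant. Calculator, SimpleCalculator and DirectDelegatingHandler are hypothetical names, and this is not a claim about how any particular framework builds its proxies.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

interface Calculator {
    int add(int a, int b);
}

class SimpleCalculator implements Calculator {
    public int add(int a, int b) {
        return a + b;
    }
}

// Delegating handler without any per-call method lookup.
class DirectDelegatingHandler implements InvocationHandler {
    private final Object target;

    DirectDelegatingHandler(Object target) {
        this.target = target;
    }

    public Object invoke(Object proxy, Method proxyMethod, Object[] args) throws Throwable {
        try {
            // proxyMethod is declared on the interface, so it can be invoked on any implementor.
            return proxyMethod.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // surface the target's own exception to the caller
        }
    }
}

class DirectDelegationDemo {
    static Calculator proxy(Calculator target) {
        return (Calculator) Proxy.newProxyInstance(
                Calculator.class.getClassLoader(),
                new Class<?>[] { Calculator.class },
                new DirectDelegatingHandler(target));
    }

    public static void main(String[] args) {
        System.out.println(proxy(new SimpleCalculator()).add(2, 3));
    }
}
```

The trade-off is that this only works if the target implements the interface; looking the method up by name, as in the two handlers above, also covers targets that merely happen to have a matching public method.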

What about byte code weaving

In general, one can say that a byte code weaving approach takes a little more time to set up depending on your environment. In some scenarios you need to set up a java agent, in other situations, you might need to modify your compilation process, other frameworks might require the use of a different class loader. In other words, byte code weaving is a little harder to set up. In my experience, (as always) the 80-20 rule applies here as well. 80% of all requirements can probably be solved using proxy-based systems. For the last mile, or the remaining 20%, opting for a byte code weaving approach might be a good option.
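Of the setup options just mentioned, the Java agent route can be sketched as follows, using the standard java.lang.instrument API. ObservingTransformer and SketchAgent are hypothetical names. This transformer weaves nothing; it only records which classes pass through it, whereas a real weaver such as AspectJ would return modified class bytes at that point.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Transformer that only observes class loading; a real weaver would return modified bytes.
class ObservingTransformer implements ClassFileTransformer {
    final List<String> seen = new CopyOnWriteArrayList<>();

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        seen.add(className);
        return null; // null tells the JVM to keep the class bytes unchanged
    }
}

// Hypothetical agent class. It is package-private here only to keep the sketch in one file;
// a real agent would normally be a public class in its own source file.
class SketchAgent {
    // Called by the JVM before the application's main method when started with -javaagent.
    public static void premain(String agentArgs, Instrumentation instrumentation) {
        instrumentation.addTransformer(new ObservingTransformer());
    }
}
```

To activate an agent like this, it is packaged in a JAR whose manifest names the agent class in a Premain-Class entry, and the JVM is started with -javaagent: followed by the path to that JAR. That packaging step is part of the extra setup effort compared to proxies.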

The relation with AOP

You might wonder why I haven't touched on the subject of AOP yet. Proxies and byte code weaving have a strong relation to AOP. Or maybe that's the other way around. In any case, Spring's AOP framework uses proxies to realize its functionality. Proxies to my mind are just an implementation detail (although a pretty important one) that is strongly linked with AOP and Spring in general.

Conclusion

Concluding, we can say that proxies do add a little bit of overhead to a call to the object they proxy, but that the discussion about this is not relevant under most circumstances. The reason for this lies partially in the great benefits proxies bring (such as way better maintenance of our code due to code simplification and centralized control) and also in the fact that the things we do using proxies (such as transaction management or caching) usually have a far greater impact on performance than the proxying mechanism itself.

 

19 responses


  1. Could you elaborate a little bit more on the -javaagent flag for dynamic weaving?


  2. [quote post="210"]one method invocation to myRealObject takes 9 nanoseconds (10-6).[/quote]

    A nanosecond is 10-9 seconds.

    [quote post="210"]Transaction management is now taken care of by a central facility, allowing for much more optimization and a very consistent approach to doing transaction management. This would have been possible if we'd have implemented the transaction management code in our service or caller itself.[/quote]

    I think you mean it would have been impossible.


  3. I had some similar results when I did some testing way, way back. There is a really minimal (all but unmeasurable) difference between a JDK dynamic proxy and a bytecode generated proxy (I use Javassist, rather than ASM or CGLIB). There was a tiny difference in memory usage; parameters had to be wrapped (if primitives) and then provided inside an object array.

    I tend to prefer bytecode proxies out of some twisted sense of aesthetics, and because the stack trace is a bit more readable (generally, fewer extra frames of jumping in and out of JNI space).

    It *may* be possible that Hotspot will do a better job optimizing and inlining a bytecode proxy than a dynamic proxy, but the end result will be all but impossible to measure.


  4. [quote comment="35311"]Could you elaborate a little bit more on the -javaagent flag for dynamic weaving?[/quote]

    Hi -FoX-,

    A Java Agent is essentially a hook you can add as a command line argument while starting up a VM that essentially allows you to do anything. One thing you can do is to register transformers that transform class files while they are loaded. Several frameworks use this approach to instrument classes with extra behavior.

    If you're using such an approach to realize AOP-like functionality with AspectJ, we like to call it load-time weaving (LTW).

    Note that Java Agents are only available in Java5 and above. Other than that, LTW for AspectJ, for example, is possible using other techniques as well, such as the support we've added in Spring 2.1.


  5. Nice article Alef,

    I did a similar comparison last year, which I keep meaning to update with the latest JDK results: http://blogs.warwick.ac.uk/colinyates/entry/performance_of_spring/

    Now I don't have to :)


  6. Thanks Colin,

    yeah, I've done my share of performance measurements of proxies as well in the past, but as the difference between your numbers and what Howard claims to have seen shows, it doesn't really work out well anyway, comparing the two mechanisms.


  7. [quote comment="35315"][quote post="210"]one method invocation to myRealObject takes 9 nanoseconds (10-6).[/quote]

    A nanosecond is 10-9 seconds.

    [quote post="210"]Transaction management is now taken care of by a central facility, allowing for much more optimization and a very consistent approach to doing transaction management. This would have been possible if we'd have implemented the transaction management code in our service or caller itself.[/quote]

    I think you mean it would have been impossible.[/quote]

    Thanks for spotting those mistakes! I've fixed them both.

    Alef


  8. Nice article. The silly thing with those people complaining about extra performance overhead is that these same people do "+" String concatenation in loops (no compiler magic there then) and generate non-performant SQL because of a lack of knowledge of ORM, plus various other things that affect performance in a much broader way, most likely domain-specific issues.


  9. I'm inexperienced, so I apologize if my points seem sophomoric. That said…

    Does it make sense to compare dynamic proxies to bytecode weaving? In other words, are you inadvertently talking past whether this is even a debate? I do not know, so let us discuss it and perhaps agree on an answer.

    Bytecode weaving just means that bytecode is weaved into the class file generated by the compiler. AspectJ supports compile-time, post compile-time and load-time weaving. Classes can be instrumented with extra behavior at any of these points and ajc even supports subsequent reweaving by default.

    A proxy, on the other hand, is a surrogate object used to control access to the real object.

    Are these two concepts mutually exclusive? Although you did not directly state so, you did mention an 80/20 split between the proxy design technique and bytecode weaving technique. You seem to have set forth an assumption that these concepts are mutually exclusive.

    I'm testing your assumption because it simply doesn't sound right to me, albeit I've never been forced to think in these terms before, perhaps neither have you. It sounds like you are thinking at the implementation level, in the language of Java, rather than thinking about the problem at a higher level. What might be a good high level model of this problem? In order to describe this problem, I think you need a modular system in place to control the routing of messages. Modular is important, because it means you must be able to both add and take away from the message routing system. In my humble opinion, this is exactly what aspect-oriented programming seeks to address.


  10. Thanks for this great article Alef!


  11. [quote comment="35973"]I'm inexperienced, so I apologize if my points seem sophomoric. That said…

    Does it make sense to compare dynamic proxies to bytecode weaving? In other words, are you inadvertently talking past whether this is even a debate? I do not know, so let us discuss it and perhaps agree on an answer.

    Bytecode weaving just means that bytecode is weaved into the class file generated by the compiler. AspectJ supports compile-time, post compile-time and load-time weaving. Classes can be instrumented with extra behavior at any of these points and ajc even supports subsequent reweaving by default.

    A proxy, on the other hand, is a surrogate object used to control access to the real object.

    Are these two concepts mutually exclusive? Although you did not directly state so, you did mention an 80/20 split between the proxy design technique and bytecode weaving technique. You seem to have set forth an assumption that these concepts are mutually exclusive.

    I'm testing your assumption because it simply doesn't sound right to me, albeit I've never been forced to think in these terms before, perhaps neither have you. It sounds like you are thinking at the implementation level, in the language of Java, rather than thinking about the problem at a higher level. What might be a good high level model of this problem? In order to describe this problem, I think you need a modular system in place to control the routing of messages. Modular is important, because it means you must be able to both add and take away from the message routing system. In my humble opinion, this is exactly what aspect-oriented programming seeks to address.[/quote]

    You've nailed it! I think your last paragraph is possibly one of the better explanations of the difference between AOP and proxies (or at least, the difference in the level of the two concepts).

    So yes, on a conceptual (or higher) level, it absolutely does not make sense to compare the two, only on an implementation level, and this is exactly what I was trying to achieve. I should probably make that a little bit more explicit. I think I have to rewrite some of this, including a clearer distinction between the levels of abstraction we're talking about.

    In other words: you're totally right :) . And no, the two are definitely not mutually exclusive.


  12. Nice article, we just had a discussion about it yesterday. We also looked at an AOP performance comparison that can be found here: http://docs.codehaus.org/display/AW/AOP Benchmark. Is this still a valid one?


  13. Dear Alef,

    Thanks a lot for this great article.
    I had this discussion recently with Marc and I think he missed the point.

    First of all I love Spring at all and want to use this opportunity to thanks all you guys for this great work. I already have realized two big pan european projects with Spring and I am also aware of the based technology for the Spring factory framework.

    The point was, that Marc just joined the team when our project was in the last days of the test phase and no big changes were expected to be done and commited buggy code which broke the build.

    I thought that I had made clear that the main issue was, that breaking the build just before delivering is not a good idea.
    Beside that one of my last points was, that springifying that particular class would not make any sense as no of the mentioned advantages would apply and then you have only to pay the overhead.
    That's it. Thanks for backing up my position.

    BTW, debuging Marcs code I was really surprised to discover that hibernate (assuming using cglib) will also create proxies for abstract classes and the AbstractMethodError is thrown later when you really call a method. Is it a bug or feature? (not a serious question :)

    But reading your article I have some particular questions helping me to see some issues more clearly:

    1. A mediate C -Compiler is capable to detect empty functions and eliminate them. In Java the at least hotspot compiler would do this by inlining the empty block, so no time should be consumed calling the empty function after a given number of calls.
    Have you considered this in your micro benchmark? (So the function should count a counter or something like that to avoid compiler/hotspot optimization and counting the call time after the hotspot optimisation is in place?)

    2. My Feeling is that obsessive usage of proxies will hinder compiler optimization as the compiler always will end up on interfaces and the link to the implementation is only available in the application-context.xml. And the usage of reflection is also not supporting the hotspot compiler.
    Do you agree or are there any discussions covering this topic?

    Great thanks in advance
    Maz
    http://www.mazcity.de


  14. [quote comment="36046"]
    Thanks a lot for this great article.
    [/quote]
    Thanks Maz. I appreciate that.

    [quote comment="36046"]
    BTW, debuging Marcs code I was really surprised to discover that hibernate (assuming using cglib) will also create proxies for abstract classes and the AbstractMethodError is thrown later when you really call a method. Is it a bug or feature? (not a serious question :)
    [/quote]
    Hmm, interesting. I never noticed this behavior. Of course it's possible and technically totally understandable (yes, by default Hibernate uses CGLIB unless you specify otherwise), but I would argue that this is not the behavior that you'd expect as a user…

    [quote comment="36046"]
    1. A mediate C -Compiler is capable to detect empty functions and eliminate them. In Java the at least hotspot compiler would do this by inlining the empty block, so no time should be consumed calling the empty function after a given number of calls.
    Have you considered this in your micro benchmark? (So the function should count a counter or something like that to avoid compiler/hotspot optimization and counting the call time after the hotspot optimisation is in place?)
    [/quote]
    When I have time I'll try to have a look at the small benchmark that I prepared. Of course my whole point was that performance doesn't really matter ;-) , but I understand that this can have a big impact. I'll probably have to update the article anyway after some other comments, so I'll try to incorporate that in a future revision.

    [quote comment="36046"]
    2. My Feeling is that obsessive usage of proxies will hinder compiler optimization as the compiler always will end up on interfaces and the link to the implementation is only available in the application-context.xml. And the usage of reflection is also not supporting the hotspot compiler.
    Do you agree or are there any discussions covering this topic?
    [/quote]
    That's an interesting suggestion. I'm not sure about this. I wish I was a compiler expert but unfortunately, I'm not :) . At first sight, I would agree. I've forwarded this to some other guys, let's see what they come up with.


  15. How do you intend on designing this benchmark? Now _that_ would be something to blog about. You might get good feedback on how to design or improve the quality of the benchmark, too.


  16. [quote comment="36230"]How do you intend on designing this benchmark? Now _that_ would be something to blog about. You might get good feedback on how to design or improve the quality of the benchmark, too.[/quote]
    John,

    I hope you realize that my goal with this blog entry was to actually point out that I'm not going to do a benchmark :) as it is irrelevant. Of course, from a theoretical standpoint it would definitely be a nice exercise, but unfortunately I don't have time for this (and I think neither do any of my colleagues)…

    regards,
    Alef


  17. Sorry. I was confused by this statement: "When I have time I'll try to have a look at the small benchmark that I prepared."


  18. Very interesting and useful article. May I ask how you performed this benchmark? Did you just loop over a method invocation a very large number of times, say several billion? I have been doing some benchmarks on operations that take roughly 1 millisecond to execute and I find that calls to System.currentTimeMillis() seem to only have a precision of 15-20ms. I've compensated by running a very large number of tests, measuring the total and dividing to get an average. But if there is some other technique for getting more precise time measurements I would love to know about it! Thanks.

    -Cliff


  19. Hi Cliff,

    this is exactly what I did. It was a very quick benchmark and this is the usual technique I use as well. I do usually execute these tests on a *nix-based machine, as I find the accuracy of System.currentTimeMillis() to be a little better there. As long as you make the number of executions very big, it doesn't really make a difference anymore however…

    One thing you have to keep in mind: micro benchmarks are not really useful if you *really* want to know about the performance of a specific operation. A micro benchmark ignores a lot of stuff such as concurrency, the difference in platforms you run the benchmark on, et cetera… So this is really only indicative.
