Sep 25 2009

Event-driven / event-handling rule engine performance…
Posted by Paul Vincent

An interesting fact was divulged at a recent meeting of TIBCO BusinessEvents specialists: a large financial institution had evaluated TIBCO BusinessEvents against one of the leading “business rule engine” vendors. The proof of concept tested the feasibility of running a set of event processing business rules against an incoming event stream. TIBCO BusinessEvents’ performance was reported at 4 times the non-CEP BRE’s performance, when measured by events processed in a fixed time.

Customer feedback on the important features of TIBCO BusinessEvents relevant to their application listed: rule engine, distributed cache, decision management, performance, and the ability to interface to external databases.

16 Comments

  • By Peter Lin, September 25, 2009 @ 07:36

    There’s really only 2 leading Business Rule engines on the market: JRules and FICO Blaze. For the sake of readers, I think it would be good to have a brief abstract description of the type of process the application was handling.

    I can easily come up with scenarios that will make a production rule engine crawl and fall over, and I could also come up with scenarios that won’t work in a CEP engine, so the context is critical.

    The early results of my Stream RETE engine suggest a good Stream RETE engine can handle 1-2 million facts/second for simple rules without any joins. For rules that calculate sum and average, the rate drops to 300,000-500,000 facts/second with constant memory utilization.
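
    A minimal sketch of why sum and average can run with constant memory, assuming a simple running aggregate rather than the Stream RETE engine itself (the class `RunningAggregate` and its methods are hypothetical names, not from that engine): each incoming fact updates a count and a total, and the fact itself is never retained.

```python
class RunningAggregate:
    """Running sum and average over a stream, keeping only two numbers."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, value):
        # Fold the new fact into the aggregate; the fact itself is not stored.
        self.count += 1
        self.total += value

    def average(self):
        # Average of everything seen so far; 0.0 for an empty stream.
        return self.total / self.count if self.count else 0.0


agg = RunningAggregate()
for price in (1.0, 2.0, 3.0, 4.0):
    agg.update(price)
```

    Memory use stays fixed however many facts arrive, which is the property the throughput figures above depend on.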

    I believe in an old post on this blog, you mentioned 5K events/second being high throughput. Based on the early results of my stream rete engine, anything under 20K seems rather low.

    peter

  • By Paul Vincent, September 25, 2009 @ 08:13

    Hi Peter: actually event flow depends on work done (as well as algorithm and tool): any fool can make an event system run at high throughputs if there are few rules being touched…

    In this case I’m not at liberty to publish the rules being used, so it’s simply an interesting point of reference that, in this example, which was originally specified as a test for a BRE, TIBCO BE came out at 4x the performance of one of the leading BREs. You might speculate that the rules were somehow “rigged” for BE, but I can’t see how, other than that they were event processing rules rather than data processing rules. (Indeed, the conventional BRE used pseudo-batch / boxcarring techniques to improve its throughput.)

    Might be that this customer will report on their work at some future user group or BRForum event. We’ll see.

    Cheers

  • By Peter Lin, September 25, 2009 @ 08:42

    I didn’t mean to suggest it was rigged. If I gave that impression, I apologize. The point I was trying to make is that using production rule engines takes a lot of skill and most developers don’t have that skill. Having met and interviewed a lot of “rule consultants”, most haven’t got a clue about how to write performant rules. How one implements rules in one product versus another can have a huge impact.

    I’ve seen countless examples of users writing rules without understanding how things work and then claiming the rule engine sucks or doesn’t scale. In most of the cases where people thought “a production rule engine” is slow and doesn’t scale, it was user error. At least that’s my experience over the years.

    After 8+ yrs of using rule engines and working with rules, one thing is clear to me. 90% of the users out there don’t understand declarative rules, inferencing, rule chaining and good practices. Even then, of the 10% of users of business rule engines, only a fraction of them understand pattern matching theory/practice and RETE algorithm.

    It’s very easy for a naive user to write bad rules and make a conclusion based on lack of knowledge and skill.

    peter

  • By Paul Vincent, September 25, 2009 @ 10:00

    Hi Peter - no worries. The point is that BE shares a common rete semantics with these BREs - ergo there is more to event-based rule processing than simply an “event adapter”.

    It is possible, if unlikely, that the mainstream BRE example was developed incorrectly. I base that on (a) knowledge of how professional most of these guys are and (b) that this is a very high profile customer which would normally warrant a large team from this vendor.

    We’ll see if more details come out later. It could be that an app server used with the conventional BRE was the bottleneck, for example…

    Cheers

  • By Peter Lin, September 25, 2009 @ 13:00

    I don’t doubt the developers are professional and did their best. I find that expertise and deep understanding of inference rules and pattern matching are rare. Someone could be a rock star OO developer, but end up doing things that are terrible. Expertise with RETE engines takes years of dedicated study. Just because someone has 5 years of JRules or Blaze experience, it doesn’t mean they have a solid understanding of pattern matching.

    On the contrary, I find that 80% of the developers with 5-8 yrs of Blaze and JRules experience do not have the slightest clue about pattern matching and how to write rules that perform well. Many of these individuals are bright senior developers. Understanding production rules, inferencing and chaining takes time and dedication. Most people simply never make that leap and end up coding rules as if they’re writing procedural code.

    peter

  • By Paul Vincent, September 25, 2009 @ 15:08

    Hi Peter - interesting thoughts. In some respects you are right: there is a difference between knowing how a (Rete-type) rule engine works, and knowing how to optimize rules for performance…

    I suspect this is an artifact of targeting business analysts to write rules (rather than rule developers). Profiling and testing come into play here too…

    Cheers

  • By Peter Lin, September 25, 2009 @ 21:27

    Hi Paul,

    You’ve been around rules for much longer than me, so I’m sure you’ve seen this hundreds of times over the years. On the CLIPS and JESS mailing lists, this type of thing happens all the time. Earlier this year, someone was trying to use JESS for event stream processing and was running into problems.

    The individual asked for help. After he discovered the watch command, he realized the mistake. The details of this one case are in the JESS mail archive (http://www.mail-archive.com/jess-users@sandia.gov/msg10787.html).

    Even though the user found a workable solution with some help, he still doesn’t understand how to write performant rules. The types of cases I’ve seen weren’t due to business analysts writing rules. Typically, I see developers making these mistakes. I’ve seen very bright senior engineers struggle with RETE and never really get it.

    It’s probably taboo to say this, but learning how to use a RETE rule engine requires several years of dedicated study. An engineer with 4-5 months to make a prototype simply doesn’t have the time to learn how to write great rules. More likely than not, they make all the same mistakes and then mistakenly conclude that product X is bad for their project.

    peter

  • By Paul Vincent, September 28, 2009 @ 10:38

    Hi Peter:

    In the JESS case you mention the thread ends with the comment
    “don’t set the update frequency of you timestamp fact to high!!!”
    - which presumably is something particular to JESS or the user’s application…

    In BE the events are presumed to be pushed, not polled, so if you wanted to poll you would have to set up a rule timer and an associated action (or a scheduler).

    But I don’t think a polling frequency setting is anything to do with Rete per se. And the basics of Rete (which are probably my level of understanding) - for example as documented in OMG PRR - are usually sufficient for most cases except in developing clever Waltz and Manners type programs. For the latter, then yes, a couple of years’ worth of experience may be useful!!!

    Cheers

  • By Peter Lin, September 28, 2009 @ 13:02

    Actually, the comment from the user “don’t set the update frequency of your timestamp fact too high!” shows the user doesn’t understand how to use a RETE engine. He shouldn’t be using a timestamp fact.

    I pointed out this one example to illustrate the difficulty of understanding how to write good rules that scale well. Instead of using a fact for timestamp, the user should have used a function and avoided modifying the timestamp fact.
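
    A toy illustration of the cost difference, assuming a simplified model in which modifying a fact forces the join with every stored event to be recomputed. This is not JESS code or a RETE implementation, and `ToyMemory`, `run` and their parameters are hypothetical names chosen for the sketch.

```python
class ToyMemory:
    """Toy working memory that counts join tests between events and the clock."""

    def __init__(self, max_age):
        self.max_age = max_age
        self.events = []      # event timestamps held in working memory
        self.join_tests = 0   # number of (event, clock) comparisons performed

    def assert_event(self, ts, now):
        # Function style: the current time is read when the event arrives,
        # so only the new event is tested.
        self.events.append(ts)
        self.join_tests += 1
        return now - ts > self.max_age

    def modify_clock_fact(self, now):
        # Fact style: changing the clock fact invalidates its joins, so
        # every stored event is tested again on each update.
        stale = []
        for ts in self.events:
            self.join_tests += 1
            if now - ts > self.max_age:
                stale.append(ts)
        return stale


def run(num_events, num_ticks, use_clock_fact):
    """Assert num_events events, then optionally update a clock fact num_ticks times."""
    memory = ToyMemory(max_age=10)
    for ts in range(num_events):
        memory.assert_event(ts, now=ts)
    if use_clock_fact:
        for tick in range(num_ticks):
            memory.modify_clock_fact(now=num_events + tick)
    return memory.join_tests


function_style = run(1000, 100, use_clock_fact=False)
fact_style = run(1000, 100, use_clock_fact=True)
```

    In this model, 1,000 events and 100 clock updates cost 1,000 join tests in the function style and 101,000 in the fact style, and the gap widens with every extra update. A real engine differs in detail, but the sketch shows why the update frequency of a timestamp fact ends up dominating the run time.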

    The average rule user doesn’t have a clue how a production rule engine works and doesn’t realize they’re doing something really bad. On the surface, it looks ok. Unless someone has a deep understanding of how RETE engines work, they’ll keep doing things that bring the engine to a crawl. The sad thing is, the example I cited isn’t the first time that type of question has been asked by a developer.

    I’ve seen that type of question asked countless times and most of the users never take time to understand how to write performant rules. Instead, they make false assumptions and write procedural rules thinking it will all work. When it doesn’t, they blame the engine or RETE and don’t realize they don’t understand RETE.

    peter

  • By Opher Etzion, September 29, 2009 @ 09:00

    The fact that a user has to understand how RETE works in order to get the performance right is actually a limitation on usability. The solution may be a higher level, intention-oriented language that is translated into rules processed by a RETE engine, instead of writing the low level rules directly.

    cheers,

    Opher

  • By Peter Lin, September 29, 2009 @ 09:11

    I agree 10000% that understanding RETE creates a major barrier for usability. I’ve been warning users of this issue for many years now. The problem is, very few people listen and the sales guys keep saying “it’s easy, just write it.” From an engine implementation perspective, I prefer to use a DSL for high level abstraction and let experts translate that to low level rules. I’ve been thinking about the feasibility of a general purpose event language. The biggest challenge I see is ease of use vs accuracy.

    A “user friendly” event language has to find the delicate balance between ease of use and accuracy. Ultimately, the user has to understand temporal logic. There’s no way to get around that. The problem is, most people have zero understanding of temporal logic and don’t realize they’re writing garbage.

    In the past, I encoded the temporal logic in the DSL, so the user doesn’t have to know it. The limitation is the DSL isn’t general purpose and can become brittle/obsolete over time.

    peter

  • By Paul Vincent, September 29, 2009 @ 23:09

    Peter, Opher: hopefully this will clarify some of these comments:
    - TIBCO BE is Rete based but has the largest market share for CEP tools. Ergo Rete itself cannot be a “major barrier” versus other event pattern matching technologies…
    - However, the lack of good inference modeling constructs certainly restricts the use of Rete rule languages for knowledge-rich applications (way beyond simple ECA and CQL type event processing of course)
    - Understanding *any* language / algorithm is key to performance.
    – At EPTS5 a non-Rete CEP vendor mentioned how “we hope to be fast enough without tuning, so the customer gets good enough performance”. Exactly the same with Rete type tools too.
    - Accuracy? All languages should have well-defined semantics… the problem is performance vs ease of use. Accuracy should be a given.
    - DSLs for CEP: something for a future discussion… :)

    Note that in TIBCO BusinessEvents we:
    - have both declarative rule and continuous query languages that exploit the Rete pattern matching algorithm
    - provide a UML state model as a higher level abstraction for rule programming [but this is in no way a generic inference model]

    For a further discussion on anti-Rete hype see http://www.linkedin.com/groupAnswers?viewQuestionAndAnswers=&gid=1549797&discussionID=3561358

    Cheers

  • By Peter Lin, September 30, 2009 @ 04:58

    From first hand experience building rule editors, rule optimizers and rule compilers, modeling a problem with UML state diagrams isn’t sufficient by itself. I can take a state diagram and implement those rules several different ways with different performance characteristics. To make the executable rules optimal, one would need additional metadata beyond what UML provides.

    I love declarative rules, but the same problem exists. How the user translates the business requirements into declarative rules affects efficiency and performance. In the past, this has been the job of a knowledge engineer who has knowledge of both the business and technical domains.

    My feeling is the tools have a long way to go towards making writing maintainable high performance rules easy. I’ve been working on these problems for over 8 years now. For example, I have a RETE topology cost function, which makes it easier to measure the average cost of a ruleset or rule. You can read the paper here.

    http://jamocha.svn.sourceforge.net/viewvc/jamocha/morendo/doc/dynamic_typing.pdf

    peter

  • By Peter Lin, September 30, 2009 @ 05:03

    I couldn’t see the anti-RETE thread on the CEP website. For the record, I’m not anti-RETE at all. Quite the opposite, I’m pro-RETE and have spent the last 8 years dedicated to the study and advancement of RETE. At the same time, I’m brutally honest about the steep learning curve. Having good tools definitely helps, but there’s no substitute for experience and deep understanding of pattern matching algorithms, theory and practice.

    peter

  • By Paul Vincent, September 30, 2009 @ 13:22

    Hi Peter: correct: state model is not an inference model, and yes we do need an interesting way of modeling inference.

    [By the way this was a goal of OMG PRR, but we failed to come up with any decent visual models so didn't fulfil that requirement. No vendor has AFAIK either...]

    Also you will note that in TIBCO BE we don’t claim that / target the BE rule language as being suitable for “business managers” or even “business analysts”. It’s a developer language (analysts are targeted by the BRMS, which tellingly does not map to the Rete production rules…).

    In any case, I’m sure we can agree there is more work to be done on exploiting Rete rules. It will be interesting to see what Dr Forgy has to say on CEP and rules at ORF in a few weeks…

    Cheers

  • By Peter Lin, September 30, 2009 @ 15:15

    Agree 100% more work is needed to reach that point. I’m hoping another decade or two of dedication by many people will make significant progress.
