
Category: ETL

Apr 10 2009

BAM for the EDW “process”

A Solution Architect posed an interesting question over on the Complex Event Processing LinkedIn discussion board run by Tim Bass, which I reproduce here as it references TIBCO CEP:

In my case I have to think as to monitoring/governing a EDW (enterprise data warehouse) where we have a classic approach (host systems (in different legal entities of the Group) -> staging area -> edw -> dm) that can be thought as a state machine but we have also a lot of Human tasks thinking about data quality processes.
So, my reasoning have to be .. If I have human tasks then I have to build a BPM and using a BAM/CEP in order to control our environment?
In any case could BAM+CEP (re -thinking about Progress or Tibco solutions) without BPM solve my problem?
(just a little detail, my BPM is Websphere Process Server) …

Of course there are multiple terms and technologies in play here:

  1. data cleansing and validation - typically rule-based (technology: BRE; product: TIBCO BusinessEvents), and often requiring manual intervention (technology: BPM workflow; product: TIBCO iProcess); note that this could also be considered ETL
  2. data lifecycles - this could be considered a “state” issue (technology: state machine; product: TIBCO BusinessEvents), but is probably orthogonal to “data management” (technology: MDM; product: TIBCO CIM)
  3. operational awareness - BAM over IT systems, monitoring their state usually via their state transitions (events) (technology: CEP; product: TIBCO BusinessEvents).
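
As a rough illustration of the third item, here is a minimal Python sketch of monitoring a data batch through the host -> staging -> edw -> dm flow via its state transition events. It is not TIBCO BusinessEvents code: the `BatchMonitor` class, the stage names and the event tuples are all assumptions made up for this example.

```python
from dataclasses import dataclass, field

# Hypothetical legal transitions, based on the pipeline described in the question.
ALLOWED = {("host", "staging"), ("staging", "edw"), ("edw", "dm")}


@dataclass
class BatchMonitor:
    """Tracks the last known stage of each batch and records unexpected transitions."""
    state: dict = field(default_factory=dict)
    alerts: list = field(default_factory=list)

    def on_event(self, batch_id: str, new_stage: str) -> None:
        current = self.state.get(batch_id)
        if current is None:
            # A batch we have not seen before should start at the host systems.
            if new_stage != "host":
                self.alerts.append((batch_id, None, new_stage))
        elif (current, new_stage) not in ALLOWED:
            # Skipped or repeated stage: raise it for operational awareness.
            self.alerts.append((batch_id, current, new_stage))
        # Record the reported stage even when the transition was unexpected.
        self.state[batch_id] = new_stage


monitor = BatchMonitor()
for event in [("b1", "host"), ("b1", "staging"), ("b1", "dm")]:
    monitor.on_event(*event)
```

In this run the batch jumps from staging straight to dm, so one alert is recorded. A human data quality task could then be triggered from such an alert, which is where a workflow tool would come in.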

Interestingly, one of TIBCO’s technology tenets is “no rip n’ replace”, meaning customers (like this one) use TIBCO CEP with non-TIBCO BPM. And the need to use a BPM tool will depend on the need to manage the manual interventions (i.e. workflow) and the complexity of the tasks / process models - again a popular TIBCO technology mix is using “TIBCO BPM+” a.k.a. CEP driving BPM.

Of course, an interesting question is also whether the “new operational data” actually represents useful events that should be processed as such en route to the EDW, for faster decisions…

Jan 07 2009

CEP as a “BI Megatrend”?

Sometimes one has to take what one reads on the web with a liberal pinch of salt. I was amused to read on Intelligent Enterprise in the last few days that:

  1. CEP is a “BI Megatrend”
  2. CEP is a “marketing device”
  3. CEP to be provided by IBM’s latest (rule engine) technology acquisition - presumably either alongside, or replacing, their “Appsoft” (per the article) acquisition… [*1]

The first article is the most interesting: the BI megatrends also of interest to us in complex event processing are:

- BI users demanding richer experiences / exploration of relationships: this is where highly visual tools like TIBCO Spotfire play in historical data analysis, and TIBCO Syndera in operational intelligence…

- Business modeling meets MDM: frankly this is a stretch for traditional BI, but certainly MDM tools like TIBCO CIM have a semantic angle, although this angle is usually downstream of the development of the concept models defined in TIBCO BusinessEvents [*2] or existing operational data source schema. The same semantics affect ETL (per the article), which also cross-links to event processing technologies: one of the use cases for TIBCO BusinessEvents is “intelligent” ETL (a.k.a. event-driven rule-based transformations).

- Breaking the BI/DW Mold: of course there is nothing mouldy about data warehouses! But this is talking about in-memory and other “unconventional” data stores for doing “operational” BI, or operational intelligence. We already see these in the CEP world with high performance event stores as data grids.

- MapReduce meets large scale data analysis: MapReduce is more about highly parallel operations than coarse-grained event pattern detection. But despite the difference in granularity, both MapReduce and TIBCO BusinessEvents are agent-based distributed systems for collaboratively solving large scale problems…

- Column Oriented databases: this could be a red herring, as these are still static-query-oriented databases. But is the world moving towards continuous queries against real-time event-driven information sources, or even tuple and object stores?

- Event Processing for analytics: this made some sense, although the authors seemed to want to tie event processing to BI and data warehouse systems, rather than the other way round. Surely BI’s role is to identify the event and data patterns for use in operational event-based decisions? So BI should help CEP solutions, not the other way round?

Maybe 2009 will be interesting after all!

Notes:

[*1] Why sell your customers 1 CEP tool when you can sell them 3 or 4? ;)

[*2] TIBCO watchers may be interested to note that TIBCO CIM (MDM) and TIBCO BusinessEvents (CEP) are part of the same business unit …

Nov 24 2007

CEP as sauce for alphabet soup (Part 9): ETL

Extract, Transform and Load - sounds like a command from a Dalek, but is really a whole subindustry of supporting acts for the data warehouse community (with some uses as well in integrating various operational database-oriented systems, usually via batch “catch-up” processes).

The rationale behind ETL is that one needs to get operational data from various operational databases (with schemae optimized for operational use) into a data warehouse for analysis / reporting / analytics (with a schema optimized for a “higher level” viewpoint). So you need to extract the data, run transformations on it (i.e. filters, aggregation queries, correlations etc) into the data format required for the data warehouse, and then do some (usually batch) load operation, in a way that minimizes impact on the operational system performance, and so that eventually the data warehouse users can start slicing and dicing the data…
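
The three steps can be sketched in a few lines of Python. This is only an illustration: the `orders` rows, the field names and the in-memory `warehouse` list are invented for the example, and a real ETL job would read from and write to actual databases.

```python
from collections import defaultdict


def extract(source_rows):
    """Extract: copy rows out of the operational source (here, a list of dicts)."""
    return list(source_rows)


def transform(rows):
    """Transform: filter out invalid rows, then aggregate the amount per region."""
    totals = defaultdict(float)
    for row in rows:
        if row.get("amount") is None or row["amount"] < 0:
            continue  # data quality filter
        totals[row["region"]] += row["amount"]
    return [{"region": r, "total_amount": t} for r, t in sorted(totals.items())]


def load(warehouse, rows):
    """Load: append the transformed rows to the warehouse in one batch."""
    warehouse.extend(rows)


# Hypothetical operational data.
orders = [
    {"region": "EMEA", "amount": 100.0},
    {"region": "EMEA", "amount": 50.0},
    {"region": "APAC", "amount": None},
    {"region": "APAC", "amount": 25.0},
]
warehouse = []
load(warehouse, transform(extract(orders)))
```

Note that nothing reaches the warehouse until the whole batch has been extracted and transformed, which is the latency the rest of this post is concerned with.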

So what relevance is CEP to this batch-world of uber-databases? Well, it's the usual issue of real-time (responsiveness) versus batch (it's ready when it's ready) architectures and benefits. It's even hinted at in the Wikipedia article on ETL (as of the time of writing, anyway - Wiki content changes constantly) - excerpt follows…

Drawbacks to ETL
As the number of highly-connected computers in any data exchange grows, ETL suffers from exponentially increasing costs. See Metcalf’s Law. A solution to ETL cost growth is to use XML standards on an Enterprise Service Bus.

So, the CEP industry will say: instead of expensive and expansive batch operations on your data, why not treat the data in real-time as events?

Ah-ha! But surely CEP systems cannot handle the volumes or potential insights we are talking about? Well probably they can.

To partially prove the point, there are ETL users who are augmenting their toolkits with rule engines for complex transformations (indeed, the use of rule engines for complex systems integration goes back a long way, to at least the early 90s). They then quickly realize that the rule engine does all the transformations [*1] with all the integrations [*2] and performance [*3] they need. A rule-driven CEP engine lets them do this in “real-time”, too [*4]. And you can easily see where other CEP techniques can be used here (e.g. event stream processing).
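
To make the idea concrete, here is a toy Python sketch of event-driven, rule-based transformation: each record is treated as an event and run through a list of condition/action rules as it arrives, rather than waiting for a batch. It is a simple stateless, sequential evaluation and not how a production rule engine or TIBCO BusinessEvents works internally; the rules, field names and `sink` list are assumptions for illustration only.

```python
# Each rule is a (condition, action) pair. An action returns the transformed
# event, or None to reject the event. All rules here are hypothetical.
RULES = [
    # Data quality: reject events without a customer id.
    (lambda e: e.get("customer_id") is None, lambda e: None),
    # Normalization: map a legacy country code to the warehouse's code.
    (lambda e: e.get("country") == "UK", lambda e: {**e, "country": "GB"}),
    # Enrichment: flag large amounts for downstream analysis.
    (lambda e: e["amount"] >= 10000, lambda e: {**e, "flag": "large"}),
]


def apply_rules(event):
    """Run the event through every matching rule, in order."""
    for condition, action in RULES:
        if condition(event):
            event = action(event)
            if event is None:
                return None  # rejected by a data quality rule
    return event


def on_event(event, sink):
    """Transform one event as it arrives and load it immediately."""
    result = apply_rules(event)
    if result is not None:
        sink.append(result)


sink = []
for incoming in [
    {"customer_id": 1, "country": "UK", "amount": 12000},
    {"customer_id": None, "country": "FR", "amount": 5},
    {"customer_id": 2, "country": "FR", "amount": 5},
]:
    on_event(incoming, sink)
```

Here the second event is rejected and the first is both normalized and flagged, each at the moment it arrives.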

Notes:
[*1] Transformations can be filtering for data quality based on content or metadata, aggregation / comparison across multiple sources, and so forth. In a rule engine these are carried out in memory: the data is loaded into the rule engine first and then transformed. For simpler cases a stateless rule engine will suffice here.

[*2] Most of the first-generation rule engines are designed to be good citizens and integrate with many standard data sources. New generation engines like TIBCO BusinessEvents can exploit EAI tools like TIBCO Adapters and literally feed off any source in the enterprise…

[*3] Rule engines are optimized for rule execution performance, which has a beneficial effect on rule-based transformations in ETL tasks…

[*4] A slightly different ETL fish is TIBCO DataExchange, which provides event-driven transformations on top of TIBCO BusinessWorks. It could also be used to preprocess data before feeding it to a CEP engine and/or a rules engine…
