Tuesday, October 1, 2013

Smart Data Integration - Solving the data lineage problem with semantic technology

Provenance and lineage. Two wonderful words, used interchangeably, to describe a sticky problem for most large financial institutions. That is, what is the origin, meaning and quality of my data? These questions are becoming increasingly important as data is sourced from more disparate locations, regulators demand an audit trail for reporting and more data is exposed to internal and external consumers.

Traditional data integration approaches typically focus on moving data point-to-point and do a poor job of tracking the full lifecycle of data. Cambridge Semantics is at the forefront of a new approach to enterprise data integration that solves these problems using semantic technologies. We call this Smart Data Integration. 

By deploying a semantic layer across existing infrastructure, you can build a full picture of your information landscape and lifecycle while preserving your existing infrastructure investments. In addition, you can achieve other critical benefits:

  • Dramatically lower the time and cost to onboard new customers and data sources
  • Support industry-standard, business-consumable, operationally agile enterprise data models (e.g., FIBO)
  • Put highly interactive, business friendly data consumption in the hands of business users
  • Expose full enterprise-wide data provenance necessary for business and regulatory reporting

Cambridge Semantics is developing a set of tools on our semantic platform Anzo to deliver Smart Data Integration:

• Business Analyst Mapping Tool
The mapping tool allows a business analyst to connect to source and target data systems, ingest schemas and review sample data. Using a familiar Excel-based interface, the BA can map source to target fields and capture any required transformations using context-sensitive wizards.
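As a rough sketch of the idea (the field names, transforms, and representation here are invented for illustration, not Anzo's actual format), a source-to-target mapping with per-field transformations might look like this:

```python
# Hypothetical source-to-target field mapping of the kind a business analyst
# might capture. Field names and transforms are illustrative only.

def to_upper(value):
    return value.upper()

# Each entry: source field -> (target field, optional transformation)
mapping = {
    "cust_nm":  ("customerName", str.strip),
    "cntry_cd": ("countryCode", to_upper),
    "open_dt":  ("accountOpenDate", None),
}

def apply_mapping(source_record, mapping):
    """Produce a target record by renaming fields and applying transforms."""
    target = {}
    for src_field, (tgt_field, transform) in mapping.items():
        value = source_record[src_field]
        target[tgt_field] = transform(value) if transform else value
    return target

record = {"cust_nm": " Acme Corp ", "cntry_cd": "us", "open_dt": "2013-01-15"}
print(apply_mapping(record, mapping))
# {'customerName': 'Acme Corp', 'countryCode': 'US', 'accountOpenDate': '2013-01-15'}
```

Because the mapping is data rather than code, it can be cataloged, reviewed, and reused, which is what makes downstream automation possible.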

• Business Conceptual Model
During the mapping process, the BA has the option to map the source data to a target conceptual model, for example, the Financial Industry Business Ontology (FIBO).

• Automatic ETL Generation
Once the mapping process is complete, the map is saved for cataloging and reuse. At this point, the BA can also click a button to automatically create an ETL job for their tool of choice (e.g., Pentaho Kettle, Talend, Informatica). The ETL job is created from the mapping without any coding or manual intervention.
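The general technique here is code generation from a declarative mapping. A minimal sketch, in the spirit of (not the actual mechanism of) the tool described above, with invented field names:

```python
# Illustrative sketch: generate an executable row-level transform from a
# declarative mapping. Fields and transform expressions are made up.

mapping = [
    # (source field, target field, transform expression over `value`, or None)
    ("cust_nm",  "customerName",    "value.strip()"),
    ("cntry_cd", "countryCode",     "value.upper()"),
    ("open_dt",  "accountOpenDate", None),
]

def generate_etl(mapping):
    """Emit Python source for a transform function derived from the mapping."""
    lines = ["def transform(row):", "    out = {}"]
    for src, tgt, expr in mapping:
        rhs = expr.replace("value", f"row[{src!r}]") if expr else f"row[{src!r}]"
        lines.append(f"    out[{tgt!r}] = {rhs}")
    lines.append("    return out")
    return "\n".join(lines)

namespace = {}
exec(generate_etl(mapping), namespace)   # "compile" the generated job
row = {"cust_nm": " Acme Corp ", "cntry_cd": "us", "open_dt": "2013-01-15"}
print(namespace["transform"](row))
```

A real product would emit a job in the target ETL tool's own format rather than Python source, but the principle is the same: the mapping is the single source of truth, and the executable artifact is derived from it mechanically.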

• Analyst Dashboards
The BAs have full access to the target data and conceptual model through Anzo’s web dashboards. They can search on fields and get data provenance visualizations that show where data came from and what transformations were performed on it.
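Conceptually, a provenance lookup is a walk backwards along derivation links. A toy illustration (the field names, systems, and data structure are invented; this is not Anzo's internal representation):

```python
# Minimal lineage tracing: each derived field records what it came from and
# what was done to it; walking the links back yields the full provenance chain.
# All names are made up for illustration.

lineage = {
    # field: (derived from, transformation applied)
    "report.exposure":        ("warehouse.exposure_usd", "currency conversion"),
    "warehouse.exposure_usd": ("trading_sys.exposure",   "nightly load"),
}

def trace(field, lineage):
    """Return the chain of (field, how it was derived) back to the origin."""
    chain = [(field, None)]
    while field in lineage:
        field, transform = lineage[field]
        chain.append((field, transform))
    return chain

for step, how in trace("report.exposure", lineage):
    print(step if how is None else f"{step}  <- via {how}")
```

Represented as a graph like this, "where did this number come from?" becomes a traversal rather than an archaeology project.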

• Business User Dashboards
Business users also have full access to the target data and conceptual model through Anzo’s web dashboards. These provide interactive data search, visualization and investigation capabilities.

To learn more about Smart Data Integration, contact me at marty@cambridgesemantics.com, or join our webinar on October 10th, 2013 at 2pm for an overview and demo of Smart Data Integration: semantic-model-driven enterprise data integration and data governance.

https://attendee.gotowebinar.com/register/8791073005097461249

Monday, June 3, 2013

Semantic Fix for SOA Chaos

Image courtesy of IBM
Large organizations typically require hundreds of integrations between disparate legacy systems to meet enterprise business requirements. Even with a SOA approach, this complex web of point-to-point integrations can be difficult to govern and manage. The result is a proliferation of overlapping services, poor documentation, limited reuse and a challenging support environment.

Surprisingly, there is little vendor support to solve this problem. Most of the major vendors are focused on registry and repository products that provide solid run-time governance but provide little support for design time. A fresh approach to this challenge is emerging based on proven best practices and semantic tools.

The approach begins with a canonical model - a universal way to represent enterprise data. Services map to and from the canonical model, eliminating point-to-point dependencies and isolating changes to a single link.
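The arithmetic behind this claim is worth making explicit: with n systems, point-to-point integration needs a mapping for every pair, while a canonical model needs only one mapping per system. A quick check:

```python
# The integration-count argument for a canonical model: pairwise mappings
# grow quadratically, hub-and-spoke mappings grow linearly.

def point_to_point(n):
    return n * (n - 1) // 2   # one mapping per pair of systems

def canonical(n):
    return n                  # one mapping per system, to/from the canonical model

for n in (5, 20, 100):
    print(f"{n} systems: {point_to_point(n)} pairwise vs {canonical(n)} canonical")
# 100 systems: 4950 pairwise mappings vs 100 canonical mappings
```

This is also why changes stay isolated: when a system's interface changes, only its single mapping to the canonical model needs updating, not every service that talks to it.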

Implementing the canonical model with semantic technology is an excellent match for the dynamic and demanding challenges of a large SOA environment. The agility of the underlying graph model allows rapid, controlled change to meet user demands while preserving enterprise needs for governance and control.

Other critical components include tools for mapping services, authoring interface specifications and reporting on key metrics. Using semantic tools with the canonical model provides:
  • Flexible reporting on dependencies, reuse and other critical metrics
  • Controlled process and workflow for authoring interface documents
  • Robust, granular, field-level mapping
  • Versioned repository of service documentation
The benefits of this approach are significant. The mapping process and document authoring workflow reduce input errors and streamline approvals. A well-documented service environment promotes reuse and reduces service duplication. Together, these changes dramatically shrink development time and cost. Ultimately, a cleaner, well-documented environment makes support easier, faster and cheaper.

If you are planning a large SOA deployment or if you are struggling with an existing one, the semantic approach is well worth a look.

Contact me on Twitter @mloughlin to discuss...








Tuesday, May 21, 2013

What is Semantic Technology?

With Gartner and Forrester both identifying semantic technology as a key trend to watch in 2013, the first question many folks are asking is, "What is semantic technology?" My goal here is to answer that question, not in a rigorous academic or historical way, but in the way I answer my family and friends over dinner.

I work for a company that has a product (Anzo) built on semantic web standards so I usually start there. Anzo allows our customers to combine information from widely different sources and formats into an environment where it can be viewed and analyzed. Okay, you say, no magic there, many software products do similar things. So, we need to dig a little deeper to understand the value:
  • The information sources can be really varied: from Google News to Twitter to enterprise software to big-data systems
  • When we say "combine", we don't just mean in the same place, we mean combined into the same conceptual model, regardless of the source - linked together based on common concepts
  • The information is represented in human-understandable form - the concepts and relationships are ones we all use every day (subject, predicate, object - a car has an engine)
  • Through the model, all of the information is available in an intuitive way to be searched, analyzed and visualized - we can ask questions we did not plan for up front
  • When we want to add a new source of information we just add it and it becomes part of the existing model, without the traditional weeks of design or coding 
The semantic data model makes all of this possible - it is constructed of simple, human-understandable "sentences": subject, predicate, object. By linking these sentences together, we can create a conceptual model. But this is not just a conceptual model; it is also the way the data is stored.
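The sentences above can be made concrete with a toy example (this is a simplified illustration of the subject-predicate-object idea, not Anzo or a real triple store):

```python
# A toy triple "store": facts are subject-predicate-object sentences, and
# linking them by shared subjects/objects produces a queryable graph.

triples = [
    ("car",         "has",      "engine"),
    ("engine",      "has",      "cylinder"),
    ("car",         "made_by",  "Acme Motors"),   # invented example data
    ("Acme Motors", "based_in", "Detroit"),
]

def query(triples, s=None, p=None, o=None):
    """Match triples against a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "What does a car have?" - a question we never designed a schema for
print(query(triples, s="car", p="has"))   # [('car', 'has', 'engine')]

# Adding a new source of information is just adding more sentences;
# no redesign or migration needed.
triples.append(("car", "has", "wheel"))
print(query(triples, s="car", p="has"))
```

Real semantic stores express this with RDF and query it with SPARQL, but the core idea is exactly this simple: everything is a linked sentence, and the model grows by adding more of them.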

So, at its core, semantic technology is a very simple but very powerful concept. At Enterprise Data World, one attendee called our demonstration "magic"! While it is most definitely not magic, the power and flexibility of conceptually linking data from disparate sources must be seen to be believed. This is truly breakthrough technology with significant implications for large enterprises.


For more information, please visit www.cambridgesemantics.com or contact me at marty@cambridgesemantics.com
@mloughlin
View Marty Loughlin's profile on LinkedIn







Friday, May 17, 2013

DATA Demo Day May 16th, 2013

Rayburn House Office Building, Washington, DC

On Thursday May 16th, Cambridge Semantics (CSI) participated in DATA demo day in Washington, DC. The event, hosted by House Majority Leader Eric Cantor, was an opportunity for leading technology vendors to demonstrate how their products could leverage the data standards proposed in the DATA Act to make government spending more transparent and to identify waste and fraud. As a member of the DATA Transparency Coalition, Cambridge Semantics was invited to show how our unified information access software, Anzo, could help with this challenge.

At CSI, we work with large pharmaceutical and financial enterprises to help them better manage and leverage their data. Anzo is a data integration platform, based on semantic technology, that is very good at combining data from disparate sources into a unified, business consumable model over which you can search, run analytics and build visualizations.

For DATA demo day, we loaded information from recovery.gov and the System for Award Management (SAM) into our platform to demonstrate some key capabilities:
  • Ability to easily link together data from very different sources
  • Present data in models and terms familiar to business users
  • Fully interactive search, analysis and visualization of the data
  • Examples of how to use the combined data to identify fraud
Recovery.gov contains information about awards by contractor. The SAM data includes information about the contractors such as number of employees. By linking these data sets, in this case by contractor name, we are able to ask some interesting questions. A simple example is looking at the dollar value of awards in relation to the size of the contractor. We can easily highlight the outlying cases of small contractors receiving very big awards. While not necessarily an indication of fraud, the extreme cases are worthy of further investigation.
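A simplified sketch of that linking logic (all contractor names, dollar figures, and the threshold below are invented for illustration; the real demo worked over the actual recovery.gov and SAM data sets):

```python
# Join award records to contractor records on contractor name, then flag
# small contractors with unusually large awards. All figures are invented.

awards = [  # recovery.gov-style data
    {"contractor": "Acme Paving",     "award": 250_000},
    {"contractor": "Tiny Consulting", "award": 40_000_000},
]
contractors = {  # SAM-style data, keyed by contractor name
    "Acme Paving":     {"employees": 120},
    "Tiny Consulting": {"employees": 3},
}

THRESHOLD = 1_000_000  # dollars per employee; an arbitrary illustrative cut-off

def flag_outliers(awards, contractors, threshold=THRESHOLD):
    """Link the two data sets by contractor name and flag extreme ratios."""
    flagged = []
    for a in awards:
        info = contractors.get(a["contractor"])
        if info is None:
            continue  # no SAM record to link to
        per_employee = a["award"] / info["employees"]
        if per_employee > threshold:
            flagged.append((a["contractor"], per_employee))
    return flagged

print(flag_outliers(awards, contractors))
# Tiny Consulting at ~$13.3M per employee: worth a closer look, not proof of fraud
```

Joining on contractor name is the fragile part in practice (name variants, subsidiaries), which is exactly the kind of entity linking a semantic model helps with.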

If the DATA Act is adopted, government spending information will be tagged and made available for public consumption in machine readable formats. Combined with solutions like CSI's semantic platform, this will enable aggregation of spend across agencies and make deep, interactive analysis of the combined data widely accessible.

For more information, please visit www.cambridgesemantics.com or email me at marty@cambridgesemantics.com
@mloughlin