Structure-Preserving Semantic Matching Algorithm

Peers in the OpenKnowledge system are able not only to have their own ontologies but also their own methods for performing and calling actions. The actions required in a particular IM are expressed in the
constraints, and any peer playing a role must be able to satisfy the constraints on that role. The job of the matching process is to map the constraints on a role to the methods of a peer which wants to perform that role, and to return a numerical score to indicate how similar these two are. This score can be used by peers to determine whether or not they wish to play that role and can be declared in order to allow other peers to determine whether they wish to play with that peer playing that role.

Matching a constraint in an IM to an ability of a peer is a two-step process:

  • the vocabulary of the two terms are matched;
  • the structure of the terms is matched.

For example, if a constraint buy(vehicle,price) were to be mapped to a peer ability purchase(price,car,number):

  • the vocabulary matcher would identify that buy and purchase and price and price were equivalent, that vehicle was a super-class of car and that number had no relation to any word in the first term;
  • the structure matcher would then use these matches to determine how the terms were overall related, through considering the structure of those terms.

The problem of ontology mismatch has been very widely studied, but this is almost always related only to first part of these tasks: mismatches between structured terms is not considered. Likewise, there has been research into structural matching, most significantly through the tree-edit distance algorithm, but this does not usually consider the semantics of the terms in the tree. The Structure-preserving Semantic Matching (SPSM) algorithm is therefore an original attempt to combine these two tasks.

SPSM uses S-Match to perform the first step of the process. The results of this then feed in to the structure-matching step. The structure-matching step is based on Giunchiglia and Walsh's Theory of
Abstraction and the tree-edit distance algorithm. The theory of abstraction describes all the ways in which a first-order term can be a more general version of another term: through having one fewer arguments, through having a predicate name that is more general or through having one or more arguments of a more general type. This theory can be inverted to form refinements, which detail how a first-order term can be a more specialised version of another term.

The tree-edit distance algorithm determines how two trees can be mapped into one another through the use of three operators on nodes: add, delete and relabel, but does not consider the semantic impact of performing these operations. SPSM uses the theory of abstraction to assign a cost to these operations, reflecting the effect they have on the semantics of the term. For example, the cost of relabelling where the new and the old labels are semantically equivalent is always zero; the cost of relabelling where there is no link between the labels is usually infinite and the cost of relabelling where one of the labels is a subclass of the others is usually 1. However, these costs can be altered according to the user requirements.

We have evaluated the SPSM algorithm with promising results.

The technical details of this process, together with some evaluation and further explanation of the use of SPSM within OpenKnowledge and as its role in service integration, can be found in the paper Service Integration through Structure-Preserving Semantic Matching. A thorough account of the evaluation of matching can be found in Deliverable 3.7.

Egglue Powered