The elements of a property's configuration have the following meaning:

name: the name of the element in the profile XML that is associated with the property.
comparator: the name of the comparator that is used to calculate the similarity of this property between the items (see the description of the comparators).
low: the probability that two items are equal if the comparator returns a value between 0 and 0.5 (the properties of the two items are not equal).
high: the probability that two items are equal if the comparator returns 1 (the properties of the two items are equal). If the comparator returns a value between 0.5 and 1, the resulting probability is scaled down accordingly.

If the profile has properties with multiple values, by default all values are compared to each other and the pair with the highest similarity is returned. But there may be situations where you want a similarity measure that covers all the values. For such cases you can include an element attribute in the configuration. All the values of the property are then concatenated into one value, which can be compared by a token set comparator like DiceCoefficientComparator or JaccardIndexComparator.

You have to set a threshold that defines the similarity value which is the lower boundary for creating associations.
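The interplay of multi-valued properties, low/high and the threshold can be sketched in a few lines of Python. This is an illustration only, not the code Duke runs: all function names are hypothetical, the token set measures are plain textbook Jaccard and Dice on whitespace tokens, and the quadratic scaling between 0.5 and 1 is an assumption chosen to show what "scaled down accordingly" could look like.

```python
from itertools import product


def tokens(text):
    """Split a value into a set of lower-cased whitespace tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard index of the two token sets: |A & B| / |A | B|."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty values are treated as equal
    return len(ta & tb) / len(ta | tb)


def dice(a, b):
    """Dice coefficient of the two token sets: 2 * |A & B| / (|A| + |B|)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return 2 * len(ta & tb) / (len(ta) + len(tb))


def best_pair_similarity(values_a, values_b, comparator):
    """Default strategy: compare every pair and keep the highest similarity."""
    pairs = product(values_a, values_b)
    return max((comparator(a, b) for a, b in pairs), default=0.0)


def concatenated_similarity(values_a, values_b, comparator):
    """Alternative strategy: join all values and compare them as one value."""
    return comparator(" ".join(values_a), " ".join(values_b))


def to_probability(similarity, low, high):
    """Map a comparator result to a probability using low and high.

    Assumption: results of 0.5 and above are scaled quadratically so that
    a similarity of 1 yields exactly `high`; anything below 0.5 yields `low`.
    """
    if similarity >= 0.5:
        return (high - 0.5) * similarity ** 2 + 0.5
    return low


def creates_association(probability, threshold):
    """The threshold is the lower boundary for creating an association."""
    return probability >= threshold
```

With the values ["data mining", "machine learning"] and ["deep learning", "data mining"], the best-pair strategy finds one identical pair and reports full similarity, while the concatenated strategy reports a lower value because it also accounts for the tokens that do not match. Which of the two is appropriate depends on whether one matching value is enough evidence for the property.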
For each property that should be considered in the calculation of the associations, an XML block has to be included in the config.

The difference-based number comparator returns 1 if the numbers are the same and 0 if the difference between the numbers is equal to diffThreshold or higher. You can use this comparator by adding it to the XML config.
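The behaviour of the difference-based number comparator can be sketched in Python as follows. The text only fixes the two end points (1 for equal numbers, 0 once the difference reaches diffThreshold), so the linear fall-off in between is an assumption, and the function name is hypothetical.

```python
def difference_similarity(x, y, diff_threshold):
    """Similarity of two numbers based on their absolute difference.

    Returns 1.0 for equal numbers and 0.0 when the difference is
    diff_threshold or higher. Assumption: values in between fall off
    linearly with the difference.
    """
    if diff_threshold <= 0:
        raise ValueError("diff_threshold must be positive")
    diff = abs(x - y)
    if diff >= diff_threshold:
        return 0.0
    return 1.0 - diff / diff_threshold
```

For example, with a diff_threshold of 5 the numbers 10 and 12.5 are half the threshold apart, so this sketch rates them halfway between equal and not similar.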
Duke uses comparators to compare the properties of the items and calculates a value between 0 and 1, where 0 means absolutely not similar and 1 means similar. Look at the next paragraph for more details.

You can define a comparator for each of the properties. For example, if you want to compare two strings you can use the ExactComparator, which returns 1 if the strings are equal and 0 otherwise. Or you can use the Levenshtein comparator, which calculates the Levenshtein distance between the two strings and returns the value scaled between 0 and 1. Take a look at the comparator entry in the Duke wiki for a list of comparators.

You can use the comparators by adding an XML fragment with an object element to the configuration. The example uses NumericComparator; to use another comparator, just replace NumericComparator with a comparator name from the list in the Duke wiki. Some comparators need parameters to configure them; these are provided with the param element. The name element is used to reference the comparator later. If you want the same comparator but differently configured for different properties of the item, you can create as many object elements as you need, name them differently, and configure them according to the respective property.

Additionally, we implemented a new comparator for comparing numbers based on the difference between those numbers.
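To make the idea of a comparator concrete, here is a minimal Python sketch of the two string comparators mentioned above: an exact match and a Levenshtein distance scaled to the range 0 to 1. It is not Duke's implementation. The function names are hypothetical, and dividing the distance by the length of the longer string is an assumption about how the scaling could be done.

```python
def exact_similarity(a, b):
    """Return 1.0 if the strings are identical, otherwise 0.0."""
    return 1.0 if a == b else 0.0


def levenshtein_distance(a, b):
    """Edit distance (insert, delete, substitute) via dynamic programming."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a, b):
    """Scale the distance into [0, 1], where 1 means identical strings.

    Assumption: the distance is divided by the length of the longer string.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as equal
    return 1.0 - levenshtein_distance(a, b) / longest
```

The exact comparator suits identifiers and codes, where any deviation means a different item. The Levenshtein variant tolerates typos, which is usually what you want for names and titles.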
3.2.1 Configuration for All Properties (Include Match and Non-Match Information)
3.2.2 Configuration for All Properties (Include Only Match Information)