3. Usage Pattern
- Create an index.
- Define a mapping for fields that will contain person, location, or organization names. The type for each of these fields is "rni_name".
- Create documents that contain one or more name fields along with other fields of interest. Each name field in a document will contain a name.
- Query the index.
The following snippets use the cURL [http://curl.haxx.se/] command-line tool to illustrate the Elasticsearch API calls used with the RNI-Elasticsearch plugin.
3.1 Creating an Index
The following cURL statement creates an index named rni-test.
curl -XPUT 'http://localhost:9200/rni-test'
3.2 Define a Mapping
Specify a document type for the documents you plan to create, and set the "type" for name fields to "rni_name".
The following statement maps the "primary_name" and "aka" (also known as) fields in the "record" document to the "rni_name" type in the "rni-test" index.
curl -XPUT 'http://localhost:9200/rni-test/record/_mapping' -d '{
"record" : {
"properties" : {
"primary_name" : { "type" : "rni_name" },
"aka" : { "type" : "rni_name" },
"occupation" : { "type" : "string" }
}
}
}'
Optimization for single-valued "rni_name" fields: If, for example, you know that the "primary_name" field never contains more than a single value, you can minimize overhead by turning off the "rni_multivalued" property, which is true by default.
"primary_name" : { "type" : "rni_name", "rni_multivalued" : false }
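The mapping body from section 3.2, including the single-valued optimization, can also be generated programmatically. The following Python sketch is only an illustration: the helper name build_mapping and its arguments are hypothetical, and it only builds the JSON body without sending it to Elasticsearch.

```python
import json


def build_mapping(doc_type, name_fields, other_fields=None):
    """Build a mapping body for the given document type.

    name_fields maps a field name to True if the field may hold several names.
    other_fields maps a field name to an Elasticsearch type such as "string".
    (Hypothetical helper, not part of the plugin.)
    """
    properties = {}
    for field, multivalued in name_fields.items():
        spec = {"type": "rni_name"}
        if not multivalued:
            # "rni_multivalued" is true by default, so emit it only to turn it off.
            spec["rni_multivalued"] = False
        properties[field] = spec
    for field, es_type in (other_fields or {}).items():
        properties[field] = {"type": es_type}
    return {doc_type: {"properties": properties}}


mapping = build_mapping(
    "record",
    {"primary_name": False, "aka": True},
    {"occupation": "string"},
)
print(json.dumps(mapping, indent=2))
```

The printed JSON can be passed as the -d argument of the _mapping request shown in section 3.2.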
3.3 Creating Documents
You may include document fields other than name fields.
curl -XPUT 'http://localhost:9200/rni-test/record/1' -d '{
"primary_name" : "Joe Schmoe",
"aka" : "Bossman",
"occupation" : "business owner"
}'
For the name fields, you can include individual properties in place of just a name string. Entity type is particularly useful.
| Property | Required | Description |
|---|---|---|
| "data" | √ | The name string. |
| "language" | | ISO 639-3 code for the language of use: the language of the document in which the name was found. |
| "languageOfOrigin" | | ISO 639-3 code for the language of origin of the name. For example, a name of Spanish origin (spa) may be found in an English (eng) document. |
| "script" | | ISO 15924 code for the script: the script for all languages supported in this release is "Latn". |
| "entityType" | | "PERSON", "LOCATION", or "ORGANIZATION". |
| "uid" | | Unique string identifier for the document. |
Example:
curl 'http://localhost:9200/rni-test/record/3' -d '{
"primary_name" : {
"data" : "Joe Schmoe",
"language" : "eng",
"script" : "Latn",
"entityType" : "PERSON"
}
}'
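Because only "data" is required and "entityType" accepts a fixed set of values, it can help to validate a name object before indexing it. The sketch below is one possible way to do that in Python; make_name is a hypothetical helper, and the checks simply mirror the property table above.

```python
import json

ENTITY_TYPES = {"PERSON", "LOCATION", "ORGANIZATION"}
OPTIONAL_KEYS = ("language", "languageOfOrigin", "script", "entityType", "uid")


def make_name(data, **props):
    """Return a name object suitable for an "rni_name" field (hypothetical helper)."""
    if not data:
        raise ValueError('"data" (the name string) is required')
    unknown = set(props) - set(OPTIONAL_KEYS)
    if unknown:
        raise ValueError("unknown name properties: %s" % sorted(unknown))
    entity_type = props.get("entityType")
    if entity_type is not None and entity_type not in ENTITY_TYPES:
        raise ValueError("entityType must be one of %s" % sorted(ENTITY_TYPES))
    name = {"data": data}
    for key in OPTIONAL_KEYS:
        # Copy only the optional properties that were actually supplied.
        if props.get(key) is not None:
            name[key] = props[key]
    return name


doc = {
    "primary_name": make_name(
        "Joe Schmoe", language="eng", script="Latn", entityType="PERSON"
    )
}
print(json.dumps(doc, indent=2))
```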
3.4 Query the Index
The query for a name consists of two parts.
3.4.1 Base Query
The base query is a standard query against a name field:
"query" : {
"match" : {
"primary_name" : "Jo Shmoe"
}
}
Querying supports the same name properties that you may use when indexing documents. Unlike during document creation, you must wrap the name properties in a single JSON object passed as a string, with the inner double quotes escaped.
curl 'http://localhost:9200/rni-test/record/_search' -d '{
"query" : {
"match" : {
"primary_name" : "{\"data\" : \"Jo Shmoe\", \"language\" : \"eng\", \"entityType\" : \"PERSON\"}"
}
}
}'
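Escaping the inner quotes by hand is error-prone. As a sketch, the match value can be produced by serializing the name properties twice: once into the string value and once for the request body as a whole. The function name name_match_query is an assumption made for illustration.

```python
import json


def name_match_query(field, name_props):
    """Build a base query whose match value is a JSON object encoded as a string."""
    # The inner json.dumps turns the properties into a string; the outer
    # serialization of the body then escapes that string's double quotes.
    return {"query": {"match": {field: json.dumps(name_props)}}}


body = name_match_query(
    "primary_name",
    {"data": "Jo Shmoe", "language": "eng", "entityType": "PERSON"},
)
payload = json.dumps(body)
print(payload)
```

The printed payload contains the escaped form (\"data\" and so on) and can be used as the -d argument of the _search request.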
Much like during indexing, RNI creates a set of keys based on the name and then generates a more complex internal query to match against the indexed keys.
Base Query Against a Multivalued Name Field. If the name field you are querying contains multiple values, use a nested query. The following query assumes that the "name" field may contain multiple values:
"query" : {
"nested" : {
"path" : "name",
"query" : {
"match" : {
"name" : "Joe Shmoe"
}
}
}
}
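The choice between a plain match query and a nested query depends only on whether the field is multivalued. A minimal Python sketch of that decision follows; base_query is a hypothetical helper and the caller is assumed to know how the field was mapped.

```python
import json


def base_query(field, name, multivalued=False):
    """Return the base query, nested when the field may hold several names."""
    match = {"match": {field: name}}
    if multivalued:
        # Multivalued name fields are queried through a nested query on the field path.
        return {"query": {"nested": {"path": field, "query": match}}}
    return {"query": match}


print(json.dumps(base_query("name", "Joe Shmoe", multivalued=True), indent=2))
print(json.dumps(base_query("primary_name", "Jo Shmoe"), indent=2))
```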
3.4.2 Rescoring with the RNI Pairwise Name Match
The second part of the query uses Elasticsearch Rescoring [http://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-rescore.html] to ensure that only good candidates are passed to the RNI pairwise matcher, which is a computationally intensive process.
Rescoring uses the following parameters:
- window_size (an integer, defaults to 10) specifies how many documents from the base query should be passed to the RNI pairwise matcher. Use this parameter to limit the number of compute-intensive name matches that need to be performed, thus decreasing query latency.
- query_weight (a float, defaults to 1.0) specifies the weighting of the score returned by the base query. In the context of RNI pairwise matching, the base query score has little meaning, so we suggest you set it to 0.0.
- rescore_query_weight (a float, defaults to 1.0) specifies the weighting of the maximum RNI pairwise match score. If query_weight is 0.0 and rescore_query_weight is 1.0, the score that is returned by rescoring is the RNI pairwise match score.
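To see why these weights isolate the RNI score, it helps to write out the combination. The sketch below assumes the Elasticsearch rescorer's default "total" score mode, in which the final score is the weighted sum of the two scores; the base score of 3.2 is a made-up value used only for illustration.

```python
def rescored(base_score, rni_score, query_weight=1.0, rescore_query_weight=1.0):
    """Combine scores as a weighted sum (assumes the default "total" score mode)."""
    return query_weight * base_score + rescore_query_weight * rni_score


# With the default weights, the base query score is mixed into the result.
print(rescored(3.2, 0.6832789))

# With query_weight 0.0 and rescore_query_weight 1.0, only the RNI score remains.
print(rescored(3.2, 0.6832789, query_weight=0.0, rescore_query_weight=1.0))
```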
In the following example, pairwise matching is performed on the top 200 names returned by the base query.
"rescore" : {
"window_size" : 200,
"query" : {
"rescore_query" : {
"function_score" : {
"name_score" : {
"field" : "primary_name",
"query_name" : "Jo Shmoe"
}
}
},
"query_weight" : 0.0,
"rescore_query_weight" : 1.0
}
}
The "name_score" function matches every name in the given field against the query name and returns the maximum score to the rescorer.
The "name_score" function score query must be given at least one object that specifies:
- field: the field of type "rni_name" to match against
- query_name: the query name
It also supports all of the name properties mentioned previously.
This example illustrates the full query incorporating both match and rescore.
curl 'http://localhost:9200/rni-test/record/_search' -d '{
"query" : {
"match" : {
"primary_name" : "Jo Shmoe"
}
},
"rescore" : {
"window_size" : 200,
"query" : {
"rescore_query" : {
"function_score" : {
"name_score" : {
"field" : "primary_name",
"query_name" : "Jo Shmoe"
}
}
},
"query_weight" : 0.0,
"rescore_query_weight" : 1.0
}
}
}'
This query returns an RNI score of 0.6832789 against "Joe Schmoe":
{
"_index": "rni-test",
"_type": "record",
"_id": "1",
"_score": 0.6832789,
"_source": {
"primary_name": "Joe Schmoe",
"aka": "Bossman",
"occupation": "business owner"
}
}
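The combined match-and-rescore request described in this section can be assembled in code rather than typed by hand. The following Python sketch is illustrative only: full_query and summarize are hypothetical names, the two weights are placed inside the rescore "query" object as the Elasticsearch rescore API documents them, and the sample hit is abbreviated from the result above.

```python
import json


def full_query(field, name, window_size=200):
    """Build a search body with a base match query and an RNI rescore step."""
    return {
        "query": {"match": {field: name}},
        "rescore": {
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "function_score": {
                        "name_score": {"field": field, "query_name": name}
                    }
                },
                # Ignore the base query score and keep only the RNI score.
                "query_weight": 0.0,
                "rescore_query_weight": 1.0,
            },
        },
    }


def summarize(hit):
    """Return the document id and RNI score from one search hit."""
    return hit["_id"], hit["_score"]


body = full_query("primary_name", "Jo Shmoe")
print(json.dumps(body, indent=2))

sample_hit = {"_index": "rni-test", "_type": "record", "_id": "1", "_score": 0.6832789}
print(summarize(sample_hit))
```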