This API acts as a create/read/update/delete interface for anything related to documents.
All the request bodies in this API are JSON encoded and their content-type header should be set to application/json.
We consider a document to be a cohesive text, for example a complete news article. It consists of a unique id, a text snippet or a file, optional properties and optional tags. The text snippet is ideally a short, meaningful representation of the larger document, reduced to just one paragraph. A file can be uploaded in place of the text snippet. The system then extracts the text from the file and uses it as the content of the document.
The document id is a unique identifier for a single document.
A snippet is a reduced representation of a larger text. For example, if the document is a news article, then its text is the news article in a purely textual format. For our system to work correctly, it is important that the snippet is only one or two paragraphs long and that its text clearly summarises the larger text. For example, take a news article about the effects of inflation. Our snippet could then be: "Inflation worries as prices keep rising. People having budgetary difficulties as a result. Government pressured to take action.". A bad snippet would be, for example, just the very first paragraph of a document. This could sometimes work, but could also lead to: "December 20th 2020. Article written by Jane Doe. On this bright and sunny day, people might forget about their worries sometimes."
Properties are optional data for documents. They are usually needed to display the document properly to the user when it is returned as a personalized document.
For example, if you wish to integrate a carousel view that lists a total of 10 personalized documents in a "for you" section, you might choose to display each document as an image and a title, with a url that the user is taken to when the item is pressed.
For this, you would need three document properties: image, link and title.
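As a minimal sketch of the carousel example, the following builds one document entry that carries the three display properties. The helper name make_carousel_document and the example values (including the example.com URLs) are hypothetical; only the field names id, snippet, properties, image, link and title are taken from this page.

```python
import json


def make_carousel_document(doc_id, snippet, image, link, title):
    """Build one document entry with the three display properties."""
    return {
        "id": doc_id,
        "snippet": snippet,
        "properties": {"image": image, "link": link, "title": title},
    }


doc = make_carousel_document(
    "document_1",
    "Inflation worries as prices keep rising.",
    "https://example.com/cover.png",  # hypothetical image URL
    "https://example.com/article",    # hypothetical link target
    "Inflation worries",
)

# Wrap the entry in the "documents" array used by the request samples.
print(json.dumps({"documents": [doc]}, indent=2))
```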
Tags are optional data for documents, which are used to improve the scoring in document searches. Each document can have multiple tags. For example, tags can be categories which the documents can be assigned to.
Upsert documents to the system, which creates a representation of the document that will be used to match it against the preferences of a user.
Important note: The maximum size for a request is 10Mb. As a result, a request containing large documents may reach this size limit before it reaches the maximum batch size.
Important note: If a document id appears multiple times, only the last document with that id is retained.
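The two notes above can be handled on the client side before sending. The sketch below is one possible approach, not part of the API: it keeps only the last document per id and splits the remainder into request bodies that stay within both limits. The function names are hypothetical, and reading "10Mb" as 10,000,000 bytes is an assumption.

```python
import json

MAX_BATCH_ITEMS = 100                 # from the [ 1 .. 100 ] items constraint
MAX_REQUEST_BYTES = 10 * 1000 * 1000  # assumption: "10Mb" read as 10,000,000 bytes


def dedupe_keep_last(documents):
    """Keep only the last document for each id, mirroring the upsert rule."""
    latest = {}
    for doc in documents:
        # Remove any earlier entry first so the document takes the position
        # of its last occurrence.
        latest.pop(doc["id"], None)
        latest[doc["id"]] = doc
    return list(latest.values())


def split_into_batches(documents, max_items=MAX_BATCH_ITEMS,
                       max_bytes=MAX_REQUEST_BYTES):
    """Yield request bodies that respect the item limit and the size limit."""
    batch = []
    for doc in dedupe_keep_last(documents):
        candidate = batch + [doc]
        # Size of the body as serialized by json.dumps with default settings.
        size = len(json.dumps({"documents": candidate}).encode("utf-8"))
        if batch and (len(candidate) > max_items or size > max_bytes):
            yield {"documents": batch}
            batch = [doc]
        else:
            batch = candidate
    if batch:
        yield {"documents": batch}
```

A single document that is larger than max_bytes on its own is still emitted as a batch of one, so the caller has to decide how to handle that case.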
documents required | Array of objects (IngestedDocument) [ 1 .. 100 ] items |
{
  "documents": [
    {
      "id": "document_1",
      "snippet": "lorem ipsum delores",
      "properties": {
        "is_blue": true
      },
      "tags": [
        "news",
        "tech"
      ],
      "is_candidate": false
    },
    {
      "id": "document_2",
      "snippet": "more lorem less ipsum",
      "tags": [
        "exclusive"
      ],
      "default_is_candidate": false
    },
    {
      "id": "document_3",
      "snippet": "quite a lot of lines of lorem ipsum delores",
      "summarize": true
    }
  ]
}
{ }
Delete all listed documents.
documents required | Array of strings (Id) [ 1 .. 1000 ] items [ items [ 1 .. 256 ] characters ^[a-zA-Z0-9\-:@.][a-zA-Z0-9\-:@._]*$ ] An id can be any non-empty string that consists of Arabic digits, Latin letters, hyphens, colons, @s, dots or underscores; an underscore is not allowed as the first character. The length constraints are in bytes, not characters. |
{
  "documents": [
    "id1"
  ]
}
{
  "request_id": "string",
  "kind": "string",
  "details": { }
}
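The id rules for this request can be checked locally before sending. The following is a minimal sketch using the pattern and limits listed in the parameter description; the function names is_valid_id and build_delete_request are hypothetical and not part of the API.

```python
import re

# Pattern and limits copied from the parameter description.
ID_PATTERN = re.compile(r"^[a-zA-Z0-9\-:@.][a-zA-Z0-9\-:@._]*$")
MAX_ID_BYTES = 256
MAX_IDS_PER_REQUEST = 1000


def is_valid_id(value):
    """Return True if value satisfies the Id pattern and byte-length limits."""
    if not isinstance(value, str):
        return False
    # The length constraint is stated in bytes, so measure the encoded form.
    length = len(value.encode("utf-8"))
    return 1 <= length <= MAX_ID_BYTES and ID_PATTERN.fullmatch(value) is not None


def build_delete_request(ids):
    """Build the request body for deleting the listed documents."""
    ids = list(ids)
    if not 1 <= len(ids) <= MAX_IDS_PER_REQUEST:
        raise ValueError("expected between 1 and 1000 ids")
    invalid = [i for i in ids if not is_valid_id(i)]
    if invalid:
        raise ValueError(f"invalid ids: {invalid!r}")
    return {"documents": ids}
```

For example, build_delete_request(["id1"]) returns the body shown in the request sample, while an id such as "_id1" is rejected because it starts with an underscore.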
Delete the listed document.
document_id required | string (Id) [ 1 .. 256 ] characters ^[a-zA-Z0-9\-:@.][a-zA-Z0-9\-:@._]*$ Example: id1 An id can be any non-empty string that consists of Arabic digits, Latin letters, hyphens, colons, @s, dots or underscores; an underscore is not allowed as the first character. The length constraints are in bytes, not characters. |
{
  "request_id": "string",
  "kind": "string",
  "details": { }
}
Set the documents considered for recommendations.
documents required | Array of objects (DocumentCandidate) >= 0 items |
{
  "documents": [
    {
      "id": "id1"
    }
  ]
}
{
  "request_id": "string",
  "kind": "string",
  "details": { }
}
Get all the properties of the document.
document_id required | string (Id) [ 1 .. 256 ] characters ^[a-zA-Z0-9\-:@.][a-zA-Z0-9\-:@._]*$ Example: id1 An id can be any non-empty string that consists of Arabic digits, Latin letters, hyphens, colons, @s, dots or underscores; an underscore is not allowed as the first character. The length constraints are in bytes, not characters. |
{
  "properties": {
    "title": "News title"
  }
}
Set or replace all the properties of the document.
document_id required | string (Id) [ 1 .. 256 ] characters ^[a-zA-Z0-9\-:@.][a-zA-Z0-9\-:@._]*$ Example: id1 An id can be any non-empty string that consists of Arabic digits, Latin letters, hyphens, colons, @s, dots or underscores; an underscore is not allowed as the first character. The length constraints are in bytes, not characters. |
properties required | object (DocumentProperties) Mostly arbitrary properties that can be attached to a document, up to 2.5KB in size. A key must be a valid |
{
  "properties": {
    "publication_date": "2019-08-24T14:15:22Z",
    "document property id1": { },
    "document property id2": { }
  }
}
{
  "request_id": "string",
  "kind": "string",
  "details": { }
}
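Because the properties object is limited to 2.5KB, it can be useful to measure it before sending. The sketch below rests on two assumptions that this page does not confirm: that "2.5KB" means 2.5 * 1024 bytes, and that the limit applies to the compact UTF-8 JSON serialization. The function names are hypothetical.

```python
import json

# Assumption: "2.5KB" read as 2.5 * 1024 bytes.
MAX_PROPERTIES_BYTES = 2560


def properties_size(properties):
    """Size in bytes of the compact UTF-8 JSON form of the properties."""
    encoded = json.dumps(properties, separators=(",", ":"), ensure_ascii=False)
    return len(encoded.encode("utf-8"))


def build_properties_request(properties):
    """Build the request body, rejecting objects above the assumed limit."""
    size = properties_size(properties)
    if size > MAX_PROPERTIES_BYTES:
        raise ValueError(f"properties are {size} bytes, above {MAX_PROPERTIES_BYTES}")
    return {"properties": properties}
```

The server may measure the size differently, so treat a passing local check as a hint rather than a guarantee.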
Delete all the properties of the document.
document_id required | string (Id) [ 1 .. 256 ] characters ^[a-zA-Z0-9\-:@.][a-zA-Z0-9\-:@._]*$ Example: id1 An id can be any non-empty string that consists of Arabic digits, Latin letters, hyphens, colons, @s, dots or underscores; an underscore is not allowed as the first character. The length constraints are in bytes, not characters. |
{
  "request_id": "string",
  "kind": "string",
  "details": { }
}
Get the property of the document.
document_id required | string (Id) [ 1 .. 256 ] characters ^[a-zA-Z0-9\-:@.][a-zA-Z0-9\-:@._]*$ Example: id1 An id can be any non-empty string that consists of Arabic digits, Latin letters, hyphens, colons, @s, dots or underscores; an underscore is not allowed as the first character. The length constraints are in bytes, not characters. |
property_id required | string (IdNoDot) [ 1 .. 256 ] characters ^[a-zA-Z0-9\-:@][a-zA-Z0-9\-:@_]*$ Example: id1 An id can be any non-empty string that consists of Arabic digits, Latin letters, hyphens, colons, @s or underscores; an underscore is not allowed as the first character. The length constraints are in bytes, not characters. |
{
  "property": "Any valid json value"
}
Set or replace the property of the document.
document_id required | string (Id) [ 1 .. 256 ] characters ^[a-zA-Z0-9\-:@.][a-zA-Z0-9\-:@._]*$ Example: id1 An id can be any non-empty string that consists of Arabic digits, Latin letters, hyphens, colons, @s, dots or underscores; an underscore is not allowed as the first character. The length constraints are in bytes, not characters. |
property_id required | string (IdNoDot) [ 1 .. 256 ] characters ^[a-zA-Z0-9\-:@][a-zA-Z0-9\-:@_]*$ Example: id1 An id can be any non-empty string that consists of Arabic digits, Latin letters, hyphens, colons, @s or underscores; an underscore is not allowed as the first character. The length constraints are in bytes, not characters. |
property required | null or boolean or number or string or Array of strings or string (DocumentProperty) |
{
  "property": { }
}
{
  "request_id": "string",
  "kind": "string",
  "details": { }
}
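The value types listed for a single property (null, boolean, number, string, array of strings) can be checked before building the request body. This is a minimal sketch based only on that list; the function names are hypothetical, and the server may accept or reject values differently.

```python
def is_valid_property_value(value):
    """Return True if value is one of the listed DocumentProperty types."""
    # null, boolean, number and string map to None, bool, int/float and str.
    if value is None or isinstance(value, (bool, int, float, str)):
        return True
    # Arrays are only listed as arrays of strings.
    if isinstance(value, list):
        return all(isinstance(item, str) for item in value)
    return False


def build_property_request(value):
    """Build the request body for setting a single property."""
    if not is_valid_property_value(value):
        raise ValueError(f"unsupported property value: {value!r}")
    return {"property": value}
```

For instance, build_property_request(["news", "tech"]) produces a body with a string array, while a nested object or a list of numbers raises ValueError.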
Delete the property of the document.
document_id required | string (Id) [ 1 .. 256 ] characters ^[a-zA-Z0-9\-:@.][a-zA-Z0-9\-:@._]*$ Example: id1 An id can be any non-empty string that consists of Arabic digits, Latin letters, hyphens, colons, @s, dots or underscores; an underscore is not allowed as the first character. The length constraints are in bytes, not characters. |
property_id required | string (IdNoDot) [ 1 .. 256 ] characters ^[a-zA-Z0-9\-:@][a-zA-Z0-9\-:@_]*$ Example: id1 An id can be any non-empty string that consists of Arabic digits, Latin letters, hyphens, colons, @s or underscores; an underscore is not allowed as the first character. The length constraints are in bytes, not characters. |
{
  "request_id": "string",
  "kind": "string",
  "details": { }
}
Add additional indexed properties to the schema.
The schema can have at most 11 properties in total, including the automatically created publication_date property.
If you plan to create multiple indexed properties, it is strongly recommended to do so with one request.
For now it is not possible to modify or delete indexed properties through the API.
To use a property with query filters, it must first be added once to the list of indexed properties using this endpoint.
Newly ingested documents are checked for compatibility with the indexed property schema, i.e. if they have a property that is in the schema, its value must be compatible (same type; in the case of date, a string in RFC 3339 date-time format).
Due to technical limitations, existing documents are not checked for compatibility with the new indexed properties added by this request. Incompatible documents are instead treated as if they did not have that property with respect to the filter/index. Existing documents with matching properties are added to the index in a background job. Functionality to check the completion of that job is not yet implemented.
properties required | object (IndexedPropertiesSchema) A mapping of document property ids to indexed property definitions. Be aware that the keys of the object must be valid |
{
  "properties": {
    "foo": {
      "type": "keyword"
    },
    "bar": {
      "type": "date"
    }
  }
}
{
  "properties": {
    "foo": {
      "type": "keyword"
    },
    "bar": {
      "type": "date"
    },
    "publication_date": {
      "type": "date"
    }
  }
}
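The limits and compatibility rules for indexed properties can be approximated on the client side. The sketch below makes several assumptions: only the two types shown in the samples (keyword and date) are handled, keyword is taken to mean a string value, and datetime.fromisoformat with a required UTC offset is used as an approximation of RFC 3339 date-time parsing. The function names are hypothetical.

```python
from datetime import datetime

MAX_INDEXED_PROPERTIES = 11  # total, including publication_date


def build_schema_request(new_properties, existing=("publication_date",)):
    """Build the request body from a mapping of property id to type name."""
    total = len(set(existing) | set(new_properties))
    if total > MAX_INDEXED_PROPERTIES:
        raise ValueError(f"schema would have {total} properties, limit is 11")
    return {"properties": {name: {"type": kind}
                           for name, kind in new_properties.items()}}


def is_compatible(value, type_name):
    """Approximate the compatibility check for one property value."""
    if type_name == "keyword":
        # Assumption: a keyword property holds a string.
        return isinstance(value, str)
    if type_name == "date":
        if not isinstance(value, str):
            return False
        try:
            # fromisoformat does not accept a trailing "Z" on older Pythons.
            parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
        except ValueError:
            return False
        # RFC 3339 date-times carry an offset, so reject naive values.
        return parsed.tzinfo is not None
    raise ValueError(f"unknown indexed property type: {type_name!r}")
```

Sending all new indexed properties through a single build_schema_request call matches the recommendation to create multiple indexed properties with one request.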