In addition to the published schema and artifacts, the ECS repo also contains tools to generate artifacts based on the current published and custom schemas.
You may be asking: if ECS is a specification for storing event data, where does the ECS tooling fit into the picture? As users implement ECS in their Elastic stack, common questions arise:
Users can use the ECS tools to tackle both problems. Which artifacts are relevant will vary based on need. Many users will find the Elasticsearch templates most useful, but Beats contributors will instead find the Beats-formatted YAML field definition files valuable. By maintaining only their customizations and using the tools provided by ECS, they can generate relevant artifacts for their unique set of data sources.
NOTE - These tools and their functionality are considered experimental.
Before diving into the details, here’s a complete example that combines several of the options described in this guide:
python scripts/generator.py --ref v1.6.0 \
--subset ../my-project/fields/subset.yml \
--include ../my-project/fields/custom/ \
--out ../my-project/ \
--template-settings ../my-project/fields/template-settings.json \
--mapping-settings ../my-project/fields/mapping-settings.json
The generated Elasticsearch template would be output at
my-project/generated/elasticsearch/7/template.json
If this sounds interesting, read on to learn all about each of these settings.
See usage-example/ for a complete example with source files.
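If the generator is called from a build script, the command above can be assembled programmatically. The following is a minimal sketch, not part of the ECS tooling: `build_generator_command` is a hypothetical helper, and it only uses the flags shown in this guide.

```python
import shlex
from typing import List, Optional, Sequence


def build_generator_command(
    ref: Optional[str] = None,
    subset: Optional[str] = None,
    include: Sequence[str] = (),
    out: Optional[str] = None,
    template_settings: Optional[str] = None,
    mapping_settings: Optional[str] = None,
    python: str = "python",
) -> List[str]:
    """Assemble the argument list for scripts/generator.py (hypothetical helper)."""
    cmd = [python, "scripts/generator.py"]
    if ref:
        cmd += ["--ref", ref]
    if subset:
        cmd += ["--subset", subset]
    if include:
        # --include accepts one or more directories after a single flag
        cmd += ["--include", *include]
    if out:
        cmd += ["--out", out]
    if template_settings:
        cmd += ["--template-settings", template_settings]
    if mapping_settings:
        cmd += ["--mapping-settings", mapping_settings]
    return cmd


if __name__ == "__main__":
    cmd = build_generator_command(
        ref="v1.6.0",
        subset="../my-project/fields/subset.yml",
        include=["../my-project/fields/custom/"],
        out="../my-project/",
        template_settings="../my-project/fields/template-settings.json",
        mapping_settings="../my-project/fields/mapping-settings.json",
    )
    # Print the command; pass `cmd` to subprocess.run(cmd, check=True)
    # from the top level of the ECS repo to actually run it.
    print(shlex.join(cmd))
```

Returning a list rather than a single string keeps paths with spaces intact when the command is later handed to `subprocess.run`.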
Term | Definition |
---|---|
ECS | Elastic Common Schema. For the purposes of this guide, ECS may refer to either the schema itself or the repo/tooling used to maintain the schema |
artifacts | Various kinds of files or programs that can be generated based on ECS |
field set | Groups of related fields in ECS |
schema | Another term for a group of related fields in ECS. Used interchangeably with field set |
schema definition | The markup used to define a schema in ECS |
attributes | The properties of a field or field set that are used to define that field or field set in a schema definition |
The recommended way to download the ECS repo is git clone:
$ git clone https://github.com/elastic/ecs
$ cd ecs
Prior to installing dependencies or running the tools, it’s recommended to check out the git branch or tag for the ECS version being targeted.
Example: For ECS 1.5.0:
$ git checkout v1.5.0
Setting up a virtualenv (venv) can be accomplished by running make ve from the top level of the ECS repo:
$ make ve
All necessary Python dependencies will also be installed with pip.
You can use the Python and dependencies from this isolated virtual environment by using build/ve/bin/python instead of python in the examples shown here.
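For scripts that wrap the generator, one way to prefer the virtual environment's interpreter is sketched below. `pick_python` is a hypothetical helper; the only path it relies on is build/ve/bin/python as described above, and the fallback to the current interpreter is an assumption of this sketch.

```python
import sys
from pathlib import Path


def pick_python(repo_root: str) -> str:
    """Return build/ve/bin/python if `make ve` created it, else the current interpreter."""
    venv_python = Path(repo_root) / "build" / "ve" / "bin" / "python"
    if venv_python.exists():
        return str(venv_python)
    # Fallback (assumption of this sketch): whichever Python runs this script
    return sys.executable


if __name__ == "__main__":
    print(pick_python("."))
```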
Install dependencies using pip (an active virtualenv is recommended):
$ pip install -r scripts/requirements.txt
Using the defaults, the generator script generates the artifacts based on the current ECS schema.
$ python scripts/generator.py
Loading schemas from local files
Running generator. ECS version 1.5.0
Points to note on the defaults:
- Artifacts are created in the generated directory and the entire schema is included
- Documentation updates are written to the docs directory. More specifics on generated doc files are covered in the contributor’s file
- Each run of the script rewrites the entirety of the generated directory
- The version displayed when running generator.py is based on the current value of the version file in the top level of the repo
The generator’s defaults are how the ECS team maintains the official artifacts published in the repo. For your own use cases, you may wish to add your own fields or remove others that are unused. The following section details the available options for controlling the output of those artifacts.
Generate the ECS artifacts in a different output directory. If the specified directory doesn’t exist, it will be created:
$ python scripts/generator.py --out ../myproject/ecs/out/
Inside the directory passed in as the target dir to the --out flag, two directories, generated and docs, will be created. docs will contain three asciidoc files based on the contents of the provided schema. generated will contain the various artifacts laid out as in the published repo (beats, csv, ecs, elasticsearch).
Note: When running using either the --subset or --include options, the asciidoc files will not be generated.
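After a run, the layout described above can be checked with a few lines of Python. This is a sketch under stated assumptions: `missing_output_dirs` is a hypothetical helper, and it only looks for the four generated sub-directories named above. It deliberately skips docs, since the asciidoc files are not produced when --subset or --include is used.

```python
from pathlib import Path
from typing import List

# Sub-directories of `generated` named in this guide
GENERATED_SUBDIRS = ("beats", "csv", "ecs", "elasticsearch")


def missing_output_dirs(out_dir: str) -> List[str]:
    """List the expected artifact directories that do not exist under out_dir."""
    root = Path(out_dir)
    expected = [root / "generated" / name for name in GENERATED_SUBDIRS]
    return [str(path) for path in expected if not path.is_dir()]


if __name__ == "__main__":
    for path in missing_output_dirs("../myproject/ecs/out/"):
        print(f"missing: {path}")
```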
Use the --include flag to generate ECS artifacts based on the current ECS schema field definitions plus provided custom fields:
$ python scripts/generator.py --include ../myproject/ecs/custom-fields/
The --include flag expects a directory of schema YAML files using the same file format as the ECS schema files. This is useful for maintaining custom field definitions outside of the ECS schema while still merging them with the official ECS fields for your deployment.
For example, if we defined the following schema definition in a file named myproject/ecs/custom-fields/widget.yml:
---
- name: widgets
  title: Widgets
  group: 2
  short: Fields describing widgets
  description: >
    The widget fields describe a widget and all its widget-related details.
  type: group
  fields:
    - name: id
      level: extended
      type: keyword
      short: Unique identifier of the widget
      description: >
        Unique identifier of the widget.
Multiple directory targets can also be provided:
$ python scripts/generator.py \
--include ../myproject/custom-fields-A/ ../myproject/custom-fields-B \
--out ../myproject/out/
Generate artifacts using --include to load our custom definitions in addition to --out to place them in the desired output directory:
$ python scripts/generator.py --include ../myproject/custom-fields/ --out ../myproject/out/
Loading schemas from local files
Running generator. ECS version 1.5.0
Loading user defined schemas: ['../myproject/custom-fields/']
We see the artifacts were generated successfully:
$ ls -lah ../myproject/out/
total 0
drwxr-xr-x 2 user ecs 64B Jul 8 13:12 docs
drwxr-xr-x 6 user ecs 192B Jul 8 13:12 generated
And looking at a specific artifact, ../myproject/out/generated/elasticsearch/7/template.json, we see our custom fields are included:
...
    "widgets": {
      "properties": {
        "id": {
          "ignore_above": 1024,
          "type": "keyword"
        }
      }
    }
...
Include can be used together with the --ref flag to merge custom fields into a targeted ECS version. See Ref.
NOTE: The --include mechanism will not validate custom YAML files prior to merging. This allows for modifying existing ECS fields in a custom schema without having to redefine all the mandatory field attributes.
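Because --include does not validate custom files, brand-new field sets may benefit from a quick pre-check before they are merged. The sketch below works on an already-parsed definition (a plain dict, as a YAML loader would return it). The attribute list is an assumption taken from the widget example above, not the official list of mandatory attributes, and the check is not appropriate for files that intentionally override only part of an existing ECS field.

```python
from typing import Dict, Iterable, List

# Assumption: attributes used by the `widgets.id` example in this guide.
ASSUMED_FIELD_ATTRIBUTES = ("name", "level", "type", "description")


def missing_attributes(
    field: Dict[str, object],
    required: Iterable[str] = ASSUMED_FIELD_ATTRIBUTES,
) -> List[str]:
    """Return the assumed-required attributes that are absent or empty."""
    return [attr for attr in required if not field.get(attr)]


def check_field_set(field_set: Dict[str, object]) -> Dict[str, List[str]]:
    """Map each incomplete field name to the attributes it is missing."""
    problems: Dict[str, List[str]] = {}
    for field in field_set.get("fields", []):
        missing = missing_attributes(field)
        if missing:
            problems[str(field.get("name", "<unnamed>"))] = missing
    return problems


# The widget example, as it would look after YAML parsing
WIDGETS = {
    "name": "widgets",
    "title": "Widgets",
    "type": "group",
    "fields": [
        {
            "name": "id",
            "level": "extended",
            "type": "keyword",
            "short": "Unique identifier of the widget",
            "description": "Unique identifier of the widget.\n",
        }
    ],
}

if __name__ == "__main__":
    print(check_field_set(WIDGETS) or "no missing attributes found")
```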
If your indices will never populate particular ECS fields, there’s no need to include those field definitions in your index mappings. The --subset argument allows for passing a subset definition YAML file which indicates which field sets or specific fields to include in the generated artifacts.
$ python scripts/generator.py --subset ../myproject/subsets/subset.yml
Example subset file:
---
name: malware_event
fields:
  base:
    fields:
      "@timestamp": {}
  agent:
    fields: "*"
  dll:
    fields: "*"
  ecs:
    fields: "*"
The subset file has a defined format, starting with the two top-level required fields:
- name: The name of the subset. Also used to name the directory holding the generated subset intermediate files (e.g. <outputTarget>/generated/ecs/subset/<name>)
- fields: Contains the subset field filters
The fields object declares which fields to include:
- Field sets are declared underneath fields by their top-level name (e.g. base, agent, etc.)
- Underneath each field set, all sub-fields can be included with a wildcard, fields: "*", or specific fields can be listed by name, e.g. @timestamp: {}
Reviewing the above example, the generator using subset will output artifacts containing:
- The @timestamp field from the base field set
- All agent.* fields, dll.*, and ecs.* fields
It’s also possible to combine --include and --subset! Note that your subset YAML filter file will need to list any custom fields being passed with --include. Otherwise, --subset will filter those fields out.
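The filtering rules above can be illustrated with a small sketch. This is not the generator's implementation: `apply_subset` is a hypothetical function, and `TOY_SCHEMA` is a heavily reduced stand-in for the real schema, with a custom `widgets` field set added as --include would.

```python
from typing import Dict, List, Union

FieldFilter = Union[str, Dict[str, dict]]

# Toy stand-in for the loaded schema: field set name -> field names
TOY_SCHEMA: Dict[str, List[str]] = {
    "base": ["@timestamp", "labels", "message", "tags"],
    "agent": ["id", "name"],
    "dll": ["name", "path"],
    "ecs": ["version"],
    "host": ["name"],
    "widgets": ["id"],  # custom field set, as if loaded via --include
}

# The example subset file, as it would look after YAML parsing
SUBSET = {
    "name": "malware_event",
    "fields": {
        "base": {"fields": {"@timestamp": {}}},
        "agent": {"fields": "*"},
        "dll": {"fields": "*"},
        "ecs": {"fields": "*"},
    },
}


def apply_subset(
    schema: Dict[str, List[str]],
    subset_fields: Dict[str, Dict[str, FieldFilter]],
) -> Dict[str, List[str]]:
    """Keep only the field sets and fields named in the subset definition."""
    selected: Dict[str, List[str]] = {}
    for field_set, spec in subset_fields.items():
        if field_set not in schema:
            continue
        wanted = spec["fields"]
        if wanted == "*":
            # Wildcard: keep every field in the field set
            selected[field_set] = list(schema[field_set])
        else:
            # Explicit listing: keep only the named fields
            selected[field_set] = [f for f in schema[field_set] if f in wanted]
    return selected


if __name__ == "__main__":
    for field_set, fields in apply_subset(TOY_SCHEMA, SUBSET["fields"]).items():
        print(field_set, fields)
```

In this toy run, `widgets` is dropped because the subset does not list it, which mirrors the caveat about combining --include with --subset.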
The --ref argument allows for passing a specific git tag (e.g. v1.5.0) or commit hash (1454f8b) that will be used to build ECS artifacts.
$ python scripts/generator.py --ref v1.5.0
The --ref argument loads field definitions from the specified git reference (branch, tag, etc.) from directories ./schemas and ./experimental/schemas (when specified via --include).
Here’s another example that loads both ECS fields and experimental changes from branch “1.7”, then adds custom fields on top.
$ python scripts/generator.py --ref 1.7 --include experimental/schemas ../myproject/fields/custom --out ../myproject/out
The command above will produce artifacts based on:
- main ECS field definitions as of branch 1.7
- experimental ECS changes as of branch 1.7
- custom fields in ../myproject/fields/custom as they are on the filesystem
Note: --ref depends on git being installed and on all expected commits/tags having been fetched from the ECS upstream repo. This is unlikely to be an issue unless you downloaded ECS as a zip archive from GitHub instead of cloning it.
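To find out ahead of time whether a given reference is available locally, a wrapper script could ask git directly. The sketch below is an assumption about how one might do this, not something the generator does itself. It relies on `git rev-parse --verify --quiet`, which exits non-zero when the reference cannot be resolved; `ref_is_available` is a hypothetical name.

```python
import subprocess


def ref_is_available(ref: str, repo_dir: str = ".") -> bool:
    """Return True if `ref` resolves to a commit in the clone at repo_dir."""
    try:
        result = subprocess.run(
            [
                "git", "-C", repo_dir,
                "rev-parse", "--verify", "--quiet",
                f"{ref}^{{commit}}",  # e.g. "v1.5.0^{commit}"
            ],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except FileNotFoundError:
        # git itself is not installed
        return False
    return result.returncode == 0


if __name__ == "__main__":
    for ref in ("v1.5.0", "1454f8b"):
        state = "available" if ref_is_available(ref) else "not found"
        print(f"{ref}: {state}")
```

A `False` result for a tag that exists upstream usually means the tags were never fetched, as happens with a zip download.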
The --template-settings and --mapping-settings arguments allow overriding the default template and mapping settings, respectively, in the generated Elasticsearch template artifacts. Both arguments expect a JSON file containing the custom settings.
$ python scripts/generator.py --template-settings ../myproject/es-overrides/template.json --mapping-settings ../myproject/es-overrides/mappings.json
The --template-settings argument defines index level settings that will be applied to the index template in the generated artifacts. This is an example template.json to be passed with --template-settings:
{
  "index_patterns": ["mylog-*"],
  "order": 1,
  "settings": {
    "index": {
      "mapping": {
        "total_fields": {
          "limit": 10000
        }
      },
      "refresh_interval": "1s"
    }
  },
  "mappings": {}
}
--mapping-settings works in the same way except now with the mapping settings for the index. This is an example mapping.json file:
{
  "_meta": {
    "version": "1.5.0"
  },
  "date_detection": false,
  "dynamic_templates": [
    {
      "strings_as_keyword": {
        "mapping": {
          "ignore_above": 1024,
          "type": "keyword"
        },
        "match_mapping_type": "string"
      }
    }
  ],
  "properties": {}
}
For template.json, the mappings object is left empty: {}. Likewise the properties object remains empty in the mapping.json example. Both will be filled in automatically by the script.
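How the two empty placeholders end up populated can be pictured with the following sketch. It illustrates the behavior described above and is not the generator's actual code: `assemble_template` is a hypothetical function, and the field mappings passed in are a tiny example.

```python
import copy
import json
from typing import Any, Dict


def assemble_template(
    template_settings: Dict[str, Any],
    mapping_settings: Dict[str, Any],
    field_mappings: Dict[str, Any],
) -> Dict[str, Any]:
    """Fill `properties` and `mappings` without mutating the inputs."""
    template = copy.deepcopy(template_settings)
    mappings = copy.deepcopy(mapping_settings)
    mappings["properties"] = field_mappings  # fills the empty "properties": {}
    template["mappings"] = mappings          # fills the empty "mappings": {}
    return template


if __name__ == "__main__":
    template_settings = {
        "index_patterns": ["mylog-*"],
        "order": 1,
        "settings": {"index": {"refresh_interval": "1s"}},
        "mappings": {},
    }
    mapping_settings = {
        "_meta": {"version": "1.5.0"},
        "date_detection": False,
        "properties": {},
    }
    field_mappings = {
        "widgets": {
            "properties": {"id": {"ignore_above": 1024, "type": "keyword"}}
        }
    }
    merged = assemble_template(template_settings, mapping_settings, field_mappings)
    print(json.dumps(merged, indent=2, sort_keys=True))
```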
IMPORTANT: This feature is unnecessary for most users. Our default free distribution comes with the Elastic Basic license, and supports all data types used by ECS. Learn more about our licenses here.
Users who want to use the open source version of Elasticsearch do not have access to the basic data types. However, some of these types have an OSS replacement that can be used instead, without too much loss of functionality.
This flag performs a best effort fallback, replacing basic data types with their OSS replacement.
Indices using purely OSS types will benefit from the normalization of ECS, but may miss out on some of the added functionality of these basic types.
Current fallbacks applied by this flag are:
- constant_keyword => keyword
- wildcard => keyword
- version => keyword
Usage:
$ python scripts/generator.py --oss
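The fallback itself amounts to a lookup table. The sketch below restates the three fallbacks listed above; `oss_type` is a hypothetical name and the code is illustrative, not the generator's implementation.

```python
# Fallbacks listed in this guide: basic-license type -> OSS replacement
OSS_FALLBACKS = {
    "constant_keyword": "keyword",
    "wildcard": "keyword",
    "version": "keyword",
}


def oss_type(field_type: str) -> str:
    """Return the OSS replacement for a type, or the type itself if none applies."""
    return OSS_FALLBACKS.get(field_type, field_type)


if __name__ == "__main__":
    for field_type in ("wildcard", "constant_keyword", "version", "ip"):
        print(f"{field_type} => {oss_type(field_type)}")
```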
The --strict argument enables “strict mode”. Strict mode performs a stricter validation step against the schema’s contents.
Basic usage:
$ python scripts/generator.py --strict
Strict mode requires the following conditions, else the script exits on an exception:
The current artifacts generated and published in the ECS repo will always be created using strict mode. However, older ECS versions (pre v1.5.0) will cause an exception if attempting to generate them using --strict. This is due to schema validation checks introduced after that version was released.
Example:
$ python scripts/generator.py --ref v1.4.0 --strict
Loading schemas from git ref v1.4.0
Running generator. ECS version 1.4.0
...
ValueError: Short descriptions must be single line, and under 120 characters (current length: 134).
Offending field or field set: number
Short description:
Unique number allocated to the autonomous system. The autonomous system number (ASN) uniquely identifies each network on the Internet.
Removing --strict
will display a warning message, but the script will finish its run successfully:
$ python scripts/generator.py --ref v1.4.0
Loading schemas from git ref v1.4.0
Running generator. ECS version 1.4.0
/Users/ericbeahan/dev/ecs/scripts/generators/ecs_helpers.py:176: UserWarning: Short descriptions must be single line, and under 120 characters (current length: 134).
Offending field or field set: number
Short description:
Unique number allocated to the autonomous system. The autonomous system number (ASN) uniquely identifies each network on the Internet.
This will cause an exception when running in strict mode.
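The difference between the two runs above can be sketched as a single check that either raises or warns. This is an illustration modeled on the messages shown in the output, not the code in ecs_helpers.py. `check_short_description` is a hypothetical name, and whether a description of exactly 120 characters passes is an assumption of this sketch.

```python
import warnings

SHORT_MAX_LENGTH = 120  # assumption: exactly 120 characters is still accepted


def check_short_description(name: str, short: str, strict: bool = False) -> None:
    """Raise in strict mode, warn otherwise, if `short` is multi-line or too long."""
    if "\n" not in short and len(short) <= SHORT_MAX_LENGTH:
        return
    message = (
        "Short descriptions must be single line, and under "
        f"{SHORT_MAX_LENGTH} characters (current length: {len(short)}).\n"
        f"Offending field or field set: {name}"
    )
    if strict:
        raise ValueError(message)
    warnings.warn(message)


if __name__ == "__main__":
    too_long = "x" * 134
    check_short_description("number", too_long)  # emits a UserWarning
    try:
        check_short_description("number", too_long, strict=True)
    except ValueError as error:
        print(f"strict mode failed: {error}")
```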
The --intermediate-only argument is used for debugging purposes. It only generates the “intermediate files”, ecs_flat.yml and ecs_nested.yml, without generating the rest of the artifacts.
More information on the different intermediate files can be found in the generated directory’s README.