Bots Documentation

  1. Collectors
  2. Parsers
  3. Experts
  4. Outputs

General remarks

By default all of the bots are started when you start the whole botnet, however there is a possibility to disable a bot. This means that the bot will not start every time you start the botnet, but you can start and stop the bot if you specify the bot explicitly. To disable a bot, add the following to your runtime.conf: "enabled": false. Be aware that this is not a normal parameter (like the others described in this file). It is set outside of the parameters object in runtime.conf. Check the User-Guide for an example.

There are two different types of parameters: The initialization parameters are need to start the bot. The runtime parameters are needed by the bot itself during runtime.

The initialization parameters are in the first level, the runtime parameters live in the parameters sub-dictionary:

{
    "bot-id": {
        "parameters": {
            runtime parameters...
        },
        initialization parameters...
    }
}

For example:

{
    "abusech-feodo-domains-collector": {
        "parameters": {
            "provider": "Abuse.ch",
            "feed": "Abuse.ch Feodo Domains",
            "http_url": "http://example.org/feodo-domains.txt"
        },
        "name": "Generic URL Fetcher",
        "group": "Collector",
        "module": "intelmq.bots.collectors.http.collector_http",
        "description": "collect report messages from remote hosts using http protocol",
        "enabled": true,
        "run_mode": "scheduled"
    }
}

This configuration resides in the file runtime.conf in your intelmq's configuration directory for each configured bot.

Initialization parameters

  • name and description: The name and description of the bot as can be found in BOTS-file, not used by the bot itself.
  • group: Can be "Collector", "Parser", "Expert" or "Output". Only used for visualization by other tools.
  • module: The executable (should be in $PATH) which will be started.
  • enabled: If the parameter is set to true (which is NOT the default value if it is missing as a protection) the bot will start when the botnet is started (intelmqctl start). If the parameter was set to false, the Bot will not be started by intelmqctl start, however you can run the bot independently using intelmqctl start <bot_id>. Check the User-Guide for more details.
  • run_mode: There are two run modes, "continuous" (default run mode) or "scheduled". In the first case, the bot will be running forever until stopped or exits because of errors (depending on configuration). In the latter case, the bot will stop after one successful run. This is especially useful when scheduling bots via cron or systemd. Default is continuous. Check the User-Guide for more details.

Collectors

Feed parameters: Common configuration options for all collectors

  • feed: Name for the feed.
  • accuracy: Accuracy for the data of the feed.
  • code: Code for the feed.
  • documentation: Link to documentation for the feed.
  • provider: Name of the provider of the feed.
  • rate_limit: time interval (in seconds) between messages processing.

HTTP parameters: Common URL fetching parameters used in multiple collectors

  • http_timeout_sec: A tuple of floats or only one float describing the timeout of the http connection. Can be a tuple of two floats (read and connect timeout) or just one float (applies for both timeouts). The default is 30 seconds in default.conf, if not given no timeout is used. See also https://requests.readthedocs.io/en/master/user/advanced/#timeouts
  • http_timeout_max_tries: An integer depciting how often a connection is retried, when a timeout occured. Defaults to 3 in default.conf.
  • http_username: username for basic authentication.
  • http_password: password for basic authentication.
  • http_proxy: proxy to use for http
  • https_proxy: proxy to use for https
  • http_user_agent: user agent to use for the request.
  • http_verify_cert: path to trusted CA bundle or directory, false to ignore verifying SSL certificates, or true (default) to verify SSL certificates
  • ssl_client_certificate: SSL client certificate to use.
  • http_header: HTTP request headers

Generic URL Fetcher

Information:

  • name: intelmq.bots.collectors.http.collector_http
  • lookup: yes
  • public: yes
  • cache (redis db): none
  • description: collect report messages from remote hosts using http protocol

Configuration Parameters:

  • Feed parameters (see above)
  • HTTP parameters (see above)
  • http_url: location of information resource (e.g. https://feodotracker.abuse.ch/blocklist/?download=domainblocklist)

Generic URL Stream Fetcher

Information:

  • name: intelmq.bots.collectors.http.collector_http_stream
  • lookup: yes
  • public: yes
  • cache (redis db): none
  • description: Opens a streaming connection to the URL and sends the received lines.

Configuration Parameters:

  • Feed parameters (see above)
  • HTTP parameters (see above)
  • strip_lines: boolean, if single lines should be stripped (removing whitespace from the beginning and the end of the line)

If the stream is interrupted, the connection will be aborted using the timeout parameter. Then, an error will be thrown and rate_limit applies if not null. The parameter http_timeout_max_tries is of no use in this collector.


Generic Mail URL Fetcher

Information:

  • name: intelmq.bots.collectors.mail.collector_mail_url
  • lookup: yes
  • public: yes
  • cache (redis db): none
  • description: collect messages from mailboxes, extract URLs from that messages and download the report messages from the URLs.

Configuration Parameters:

  • Feed parameters (see above)
  • HTTP parameters (see above)
  • mail_host: FQDN or IP of mail server
  • mail_user: user account of the email account
  • mail_password: password associated with the user account
  • mail_ssl: whether the mail account uses SSL (default: true)
  • folder: folder in which to look for mails (default: INBOX)
  • subject_regex: regular expression to look for a subject
  • url_regex: regular expression of the feed URL to search for in the mail body

Generic Mail Attachment Fetcher

Information:

  • name: intelmq.bots.collectors.mail.collector_mail_attach
  • lookup: yes
  • public: yes
  • cache (redis db): none
  • description: collect messages from mailboxes, download the report messages from the attachments.

Configuration Parameters:

  • Feed parameters (see above)
  • mail_host: FQDN or IP of mail server
  • mail_user: user account of the email account
  • mail_password: password associated with the user account
  • mail_ssl: whether the mail account uses SSL (default: true)
  • folder: folder in which to look for mails (default: INBOX)
  • subject_regex: regular expression to look for a subject
  • attach_regex: regular expression of the name of the attachment
  • attach_unzip: whether to unzip the attachment (default: true)

Fileinput

Information:

  • name: intelmq.bots.collectors.file.collector_file
  • lookup: yes
  • public: yes
  • cache (redis db): none
  • description: collect messages from a file.

Configuration Parameters:

  • Feed parameters (see above)
  • path: path to file
  • postfix: FIXME
  • delete_file: whether to delete the file after reading (default: false)

MISP Generic

Information:

  • name: intelmq.bots.collectors.misp.collector
  • lookup: yes
  • public: yes
  • cache (redis db): none
  • description: collect messages from a MISP server.

Configuration Parameters:

  • Feed parameters (see above)
  • misp_url: url of MISP server (with trailing '/')
  • misp_key: MISP Authkey
  • misp_verify: (default: true)
  • misp_tag_to_process: MISP tag for events to be processed
  • misp_tag_processed: MISP tag for processed events

Request Tracker

Information:

  • name: intelmq.bots.collectors.rt.collector_rt
  • lookup: yes
  • public: yes
  • cache (redis db): none
  • description: Request Tracker Collector fetches attachments from an RTIR instance.

Configuration Parameters:

  • Feed parameters (see above)
  • HTTP parameters (see above)
  • uri: url of the REST interface of the RT
  • user: RT username
  • password: RT password
  • search_owner: owner of the ticket to search for (default: nobody)
  • search_queue: queue of the ticket to search for (default: Incident Reports)
  • search_status: status of the ticket to search for (default: new)
  • search_subject_like: part of the subject of the ticket to search for (default: Report)
  • set_status: status to set the ticket to after processing (default: open)
  • take_ticket: whether to take the ticket (default: true)
  • url_regex: regular expression of an URL to search for in the ticket
  • attachment_regex: regular expression of an attachment in the ticket
  • unzip_attachment: whether to unzip a found attachment

The parameter http_timeout_max_tries is of no use in this collector.


XMPP collector

Information:

  • name: intelmq.bots.collectors.xmpp.collector
  • lookup: yes
  • public: yes
  • cache (redis db): none
  • description: This bot can connect to an XMPP Server and one room, in order to receive reports from it. TLS is used by default. rate_limit is ineffective here. Bot can either pass the body or the whole event.

Configuration Parameters:

  • Feed parameters (see above)
  • xmpp_server: FIXME
  • xmpp_user: FIXME
  • xmpp_password: FIXME
  • xmpp_room: FIXME
  • xmpp_room_nick: FIXME
  • xmpp_room_password: FIXME
  • ca_certs: FIXME (default: /etc/ssl/certs/ca-certificates.crt)
  • strip_message: FIXME (default: true)
  • pass_full_xml: FIXME (default: false)

Alien Vault OTX

See the README.md

Information:

  • name: intelmq.bots.collectors.alienvault_otx.collector
  • lookup: yes
  • public: yes
  • cache (redis db): none
  • description: collect report messages from Alien Vault OTX API

Configuration Parameters:

  • Feed parameters (see above)
  • api_key: location of information resource (e.g. FIXME)

Blueliv Crimeserver

See the README.md

Information:

  • name: intelmq.bots.collectors.blueliv.collector_crimeserver
  • lookup: yes
  • public: no
  • cache (redis db): none
  • description: collect report messages from Blueliv API

Configuration Parameters:

  • Feed parameters (see above)
  • api_key: location of information resource

Microsoft Azure

Iterates over all blobs in all containers in an Azure storage.

Information:

  • name: intelmq.bots.collectors.microsoft.collector_azure
  • lookup: yes
  • public: no
  • cache (redis db): none
  • description: collect blobs from microsoft azure using their library

Configuration Parameters:

  • Feed parameters (see above)
  • account_name: account name as give by Microsoft
  • account_key: account key as give by Microsoft
  • delete: boolean, delete containers and blobs after fetching

N6Stomp

See the README.md

Information:

  • name: intelmq.bots.collectors.n6.collector_stomp
  • lookup: yes
  • public: no
  • cache (redis db): none
  • description: collect report messages from Blueliv API

Configuration Parameters:

  • Feed parameters (see above)
  • exchange: exchange point as given by CERT.pl
  • port: 61614
  • server: hostname e.g. "n6stream.cert.pl"
  • ssl_ca_certificate: path to CA file
  • ssl_client_certificate: path to client cert file
  • ssl_client_certificate_key: path to client cert key file

Parsers

TODO

Generic CSV Parser

Lines starting with '#' will be ignored. Headers won't be interpreted.

Configuration parameters

  • "columns": A list of strings or a string of comma-separated values with field names. The names must match the harmonization's field names. Strings starting with extra. will be written into the Extra-Object of the DHO. E.g. json [ "", "source.fqdn", "extra.http_host_header" ],
  • "column_regex_search": Optional. A dictionary mapping field names (as given per the columns parameter) to regular expression. The field is evaulated using re.search. Eg. to get the ASN out of AS1234 use: {"source.asn": "[0-9]*"}.
  • "default_url_protocol": For URLs you can give a defaut protocol which will be pretended to the data.
  • "delimiter": separation character of the CSV, e.g. ","
  • "skip_header": Boolean, skip the first line of the file, optional. Lines starting with # will be skipped additionally, make sure you do not skip more lines than needed!
  • time_format: Optional. If "timestamp" or "windows_nt" the time will be converted first. With the default null fuzzy time parsing will be used.
  • "type": set the classification.type statically, optional
  • "type_translation": See below, optional
Type translation

If the source does have a field with information for classification.type, but it does not correspond to intelmq's types, you can map them to the correct ones. The type_translation field can hold a JSON field with a dictionary which maps the feed's values to intelmq's.

Experts

Abusix

See the README.md

Information:

  • name: abusix
  • lookup: dns
  • public: yes
  • cache (redis db): 5
  • description: FIXME
  • notes: https://abusix.com/contactdb.html

Configuration Parameters:

FIXME


ASN Lookup

See the README.md

Information:

  • name: ASN lookup
  • lookup: local database
  • public: yes
  • cache (redis db): none
  • description: IP to ASN

Configuration Parameters:

FIXME


Cymru Whois

Information:

  • name: cymru-whois
  • lookup: cymru dns
  • public: yes
  • cache (redis db): 5
  • description: IP to geolocation, ASN, BGP prefix

Configuration Parameters:

FIXME


Deduplicator

See the README.md

Information:

  • name: deduplicator
  • lookup: redis cache
  • public: yes
  • cache (redis db): 6
  • description: message deduplicator

Configuration Parameters:

Please check this README file.


Field Reducer Bot

Information:

  • name: reducer
  • lookup: none
  • public: yes
  • cache (redis db): none
  • description: The field reducer bot is capable of removing fields from events.

Configuration Parameters:

  • type - either "whitelist" or "blacklist"
  • keys - a list of key names (strings)
Whitelist

Only the fields in keys will passed along.

Blacklist

The fields in keys will be removed from events.


Filter

See the README.md

Information:

  • name: filter
  • lookup: none
  • public: yes
  • cache (redis db): none
  • description: filter messages (drop or pass messages) FIXME

Configuration Parameters:

FIXME


Generic DB Lookup

See the README.md


Gethostbyname

Information:

  • name: gethostbyname
  • lookup: dns
  • public: yes
  • cache (redis db): none
  • description: DNS name (FQDN) to IP

Configuration Parameters:

none


IDEA

Information:

  • name: idea
  • lookup: local config
  • public: yes
  • cache (redis db): none
  • description: The bot does a best effort translation of events into the IDEA format.

Configuration Parameters:

  • test_mode: add Test category to mark all outgoing IDEA events as informal (meant to simplify setting up and debugging new IDEA producers) (default: true)

MaxMind GeoIP

See the README.md

Information:

  • name: maxmind-geoip
  • lookup: local database
  • public: yes
  • cache (redis db): none
  • description: IP to geolocation

Configuration Parameters:

FIXME


Modify

Information:

  • name: modify
  • lookup: local config
  • public: yes
  • cache (redis db): none
  • description: modify expert bot allows you to change arbitrary field values of events just using a configuration file

Configuration Parameters:

The modify expert bot allows you to change arbitrary field values of events just using a configuration file. Thus it is possible to adapt certain values or adding new ones only by changing JSON-files without touching the code of many other bots.

The configuration is called modify.conf and looks like this:

[
    {
        "rulename": "Standard Protocols http",
        "if": {
            "source.port": "^(80|443)$"
        },
        "then": {
            "protocol.application": "http"
        }
    },
    {
        "rule": "Spamhaus Cert conficker",
        "if": {
            "malware.name": "^conficker(ab)?$"
        },
        "then": {
            "classification.identifier": "conficker"
        }
    },
    {
        "rule": "bitdefender",
        "if": {
            "malware.name": "bitdefender-(.*)$"
        },
        "then": {
            "malware.name": "{matches[malware.name][1]}"
        }
    },
    {
        "rule": "urlzone",
        "if": {
            "malware.name": "^urlzone2?$"
        },
        "then": {
            "classification.identifier": "urlzone"
        }
    },
    {
        "rule": "default",
        "if": {
            "feed.name": "^Spamhaus Cert$"
        },
        "then": {
            "classification.identifier": "{msg[malware.name]}"
        }
    }
]

In our example above we have five groups labeled Standard Protocols http, Spamhaus Cert conficker, bitdefender, urlzone and default. All sections will be considered, in the given order (from top to bottom).

Each rule consists of conditions and actions. Conditions and actions are dictionaries holding the field names of events and regex-expressions to match values (selection) or set values (action). All matching rules will be applied in the given order. The actions are only performed if all selections apply.

If the value for a condition is an empty string, the bot checks if the field does not exist. This is useful to apply default values for empty fields.

Actions

You can set the value of the field to a string literal or number.

In addition you can use the standard Python string format syntax to access the values from the processed event as msg and the match groups of the conditions as matches, see the bitdefender example above. Note that matches will also contain the match groups from the default conditions if there were any.

Examples

We have an event with feed.name = Spamhaus Cert and malware.name = confickerab. The expert loops over all sections in the file and eventually enters section Spamhaus Cert. First, the default condition is checked, it matches! OK, going on. Otherwise the expert would have selected a different section that has not yet been considered. Now, go through the rules, until we hit the rule conficker. We combine the conditions of this rule with the default conditions, and both rules match! So we can apply the action: classification.identifier is set to conficker, the trivial name.

Assume we have an event with feed.name = Spamhaus Cert and malware.name = feodo. The default condition matches, but no others. So the default action is applied. The value for classification.identifier will be set to feodo by {msg[malware.name]}.

Types

If the rule is a string, a regex-search is performed, also for numeric values (str() is called on them). If the rule is numeric for numeric values, a simple comparison is done. If other types are mixed, a warning will be thrown.


National CERT contact lookup by CERT.AT

Information:

  • name: national_cert_contact_certat
  • lookup: https
  • public: yes
  • cache (redis db): none
  • description: https://contacts.cert.at offers an IP address to national CERT contact (and cc) mapping. See https://contacts.cert.at for more info.

Configuration Parameters:

  • filter: (true/false) act as a a filter for AT.
  • overwrite_cc: set to true if you want to overwrite any potentially existing cc fields in the event.

Reverse DNS

Information:

  • name: reverse-dns
  • lookup: dns
  • public: yes
  • cache (redis db): 8
  • description: IP to domain

Configuration Parameters:

FIXME


RFC1918

Several RFCs define IP addresses and Hostnames (and TLDs) reserved for documentation:

Sources: https://tools.ietf.org/html/rfc1918 https://tools.ietf.org/html/rfc2606 https://tools.ietf.org/html/rfc3849 https://tools.ietf.org/html/rfc4291 https://tools.ietf.org/html/rfc5737 https://en.wikipedia.org/wiki/IPv4

Information:

  • name: rfc1918
  • lookup: none
  • public: yes
  • cache (redis db): none
  • description: removes events or single fields with invalid data

Configuration Parameters:

  • fields: list of fields to look at. e.g. "destination.ip,source.ip,source.url"
  • policy: list of policies, e.g. "del,drop,drop". drop drops the entire event, del removes the field.

RipeNCC Abuse Contact

Information:

  • name: ripencc-abuse-contact
  • lookup: https api
  • public: yes
  • cache (redis db): 9
  • description: IP to abuse contact

Configuration Parameters:

  • query_ripe_db_asn: Query for IPs at http://rest.db.ripe.net/abuse-contact/%s.json, default true
  • query_ripe_db_ip: Query for ASNs at http://rest.db.ripe.net/abuse-contact/as%s.json, default true
  • query_ripe_stat_asn: Query for ASNs at https://stat.ripe.net/data/abuse-contact-finder/data.json?resource=%s, default true
  • query_ripe_stat_ip: Query for IPs at https://stat.ripe.net/data/abuse-contact-finder/data.json?resource=%s, default true
  • mode: either append (default) or replace

Taxonomy

Information:

  • name: taxonomy
  • lookup: local config
  • public: yes
  • cache (redis db): none
  • description: use eCSIRT taxonomy to classify events (classification type to classification taxonomy)

Configuration Parameters:

FIXME


Tor Nodes

See the README.md

Information:

  • name: tor-nodes
  • lookup: local database
  • public: yes
  • cache (redis db): none
  • description: check if IP is tor node

Configuration Parameters:

FIXME

Url2FQDN

Information:

  • name: url2fqdn
  • lookup: none
  • public: yes
  • cache (redis db): none
  • description: writes domain name from URL to FQDN

Configuration Parameters:

  • overwrite: boolean, replace existing FQDN?

Outputs

File

Information:

  • name: file
  • lookup: no
  • public: yes
  • cache (redis db): none
  • description: output messages (reports or events) to file

Configuration Parameters:

  • file: file path of output file

Files

Information:

  • name: files
  • lookup: no
  • public: yes
  • cache (redis db): none
  • description: saving of messages as separate files

Configuration Parameters:

  • dir: output directory (default /opt/intelmq/var/lib/bots/files-output/incoming)
  • tmp: temporary directory (must reside on the same filesystem as dir) (default: /opt/intelmq/var/lib/bots/files-output/tmp)
  • suffix: extension of created files (default .json)
  • hierarchical_output: if true, use nested dictionaries; if false, use flat structure with dot separated keys (default)
  • single_key: if none, the whole event is saved (default); otherwise the bot saves only contents of the specified key

MongoDB

Information:

  • name: mongodb
  • lookup: no
  • public: yes
  • cache (redis db): none
  • description: MongoDB is the bot responsible to send events to a MongoDB database

Configuration Parameters:

  • collection: MongoDB collection
  • database: MongoDB database
  • db_user : Database user that should be used if you enabled authentication
  • db_pass : Password associated to db_user
  • host: MongoDB host (FQDN or IP)
  • port: MongoDB port
  • hierarchical_output: Boolean (default true) as mongodb does not allow saving keys with dots, we split the dictionary in sub-dictionaries.

Installation Requirements

pip3 install pymongo>=2.7.1

PostgreSQL

Information:

  • name: postgresql
  • lookup: no
  • public: yes
  • cache (redis db): none
  • description: PostgreSQL is the bot responsible to send events to a PostgreSQL Database
  • notes: When activating autocommit, transactions are not used: http://initd.org/psycopg/docs/connection.html#connection.autocommit

Configuration Parameters:

The parameters marked with 'PostgreSQL' will be sent to libpq via psycopg2. Check the [libpq parameter documentation] (https://www.postgresql.org/docs/current/static/libpq-connect.html#LIBPQ-PARAMKEYWORDS) for the versions you are using.

  • autocommit: psycopg's autocommit mode, optional, default True
  • connect_timeout: PostgreSQL connect_timeout, optional, default 5 seconds
  • database: PostgreSQL database
  • host: PostgreSQL host
  • port: PostgreSQL port
  • user: PostgreSQL user
  • password: PostgreSQL password
  • sslmode: PostgreSQL sslmode
  • table: name of the database table into which events are to be inserted

Installation Requirements

See REQUIREMENTS.txt from your installation.

PostgreSQL Installation

See outputs/postgresql/README.md from your installation.


REST API

Information:

  • name: restapi
  • lookup: no
  • public: yes
  • cache (redis db): none
  • description: REST API is the bot responsible to send events to a REST API listener through POST

Configuration Parameters:

  • auth_token: the user name / http header key
  • auth_token_name: the password / http header value
  • auth_type: one of: "http_basic_auth", "http_header"
  • hierarchical_output: boolean
  • host: destination URL
  • use_json: boolean

SMTP Output Bot

Sends a MIME Multipart message containing the text and the event as CSV for every single event.

Information:

  • name: smtp
  • lookup: no
  • public: yes
  • cache (redis db): none
  • description: Sends events via SMTP

Configuration Parameters:

  • fieldnames: a list of field names to be included in the email, comma separated string or list of strings
  • mail_from: string. Supports formatting, see below
  • mail_to: string of email addresses, comma separated. Supports formatting, see below
  • smtp_host: string
  • smtp_password: string or null, Password for authentication on your SMTP server
  • smtp_port: port
  • smtp_username: string or null, Username for authentication on your SMTP server
  • ssl: boolean
  • starttls: boolean
  • subject: string. Supports formatting, see below
  • text: string or null. Supports formatting, see below

For several strings you can use values from the string using the standard Python string format syntax. Access the event's values with {ev[source.ip]} and similar.

Authentication is optional. If both username and password are given, these mechanism are tried: CRAM-MD5, PLAIN, and LOGIN.

Client certificates are not supported. If http_verify_cert is true, TLS certificates are checked.


TCP

Information:

  • name: tcp
  • lookup: no
  • public: yes
  • cache (redis db): none
  • description: TCP is the bot responsible to send events to a tcp port (Splunk, ElasticSearch, etc..)

Configuration Parameters:

  • ip: IP of destination server
  • hierarchical_output: true for a nested JSON, false for a flat JSON.
  • port: port of destination server
  • separator: separator of messages