5.3. Logs User Guide

5.3.1. Prerequisites

  • Three VMs are required to set up the K8s cluster

  • $ sudo yum install ansible

  • $ pip install openshift pyyaml kubernetes (required for the Ansible K8s module)

  • Update the IPs in the following files (if they have changed):
    • ansible-server/group_vars/all.yml: IP of the K8s apiserver and VM hostname
    • ansible-server/hosts: IPs of the VMs to install on
    • ansible-server/roles/logging/files/persistentVolume.yaml: IP of the NFS server
    • ansible-server/roles/logging/files/elastalert/ealert-rule-cm.yaml: IP of the alert receiver
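
After updating the IPs, connectivity from the Ansible server to the target VMs can be sanity-checked with an ad-hoc ping; a quick check, assuming SSH access to the VMs is already in place:
$ ansible -i ansible-server/hosts all -m ping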

5.3.2. Architecture

../../_images/setup.png

5.3.3. Installation - Clientside

5.3.3.1. Nodes

  • Node1 = 10.10.120.21
  • Node4 = 10.10.120.24

5.3.3.2. How is the installation done?

  • TD-agent installation
    $ curl -L https://toolbelt.treasuredata.com/sh/install-redhat-td-agent3.sh | sh
  • Copy the TD-agent config file on Node1
    $ cp tdagent-client-config/node1.conf /etc/td-agent/td-agent.conf
  • Copy the TD-agent config file on Node4
    $ cp tdagent-client-config/node4.conf /etc/td-agent/td-agent.conf
  • Restart the service
    $ sudo service td-agent restart
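
To confirm that td-agent restarted cleanly and picked up the copied configuration, check the service status and the tail of its log (assuming a systemd-based host):
$ sudo systemctl status td-agent
$ sudo tail -n 20 /var/log/td-agent/td-agent.log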

5.3.4. Installation - Serverside

5.3.4.1. Nodes

Inside the Jumphost (POD12):
  • VM1 = 10.10.120.211
  • VM2 = 10.10.120.203
  • VM3 = 10.10.120.204

5.3.4.2. How is the installation done?

Using Ansible:
  • K8s
    • Elasticsearch: 1 master & 1 data node on each VM
    • Kibana: 1 replica
    • Nginx: 2 replicas
    • Fluentd: 2 replicas
    • Elastalert: 1 replica (increasing the replica count results in duplicate alerts)
  • NFS server: on each VM, storing the Elasticsearch data at the following paths
    • /srv/nfs/master
    • /srv/nfs/data
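
Once the setup playbook (see below) has completed, the NFS exports can be verified from any of the VMs; a quick check, assuming showmount (nfs-utils) is installed:
$ showmount -e 10.10.120.211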

5.3.4.3. How to set up?

  • To set up the K8s cluster and EFK: run the ansible-playbook ansible/playbooks/setup.yaml
  • To clean everything up: run the ansible-playbook ansible/playbooks/clean.yaml
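
For reference, a typical invocation from the repository root, assuming the inventory file listed in the prerequisites (adjust the paths if the repository layout differs):
$ ansible-playbook -i ansible-server/hosts ansible/playbooks/setup.yaml
$ ansible-playbook -i ansible-server/hosts ansible/playbooks/clean.yaml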

5.3.4.4. Do we have HA?

Yes

5.3.5. Configuration

5.3.5.1. K8s

5.3.5.1.1. Path of all yamls (Serverside)

ansible-server/roles/logging/files/

5.3.5.1.2. K8s namespace

logging

5.3.5.1.3. K8s Service details

$ kubectl get svc -n logging
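
In addition to the services, the pods in the logging namespace can be checked to confirm that all components are running:
$ kubectl get pods -n logging -o wide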

5.3.5.2. Elasticsearch Configuration

5.3.5.2.1. Elasticsearch Setup Structure

../../_images/elasticsearch.png

5.3.5.2.2. Elasticsearch service details

Service Name: logging-es-http
Service Port: 9200
Service Type: ClusterIP

5.3.5.2.3. How to get the Elasticsearch default username & password?

  • User1 (custom user):
    Username: elasticsearch
    Password: password123
  • User2 (by default created by Elastic Operator):
    Username: elastic
    To get default password:
    $ PASSWORD=$(kubectl get secret -n logging logging-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')
    $ echo $PASSWORD
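
With the password in hand, access can be verified against the cluster; a minimal check, assuming Elasticsearch is reachable at the K8s master IP and port 9200 as used elsewhere in this guide:
$ curl -k -u "elastic:$PASSWORD" "https://10.10.120.211:9200/_cluster/health?pretty"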

5.3.5.2.4. How to increase replica of any index?

$ curl -k -u "elasticsearch:password123" -H 'Content-Type: application/json' -XPUT "https://10.10.120.211:9200/indexname*/_settings" -d '
{
  "index" : {
    "number_of_replicas" : "2"
  }
}'
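
The change can be verified by reading the settings back with the same endpoint and credentials:
$ curl -k -u "elasticsearch:password123" "https://10.10.120.211:9200/indexname*/_settings?pretty"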

5.3.5.2.5. Index Life

30 Days

5.3.5.3. Kibana Configuration

5.3.5.3.1. Kibana Service details

Service Name: logging-kb-http
Service Port: 5601
Service Type: ClusterIP

5.3.5.4. Nginx Configuration

5.3.5.4.1. IP

Access is over HTTPS at the Nginx NodePort IP address, e.g. https://10.10.120.211:32000

5.3.5.4.2. Nginx Setup Structure

../../_images/nginx.png

5.3.5.4.3. Nginx Service details

Service Name: nginx
Service Port: 32000
Service Type: NodePort

5.3.5.4.4. Why is Nginx used?

Nginx is used to secure access to the ELK stack.
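
As an illustration, assuming Nginx fronts Kibana and its basic-auth credentials match the custom Elasticsearch user above, the stack can be reached through the NodePort like this:
$ curl -k -u "elasticsearch:password123" https://10.10.120.211:32000/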

5.3.5.4.5. Nginx Configuration

Path: ansible-server/roles/logging/files/nginx/nginx-conf-cm.yaml

5.3.5.5. Fluentd Configuration - Clientside (Td-agent)

5.3.5.5.1. Fluentd Setup Structure

../../_images/fluentd-cs.png

5.3.5.5.2. Log collection paths

  • /tmp/result*/*.log
  • /tmp/result*/*.dat
  • /tmp/result*/*.csv
  • /tmp/result*/stc-liveresults.dat.*
  • /var/log/userspace*.log
  • /var/log/sriovdp/*.log.*
  • /var/log/pods/**/*.log

5.3.5.5.3. Logs sent to

Logs are sent to another Fluentd instance in the K8s cluster (K8s master: 10.10.120.211) on the Jumphost.

5.3.5.5.4. Td-agent logs

Path of td-agent logs: /var/log/td-agent/td-agent.log

5.3.5.5.5. Td-agent configuration

Path of the conf file: /etc/td-agent/td-agent.conf
If any change is made to td-agent.conf, restart the td-agent service: $ sudo service td-agent restart
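
The edited configuration can be validated before restarting; a quick check, assuming the --dry-run option of the bundled Fluentd is available:
$ sudo td-agent --dry-run -c /etc/td-agent/td-agent.conf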

5.3.5.5.6. Config Description

  • Get the logs from the collection paths

  • Convert them to this format
    {
      "msg": "log line",
      "log_path": "/file/path",
      "file": "file.name",
      "host": "pod12-node4"
    }
  • Send them to the server-side Fluentd

5.3.5.6. Fluentd Configuration - Serverside

5.3.5.6.1. Fluentd Setup Structure

../../_images/fluentd-ss.png

5.3.5.6.2. Fluentd Service details

Service Name: fluentd
Service Port: 32224
Service Type: NodePort

5.3.5.6.3. Logs sent to

Elasticsearch service (Example: logging-es-http at port 9200)

5.3.5.6.4. Config Description

  • Step 1
    • Get the logs from Node1 & Node4
  • Step 2
    Add a routing tag based on the log_path:
    /tmp/result.*/.*errors.dat           -> errordat.log
    /tmp/result.*/.*counts.dat           -> countdat.log
    /tmp/result.*/stc-liveresults.dat.tx -> stcdattx.log
    /tmp/result.*/stc-liveresults.dat.rx -> stcdatrx.log
    /tmp/result.*/.*Statistics.csv       -> ixia.log
    /tmp/result.*/vsperf-overall*        -> vsperf.log
    /tmp/result.*/vswitchd*              -> vswitchd.log
    /var/log/userspace*                  -> userspace.log
    /var/log/sriovdp*                    -> sriovdp.log
    /var/log/pods*                       -> pods.log
  • Step 3
    Then parse each type using its tag:
    • error.conf: to find any error
    • time-series.conf: to parse time-series data
    • time-analysis.conf: to calculate the time analysis
  • Step 4
    Add a routing tag based on the host:
    pod12-node4 -> node4
    worker      -> node1
  • Step 5
    Route to an Elasticsearch index based on the tag:
    node4 -> index "node4*"
    node1 -> index "node1*"
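
Once logs flow end to end, the per-node indices created in Step 5 can be listed; this reuses the Elasticsearch endpoint and credentials from the configuration section above:
$ curl -k -u "elasticsearch:password123" "https://10.10.120.211:9200/_cat/indices/node1*,node4*?v"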

5.3.6. Elastalert

5.3.6.1. Send alert if

  • Blacklist
    • "Failed to run test"
    • "Failed to execute in '30' seconds"
    • "('Result', 'Failed')"
    • "could not open socket: connection refused"
    • "Input/output error"
    • "dpdk|ERR|EAL: Error - exiting with code: 1"
    • "dpdk|ERR|EAL: Driver cannot attach the device"
    • "dpdk|EMER|Cannot create lock on"
    • "dpdk|ERR|VHOST_CONFIG: * device not found"
  • Time
    • vswitch_duration > 3 sec

5.3.6.2. How to configure an alert?

  • Add your rule in ansible/roles/logging/files/elastalert/ealert-rule-cm.yaml (Elastalert Rule Config):
    name: anything
    type: <check-above-link>  # the RuleType to use
    index: node4*             # index name
    realert:
      minutes: 0              # to get an alert for all cases after each interval
    alert: post               # to send the alert as an HTTP POST
    http_post_url:            # provide the alert-receiver URL
  • Mount this file into the Elastalert pod in ansible/roles/logging/files/elastalert/elastalert.yaml.

5.3.6.3. Alert Format

{
  "type": "pattern-match",
  "label": "failed",
  "index": "node4-20200815",
  "log": "error-log-line",
  "log-path": "/tmp/result/file.log",
  "reason": "error-message"
}
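
The alert receiver can be exercised independently of Elastalert by posting a payload in the same format; a sketch, where the target URL stands for the http_post_url configured above (hypothetical address):
$ curl -X POST -H "Content-Type: application/json" \
    -d '{"type": "pattern-match", "label": "failed", "index": "node4-20200815", "log": "error-log-line", "log-path": "/tmp/result/file.log", "reason": "error-message"}' \
    http://<alert-receiver-ip>:<port>/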

5.3.7. Data Management

5.3.7.1. Elasticsearch

5.3.7.1.1. Q&As

Where is the data stored now? Data is stored on the NFS server with 1 replica of each index (default). The data paths are the following:

  • /srv/nfs/data (VM1)
  • /srv/nfs/data (VM2)
  • /srv/nfs/data (VM3)
  • /srv/nfs/master (VM1)
  • /srv/nfs/master (VM2)
  • /srv/nfs/master (VM3)

If the user wants to change from NFS to local storage, can they do it? Yes, the user can do this by configuring the persistent volume (ansible-server/roles/logging/files/persistentVolume.yaml).

Do we have a backup of the data? Yes, 1 replica of each index.

Is the data still accessible when K8s restarts? Yes (as long as the data is not deleted from /srv/nfs/data).
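
The persistent volumes and claims backing Elasticsearch can be inspected with standard kubectl commands (the claims are assumed to live in the logging namespace):
$ kubectl get pv
$ kubectl get pvc -n logging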

5.3.8. Troubleshooting

5.3.8.1. If no logs are received in Elasticsearch

  • Check the IP & port of the server-side Fluentd in the client config.
  • Check the client-side Fluentd (td-agent) logs: $ sudo tail -f /var/log/td-agent/td-agent.log
  • Check the server-side Fluentd logs: $ sudo kubectl logs -n logging <fluentd-pod-name>
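
If the client cannot reach the server-side Fluentd at all, test the NodePort from Node1/Node4; a quick reachability check, assuming nc (netcat) is installed on the client:
$ nc -vz 10.10.120.211 32224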

5.3.8.2. If no notification is received

  • Search for your "log" in Elasticsearch.
  • Check the Elastalert configuration.
  • Check the IP of the alert receiver.
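
The Elastalert pod logs usually show whether a rule matched and whether the HTTP POST to the alert receiver succeeded:
$ kubectl logs -n logging <elastalert-pod-name>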