Changes to fluentd

#author("2020-09-24T01:56:22+00:00","default:wikiadmin","wikiadmin")
#author("2020-09-29T02:56:23+00:00","default:wikiadmin","wikiadmin")
-A log collection tool. It is more often used through the packaged td-agent distribution than on its own.

#contents

*Installing td-agent [#k58a704b]

-Easy on Amazon Linux.

http://dev.classmethod.jp/cloud/td-agent2-amazon-linux/

I followed the page above. The same steps work on CentOS.

**Minimal configuration [#w61977e4]

 # Listen on port 24224
 <source>
  type forward
  port 24224
 </source>
 # Apply a filter
 <filter **> 
  type script
  path /etc/td-agent/example.rb
 </filter>
 # Write everything to a file. To match on a specific tag, replace ** with e.g. apache.tag
 <match **>
  type file
  path /var/log/td-agent/99test
 </match>



**Checking the configuration file [#gcf25c54]

|CentOS7|td-agent --dry-run|

To run it manually, execute as follows.

*Plugin management: td-agent-gem [#wcd9b241]

-Basically the same as gem, but scoped to td-agent only.

|td-agent-gem list|List installed plugins|
|td-agent-gem install fluent-plugin-dstat|Install fluent-plugin-dstat|

**Plugin notes [#te887663]

***fluent-plugin-script [#v8354464]

Lets you write Ruby scripts. Configure it as a filter.

 <filter from.java.user>
  type script
  path /etc/td-agent/example.rb
 </filter>

-example.rb

 require "json"
 def start
   super
 end
 def shutdown
   super
 end
 def filter(tag, time, record)
  # This method implements the filtering logic for individual filters
  decoded = JSON.parse(record["response"])
  record["add"] = decoded["opponents"]
  record.delete("response")
  record.delete("country")
  record.delete("age")
  record
 end
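
The filter logic above can be tried outside Fluentd. The following is a minimal sketch, assuming a record shaped like the one the filter expects; the sample values and the method name filter_record are made up for illustration. Note that Hash#delete is a method call that takes the key in parentheses.

```ruby
require "json"

# Same steps as filter() in example.rb, written as a plain method so it can
# run without Fluentd. filter_record is a hypothetical name for this sketch.
def filter_record(record)
  decoded = JSON.parse(record["response"])
  record["add"] = decoded["opponents"]
  # Hash#delete takes the key as an argument
  %w[response country age].each { |key| record.delete(key) }
  record
end

# Hypothetical input record, invented for illustration
sample = {
  "user" => "alice",
  "country" => "JP",
  "age" => 30,
  "response" => '{"opponents":["bob","carol"]}'
}
result = filter_record(sample)
puts result.to_json
```

Only the user field and the new add field remain in the result.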


*td-agent on Mac [#g7c94cc0]

+Download the dmg file and install it
+Start and stop it with the following commands

 sudo launchctl load /Library/LaunchDaemons/td-agent.plist
 sudo launchctl unload /Library/LaunchDaemons/td-agent.plist

**td-agent version summary [#u766eabc]

-Survey notes as of 2015/10. Packages installed via yum keep getting updated to newer versions.

|Date|OS|td-agent|fluentd|
|2015/10|CentOS6|0.12.12||
|2015/10|CentOS7|0.12.12||
|2015/10|AmazonLinux 15.09|0.12.12||
|2016/01|Azure CentOS7|0.12.19||


**The version mystery [#g882c6ca]

Even with the same repository and the same rpm version, some hosts report 0.10.60 and others 0.10.55.

 $ td-agent --version
 td-agent 0.10.55
 $ rpm -qi td-agent
 Name        : td-agent                     Relocations: (not relocatable)
 Version     : 1.1.21                            Vendor: Treasure Data, Inc.
 Release     : 0                             Build Date: 2014年10月20日 17時31分13秒
 Install Date: 2015年08月12日 14時02分46秒      Build Host: ip-10-123-31-198.ec2.internal
 Group       : System Environment/Daemons    Source RPM: td-agent-1.1.21-0.src.rpm
 Size        : 103551538                        License: APL2
 Signature   : DSA/SHA1, 2014年10月20日 22時07分39秒, Key ID 1093db45a12e206f
 URL         : http://treasure-data.com/
 Summary     : td-agent
 Description :



**Adding the yum repository [#ne69bde8]

- vi /etc/yum.repos.d/td.repo

 [treasuredata]
 name=TreasureData
 baseurl=http://packages.treasure-data.com/redhat/$basearch
 gpgcheck=0

-To install v2 (the default these days)

 [treasuredata]
 name=TreasureData
 baseurl=http://packages.treasuredata.com/2/redhat/$releasever/$basearch
 gpgcheck=1
 gpgkey=https://packages.treasuredata.com/GPG-KEY-td-agent

**Upgrading td-agent [#k1a60d95]

 yum remove td-agentxxx
 curl -L https://toolbelt.treasuredata.com/sh/install-redhat-td-agent2.sh | sh

-Plugins have to be reinstalled afterwards.

**Installing td-agent [#z0dbacbe]

 yum install td-agent

*Plugins [#dca18629]

**Plugin locations [#c4f96ad1]

|td-agent 0.10.55(32bit)|/usr/lib/fluent/ruby/lib/ruby/gems/1.9.1/gems|
|td-agent 0.10.55(64bit)|/usr/lib64/fluent/ruby/lib/ruby/gems/1.9.1/gems|
|td-agent 0.12.12|/opt/td-agent/embedded/lib/ruby/gems/2.1.0/|

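**Installing plugins with gem [#rbd89c52]

Use the fluent-gem found under the locations above. Plugins must be installed with the Ruby that td-agent manages. Note that the location of fluent-gem differs by OS and version.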

|Redhat5|/usr/lib/fluent/ruby/bin/fluent-gem|
|RedHat6|/opt/td-agent/embedded/bin/fluent-gem|
|AmazonLinux|/usr/lib64/fluent/ruby/bin/fluent-gem|

+/usr/lib64/fluent/ruby/bin/fluent-gem install fluent-plugin-zabbix
+/opt/td-agent/embedded/bin/fluent-gem install fluent-plugin-forest
+/usr/lib/fluent/ruby/bin/fluent-gem install fluent-plugin-record-reformer

**Placing a plugin file directly [#l6ef2db1]

 /etc/td-agent/plugin/in_xxx.rb or out_xxx.rb


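*Configuration file [#wb9dd2c6]

Define inputs with source and process them with match. A single match cannot run multiple processing steps, so to chain processing across separate plugins, emit a new tag and match on it.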

 <match apache.access>
   type file
   path /var/tmp/apache_all.log
   # When using a wildcard, wrap the path in double quotes
   path "/var/tmp/*_access_log"
 
   tag next.apache.access
 </match>
 <match next.apache.access>
   type file
   path /var/tmp/apache_all2.log
 </match>

 
**Including configuration files [#y077ffd5]

 @include conf.d/*.conf

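**Using environment variables in the configuration file [#xa193d8c]

+The --use-v1-config argument is required. Add it in /etc/init.d/td-agent (it is the default from v2 on, but the init script still seems to pass it).
+Set the variables in /etc/sysconfig/td-agent or similar.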

 <source>
   type tail
   tag var.tmp
   path "/var/tmp/#{ENV['TD_HOSTNAME']}"
   format none
 </source>

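Note that the value is not expanded unless it is wrapped in double quotes.
It cannot be used in match, which makes it much less useful.
The hostname is available via "#{Socket.gethostname}".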

 <match raw.dummy>
   type file
   path "/var/tmp/#{ENV['HOME']}/test.log"
 </match>

It did work in an include.

 @include "#{ENV['TD_HOSTNAME']}.conf"

HOME is the home directory of the user running td-agent. By default this is /var/lib/td-agent/.
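
The "#{...}" syntax in the configuration is embedded Ruby, so the quoting behaviour can be illustrated with plain Ruby strings. This is only an analogy under that assumption: the value web01 is made up, and Ruby's single quotes stand in here for the cases where the configuration value is not expanded.

```ruby
require "socket"

# Hypothetical value; on a real host it would come from /etc/sysconfig/td-agent
ENV["TD_HOSTNAME"] = "web01"

# Double quotes: the embedded Ruby expression is evaluated
double_quoted = "/var/tmp/#{ENV['TD_HOSTNAME']}"

# Single quotes (Ruby analogy): the text is kept literally, nothing is expanded
single_quoted = '/var/tmp/#{ENV["TD_HOSTNAME"]}'

puts double_quoted
puts single_quoted

# The same mechanism gives access to the hostname
hostname = Socket.gethostname
puts hostname
```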

**Setting field types [#w4c4998b]

 types size:integer,response_time:integer
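
As a rough illustration of what the types setting asks for, the sketch below converts the named fields of a parsed record from strings to integers. The helper apply_types and the sample record are hypothetical and only imitate the idea; this is not Fluentd's own implementation.

```ruby
# Hypothetical helper: applies a "name:type,name:type" spec to a record.
def apply_types(record, types_spec)
  types = types_spec.split(",").map { |pair| pair.split(":") }.to_h
  record.each_with_object({}) do |(key, value), out|
    out[key] = case types[key]
               when "integer" then value.to_i
               when "float" then value.to_f
               else value # fields without a declared type stay as strings
               end
  end
end

# Parsed log fields normally arrive as strings
record = { "path" => "/index.html", "size" => "1234", "response_time" => "56" }
converted = apply_types(record, "size:integer,response_time:integer")
p converted
```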

**Listening on HTTP port 8888 [#g838a180]

 # http://localhost:8888/<tag>?json=<json>
 <source>
   type http
   port 8888
 </source>

With type forward, HTTP access is not possible, but Fluentd still listens on the configured port.
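
The URL pattern in the comment of the HTTP source above, http://localhost:8888/<tag>?json=<json>, can be assembled as follows. This is a minimal sketch: the tag debug.test, the record, and the helper name http_input_uri are made up, and the code only builds the URL without sending anything.

```ruby
require "json"
require "uri"

# Host and port taken from the <source> example above
HTTP_INPUT_HOST = "localhost"
HTTP_INPUT_PORT = 8888

# Hypothetical helper: builds http://host:port/<tag>?json=<json>
def http_input_uri(tag, record)
  query = URI.encode_www_form("json" => JSON.generate(record))
  URI::HTTP.build(host: HTTP_INPUT_HOST, port: HTTP_INPUT_PORT,
                  path: "/#{tag}", query: query)
end

uri = http_input_uri("debug.test", { "message" => "hello" })
puts uri
```

The JSON payload is percent-encoded so that quotes and braces survive as a query parameter.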


**Automatically adding the hostname to tags or fields [#n43f4895]

http://www.fluentd.org/guides/recipes/apache-add-hostname

***Use a filter section when adding it to a field [#q4fee96e]

 <filter web.*>
   type record_transformer
   <record>
     service_name ${tag_parts[1]}
   </record>
 </filter>
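
In the record_transformer example above, tag_parts is the tag split on dots, so tag_parts[1] is the second element. A minimal sketch of that step, assuming a made-up tag web.nginx:

```ruby
# Hypothetical tag that would match <filter web.*>
tag = "web.nginx"

# tag_parts corresponds to the tag split on "."
tag_parts = tag.split(".")

record = { "message" => "hello" }
record["service_name"] = tag_parts[1]
p record
```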

-The same approach works for dropping records. Multiple exclude lines can be listed.

 <filter apache.access>
   type grep
   exclude1 statuscode (200|301|302|304)
 </filter>


**Writing the configuration in the DSL [#ree036c5]

-Change the configuration file that /usr/sbin/td-agent loads to a .rb file, as shown below.

 #!/opt/td-agent/embedded/bin/ruby
 ENV["GEM_HOME"]="/opt/td-agent/embedded/lib/ruby/gems/2.1.0/"
 ENV["GEM_PATH"]="/opt/td-agent/embedded/lib/ruby/gems/2.1.0/"
 #ENV["FLUENT_CONF"]="/etc/td-agent/td-agent.conf"
 ENV["FLUENT_CONF"]="/etc/td-agent/test.rb"
 ENV["FLUENT_PLUGIN"]="/etc/td-agent/plugin"
 ENV["FLUENT_SOCKET"]="/var/run/td-agent/td-agent.sock"
 load "/opt/td-agent/embedded/bin/fluentd"

-The source file loops over an array as follows. Values for settings after type must be wrapped in "".

 ['hoge','fuga'].each do |i|
   match ("foo#{i}.#{ENV['HOSTNAME']}") {
     type :stdout
   }
 end
 source {
   type :tail
   path "/var/tmp/hoge.log"
 }
 
 # Apache settings written in the DSL
 apache_hash = { "access" => "apache", "error" => "apache_error"}
 apache_hash.each do |key,value|
 source {
     type :tail
     path "/var/log/httpd/*_#{key}_log"
     format "#{value}"
     tag "apache_#{key}"
     pos_file "/tmp/td-agent/apache_#{key}.pos"
 }
 end


-Check the configuration

 td-agent -c /etc/td-agent/test.rb --dry-run


-The expanded configuration is written to /var/log/td-agent/td-agent.log in XML-style form

  <match foohoge.**>
    type stdout
  </match>
  <match foofuga.**>
    type stdout
  </match>

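**Description of types (subtypes) [#y81ab9e0]

|type|Summary|
|null|Discards records without forwarding them|
|forest|Turns the tag name into a placeholder, useful for applying the same kind of settings to many tags at once|
|rewrite_tag_filter|Assigns tags based on regular expressions|
|record_modifier|Adds new attributes, for example adding the hostname to Apache logs|

**Forwarding local files [#ed3e1947]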

 <source>
   type tail
   format apache
   path /var/log/httpd/*_access_log
   tag apache.access
   pos_file /tmp/fluentd-apache.pos
 </source>
 <match apache.access>
        type s3
        aws_key_id 
        aws_sec_key 
        s3_bucket bucket_name
        s3_endpoint bucket_name.s3-website-ap-northeast-1.amazonaws.com
        path logs/
        buffer_path /var/tmp/fluentd
        time_slice_format %Y%m%d/%H_apache.log
        time_slice_wait 30m
        # Puts to S3 at this interval: 1440 requests per day, which nearly led to a runaway cloud bill!
        flush_interval 60s
 </match>
 <source>
   type   tail
   path   /var/log/httpd/error_log
   format apache_error
   tag    apache.error
   pos_file /tmp/apache_error.pos
 </source>
 # Send the output to Fluentd's standard log
 <match apache.error>
   type stdout
 </match>
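
The request count mentioned for flush_interval 60s in the S3 output above is simple arithmetic. The sketch below assumes one PUT per flush from a single host, which is a simplification; the method name s3_puts_per_day is made up for illustration.

```ruby
SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical helper: number of flushes (and so PUT requests, under the
# one-PUT-per-flush assumption) in one day for a given interval in seconds.
def s3_puts_per_day(flush_interval_seconds)
  SECONDS_PER_DAY / flush_interval_seconds
end

per_minute = s3_puts_per_day(60)
per_hour = s3_puts_per_day(3600)
puts per_minute
puts per_hour
```

A 60 second interval gives 1440 requests per day per host, while an hourly flush gives 24.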


 <source>
   type tail
   path /var/log/httpd/access_log
   pos_file /var/tmp/access_log.pos
   tag httpd
   format none
 </source>
 # Send the output to Fluentd's standard log
 <match httpd>
   type stdout
 </match>

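*Plugins [#xe30e015]

**Plugin list [#tbbe9a57]

|Plugin name|Description|
|copy|For sending records to multiple destinations, such as forwarding and saving to a file|
|rewrite_tag_filter|Rewrites tags according to conditions|
|forest|Very handy for applying settings to similar tags all at once|
|fluent-plugin-map|Rewrites record contents|
|fluent-plugin-record-reformer|Also rewrites records|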


**Handling date-stamped file names [#jcefe66d]

 <source>
   type tail
   format none
   path /var/tmp/%Y%m%d%H.log
   tag tail_ex_test
   pos_file /tmp/td-agent/tail_ex_test.pos
   refresh_interval 10
 </source>

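See Ruby's date format directives (strftime) for the format.
With the example above, the watched file name would be 2015083122.log (for 22:00 on 2015-08-31).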
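
The path pattern can be checked with Ruby's Time#strftime, which uses the same directives. A minimal sketch, using 22:00 on 2015-08-31 as the sample time:

```ruby
# Sample time: 2015-08-31 22:00:00 (local time)
t = Time.new(2015, 8, 31, 22, 0, 0)

# Same pattern as the path setting in the tail source above
path = t.strftime("/var/tmp/%Y%m%d%H.log")
puts path
```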


*format [#e5634f6d]

**Main formats [#i61dd916]

|Format name|Example input|Notes|
|none|Input as-is||
|none_with_hostname||Adds host information to the input string|
|ltsv|domain:example.com|Labeled TSV|
|apache2|Apache combined|Fails if the log format is customized|
|apache_error|Apache error log|Fails if the log format is customized|
|csv,tsv|example.com,/hoge|Define the keys separately, e.g. keys domain,path|

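**Filtering regular expressions [#n3ad45ed]

Writing your own format requires knowledge of Ruby regular expressions.

***Apache (other than combined) [#d9dc7d93]

The regular expression for the date part is quite tedious: \[(?<time>[^\]]+)\] is the one to use. The time format must also be specified.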

 format /^(?<host>[^ ]+) [^ ]+ [^ ]+ \[(?<time>[^\]]+)\] (?<message>[^ ]+).*$/
 time_format %d/%b/%Y:%T %z 
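
The format and time_format above can be tried with plain Ruby before putting them into the configuration. The sketch below uses a made-up access log line and only the standard library; it does not go through Fluentd's parser.

```ruby
require "time"

# Same expression and time format as in the configuration above
FORMAT = /^(?<host>[^ ]+) [^ ]+ [^ ]+ \[(?<time>[^\]]+)\] (?<message>[^ ]+).*$/
TIME_FORMAT = "%d/%b/%Y:%T %z"

# Hypothetical log line, invented for illustration
line = '192.168.0.1 - - [31/Aug/2015:22:10:05 +0900] "GET /index.html HTTP/1.1" 200 1234'

m = FORMAT.match(line)
parsed_time = Time.strptime(m[:time], TIME_FORMAT)

puts m[:host]
puts m[:message]
puts parsed_time.utc.iso8601
```

Because message is captured with [^ ]+, it stops at the first space and holds only the opening quote and the HTTP method.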

***Reference site [#g12c4414]

http://diary.tachibanakikaku.com/2013/12/fluentdformat.html

***Testing a regular expression locally [#jba8ae91]

 #!/usr/bin/env ruby
 # -*- coding: utf-8 -*-
 require 'time'
 require 'fluent/log'
 require 'fluent/config'
 require 'fluent/engine'
 require 'fluent/parser'
 $log ||= Fluent::Log.new
 # debug
 log = ''
 format = //
 time_format = ''
 parser = Fluent::TextParser::RegexpParser.new(format, 'time_format' => time_format)
 puts parser.call(log)

 /usr/lib64/fluent/ruby/bin/ruby fluenttest.ruby
 # On Amazon Linux the ruby path is different
 /opt/td-agent/embedded/bin/ruby fluenttest.ruby

***Online testing site [#xb89bdf4]

Fluentular: a Fluentd regular expression editor
http://fluentular.herokuapp.com/

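*Running [#eedf55e8]

**Troubleshooting [#u0d6c6ea]

+Surely wildcards must be usable when specifying input files! → Fixed later.
+Reading fails with an error unless the td-agent group has permission on the file.
+combined logs did not match the pattern. The log format may be customized, so this needs investigation. → Confirmed: customized formats are not ingested!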

*Tips [#k3e8b6af]

-Ingesting the secure and messages logs

http://y-ken.hatenablog.com/entry/fluentd-syslog-permission


**Re-ingesting td-agent logs [#y432f503]

-It looks like they could be ingested as-is, but they have to be restructured into JSON first.

 cut -f1,3 fluent_test.log | awk -F'\t' '{print "{\"timestamp\":\"" $1 "\","  substr($2,2)}'
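
The same conversion can be sketched in Ruby. It assumes each line of the td-agent file output has the form time, tag, and a JSON record separated by tabs, which is what the cut and awk command above relies on. The sample line and the method name to_recover_json are made up for illustration.

```ruby
require "json"

# Converts one line (time<TAB>tag<TAB>json) into a single JSON object with
# the time stored under "timestamp". The tag column is dropped, as with cut -f1,3.
def to_recover_json(line)
  time, _tag, json = line.chomp.split("\t", 3)
  JSON.generate({ "timestamp" => time }.merge(JSON.parse(json)))
end

# Hypothetical line from a td-agent file output
line = "2015-09-01T16:28:23+09:00\ttest.log\t{\"message\":\"hello\"}"
recovered = to_recover_json(line)
puts recovered
```

Parsing and re-generating the JSON, rather than splicing strings, also keeps the output valid when the record is an empty object.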

-The input settings also need to be spelled out as below. Unless time_key and time_format are specified, the ingestion time is used as the record's time.

 <source>
   type tail
   tag recover
   path /var/tmp/recover.log
   format json
   time_key timestamp
   time_format "%Y-%m-%dT%H:%M:%S%z"
 </source>


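**Ingesting existing logs [#id232f2d]

Editing the pos file does not help! It hurts that the tail plugin is the only option.
In the end, overwriting the file solved it, but because everything is read at once, the following warnings appear.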

 2015-09-01 16:28:23 +0900 [warn]: Size of the emitted data exceeds buffer_chunk_limit.
 2015-09-01 16:28:23 +0900 [warn]: This may occur problems in the output plugins ``at this server.``
 2015-09-01 16:28:23 +0900 [warn]: To avoid problems, set a smaller number to the buffer_chunk_limit
 2015-09-01 16:28:23 +0900 [warn]: in the forward output ``at the log forwarding server.``

Setting buffer_chunk_limit on the output to 100M made the warnings go away.


A newer solution called Embulk is available, so that looks more promising going forward.


*filter [#y43ea9cb]

Use this in recent versions.

**Configuration [#f1e67174]

-Place it before match!

 <filter foo.bar>
  type grep
  regexp1 message cool
  regexp2 hostname ^web\d+\.example\.com$
  exclude1 message uncool
 </filter>


When there are multiple conditions:
regexp conditions are combined with AND, and exclude conditions with OR. In general, exclude is probably the one to rely on.
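
The AND/OR behaviour described above can be sketched as follows. The method keep? is a hypothetical stand-in for the grep filter, not its actual implementation, and the sample records are made up; the patterns are the ones from the configuration above.

```ruby
# Patterns from the <filter foo.bar> example, as [field, pattern] pairs
REGEXPS = [["message", /cool/], ["hostname", /^web\d+\.example\.com$/]]
EXCLUDES = [["message", /uncool/]]

# Hypothetical stand-in for the grep filter: true when the record is kept.
# Every regexp must match (AND); any matching exclude drops the record (OR).
def keep?(record, regexps, excludes)
  all_match = regexps.all? { |key, pattern| pattern =~ record[key].to_s }
  any_excluded = excludes.any? { |key, pattern| pattern =~ record[key].to_s }
  all_match && !any_excluded
end

kept = keep?({ "message" => "cool feature", "hostname" => "web1.example.com" }, REGEXPS, EXCLUDES)
dropped_by_exclude = keep?({ "message" => "uncool", "hostname" => "web1.example.com" }, REGEXPS, EXCLUDES)
dropped_by_regexp = keep?({ "message" => "cool", "hostname" => "db1.example.com" }, REGEXPS, EXCLUDES)

p kept
p dropped_by_exclude
p dropped_by_regexp
```

The second record matches both regexp lines (uncool contains cool) but is still dropped, because a single matching exclude is enough.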

*rewrite [#c4c489bb]

Not very useful, so consider Filter instead!

**Installation [#vc73ed27]

 /usr/lib64/fluent/ruby/bin/fluent-gem install fluent-plugin-rewrite

**Configuration [#e12ce5b8]

 <match test.log>
   type rewrite
   remove_prefix test
   add_prefix reformed
   <rule>
     key message
     pattern hoge
     replace fuga
   </rule>
 </match>

*Filter [#m3c9cff1]

-Available from v0.12. Fine on AWS, but not usable on CentOS hosts that are still on v0.10.

**Filter configuration example [#o7082556]

 <source>
   type dummy
   tag raw.dummy
   dummy {"message":"[WARN] warning[tab]message[tab]"}
 </source>
 <filter raw.**>
   type grep
   regexp1 message WARN
 </filter>
 <filter raw.**>
   type record_transformer
   enable_ruby true
   <record>
     tag ${tag}
     hostname "#{Socket.gethostname}"
     replaced ${message.gsub(/tab/,'\t')}
   </record>
 </filter>
 <match raw.**>
   type stdout
 </match>