Tuesday, April 14, 2020

Fixing the Splunk TA-pfsense stanza for sourcetype extraction

I'm not sure whether it's a setting I changed, the way my pfSense is set up, or some other PEBCAK (problem exists between chair and keyboard) issue. In the pursuit of extracting maximum value from pfSense logs I dare not tackle the field extraction myself, so instead I taste test the work of people far smarter than me.

There are a few good sites to help explain what is found in pfsense filter logs:
Here is Netgate's guide:

Here is an attempt by a regex ninja at extracting fields:


I work with Splunk and like to use it whenever possible to keep my practice up.  There are a number of Splunk apps that tackle pfsense logs:

- homemonitor
- A3sec
- TA-pfsense
- Technology add-on for pfSense filterlog (given its download count and last-updated date, I passed over testing this add-on).

Looking 'under the hood', homemonitor is a one-size-fits-all app. It's simply amazing: the installation GUI, the attempt to make it as comparable as possible across a broad range of brands of gear, the thought put into the various dashboards and panels. If you are not running pfSense but some other consumer router, I highly recommend running this app.

That said, it can only commit so much effort to extraction of the pfSense logs. So on to another taste test.

The TA and APP for pfSense by A3Sec is another full-featured app. It has lookup tables, tags, and all kinds of higher-level Splunking to make sense of the data and even add value to it. But when I looked at the raw log streams (be it in the pfSense webUI or by running clog -f on a log in the terminal), I was finding discrepancies. I believe that as pfSense updates change the log format slightly, they throw off the extractions. Or, again, PEBCAK: I may be failing to see that A3Sec (quite likely) built a clever way of cranking down the noise and showing the events that have more value or importance.

But while looking into the A3Sec app I found other posts online from people trying their hand at regex to extract every field possible from event logs that can end up very different in format, depending on whether the log comes from IPv4 or IPv6 traffic, OpenVPN traffic, webUI logs, etc.

To add to the hair pulling, adding further data sources such as Snort, or even Windows logs from one of my local machines, would cause Splunk to simply stop ingesting. What I found in this case is that a lot of these apps do not explicitly specify a method of time extraction, so Splunk falls back on its default attempt. I theorize that when an assortment of apps and data feeds comes in without this specified, it breaks my free-license Splunk instance's ability to extract fields. When investigating the internal logs you will see line-break errors due to size and some bogus dates being extracted; this was my sign.

So going over to this thread, someone provides a nice little regex to get pfsense time extracted:

With that set, you can have Snort logs, pfSense logs and other logs coming in. For Snort I used a Splunk stanza that simply sets the timestamp to the current time at which the data gets ingested. As for line breaking: many apps use timestamps to tell where one event ends and the next begins, so ensure the app you apply this time setting to has a stanza in its props.conf that performs the line breaking.

Splunk configure timestamp recognition:

So with different data inputs working at the same time, back to testing various pfSense extractions. On to the TA-pfsense add-on.

As an add-on, it's simply (though not simple in and of itself) the field extraction element, and you are required to make your own dashboards. I found porting over A3Sec's dashboards a great start.

Prepare yourself: the extractions are amazing. I was in awe of what was coming in and getting caught. Just wow. A big part of this effectiveness comes from a great use of transforms.conf: it looks at the incoming data and, with some regex, assigns that data one sourcetype or another so that the subsequent props.conf stanzas can apply effective regex for that log format.

See the example of this transform (sourcetyper), which uses regex to get past the timestamp, find pfsense, and output the word right after it as the sourcetype.
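The sourcetyping idea described above can be roughly sketched in Python. This is only an illustration of the logic: the pattern, the function name `sourcetype_for` and the sample events are all made up for this sketch and are not the TA's actual regex or real log lines.

```python
import re

# Hypothetical pattern, NOT the TA's actual regex: a syslog-style timestamp,
# then a host token, then the process name that becomes the sourcetype suffix.
SOURCETYPER = re.compile(
    r"^\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s+\S+\s+([A-Za-z][\w-]*)"
)

def sourcetype_for(raw, default="pfsense"):
    """Return 'pfsense:<process>' when the pattern matches, else the default."""
    match = SOURCETYPER.match(raw)
    return f"pfsense:{match.group(1)}" if match else default

# Made-up sample events for illustration only.
with_stamp = "Apr 14 10:15:32 pfsense openvpn[1234]: user connected"
without_stamp = "openvpn[1234]: user connected"

print(sourcetype_for(with_stamp))     # pfsense:openvpn
print(sourcetype_for(without_stamp))  # pfsense (pattern needs the timestamp)
```

Because the pattern is anchored on the timestamp, any event that arrives without one keeps the default sourcetype.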



And this is where I found an issue. In Splunk there are logs coming in without the timestamp in the raw event, which breaks TA-pfSense's ability to apply a sourcetype other than the default "pfsense". Something that should be sourcetype "pfsense:openvpn" and then accurately field extracted instead comes out simply as sourcetype "pfsense", and the openvpn props.conf field extractions are not applied.

When SSH'ing into the pfSense box and running clog -f on the openvpn logs (the equivalent of Linux's tail -f; pfSense is FreeBSD), there are dates and times. Same in pfSense's WebUI log viewer.

WHAT THE HECK?!

Example of raw and splunk, good sourcetype extraction



Example of raw without stamp, so sourcetyped only as "pfsense" but terminal has normal pfsense time format:


Issue with sourcetyping:

This was on a Splunk instance that also had the A3Sec app installed. Just to be sure there wasn't some issue I was not finding with app interaction, the little time extraction changes I had put in, etc., I built up a vanilla Splunk server just for the TA-pfSense add-on. There was some delay doing this as I was also moving over from ESXi free to XCP-ng in an attempt to get more hypervisor features. That's a whole nother blog post (and learning curve).

So now with a fresh vanilla CentOS 7 box with Splunk 7.x installed, off we go. Same issue with a vanilla setup (the time extraction from that earlier Splunk post is not added; this is using all defaults).

What's interesting is the lack of time in the raw logs in Splunk itself. One thing to try is to spin up yet another server with the A3Sec app and see if those same log types still come in raw like that. Another observation: the A3Sec app's openvpn dashboard used to work but seems to have stopped. I really suspect pfSense's syslog output may have changed. Heck, maybe a bug has been found? It's not streaming out the way it's displayed in the terminal.

If I can't get pfSense to output correctly, or if this is an "at ingestion" issue, a band-aid might be to migrate completely away from the current transforms.conf method and use some other regex that just finds the first reference to pfsense, takes the word after the next whitespace, and uses that for sourcetyping. I'm very weak on regex, but this is a reason to try.
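As a rough Python sketch of that band-aid idea: search for the first token beginning with "pfsense", skip the whitespace after it, and take the next word. The pattern and the name `fallback_sourcetype` are assumptions made for illustration, and the sample events are made up. It also only helps when the hostname in the raw event actually contains "pfsense".

```python
import re

# Hypothetical fallback pattern (an assumption, not taken from any add-on):
# first token starting with "pfsense", then whitespace, then the next word.
FALLBACK = re.compile(r"\bpfsense\S*\s+([A-Za-z][\w-]*)")

def fallback_sourcetype(raw, default="pfsense"):
    """Sourcetype from the word after the pfsense host token, if present."""
    match = FALLBACK.search(raw)
    return f"pfsense:{match.group(1)}" if match else default

# Made-up sample events: with a timestamp, without one, and without a host.
stamped = "Apr 14 10:15:32 pfsense.localdomain openvpn[1234]: user connected"
unstamped = "pfsense openvpn[1234]: user connected"
no_host = "openvpn[1234]: user connected"

print(fallback_sourcetype(stamped))    # pfsense:openvpn
print(fallback_sourcetype(unstamped))  # pfsense:openvpn
print(fallback_sourcetype(no_host))    # pfsense (nothing to anchor on)
```

Unlike a timestamp-anchored pattern, this one does not care whether the date is present, but it fails in the same way if the host token is stripped too.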

UPDATE 7/18/2020
With fresh eyes I looked at apps/TA-pfsense/default/props.conf

I saw right at the top:
 SEDCMD-event_cleaner = s/^(\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s)+\S+\.\S+\s+/\1/g
 SEDCMD-event_cleaner2 = s/^(\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s)+(\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s)+/\1/g
 SEDCMD-event_cleaner3 = s/^\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s\S+\s(\S+\s)/\1/g

So has this dropping of the timestamps from certain sourcetypes (and of some of the message itself) been intentional all along? Is it meant to reduce sourcetype noise? I'm not sure, as the props.conf has extractions for openvpn, but those were not working because the openvpn logs were not getting sourcetyped.
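To get a feel for what these three substitutions do, here is a minimal Python sketch that applies them to two made-up sample events. It assumes the rules run in the order listed, and that Python's re module behaves closely enough to Splunk's regex engine for these particular patterns; it is an approximation, not Splunk itself.

```python
import re

# The three SEDCMD substitutions quoted above, as (pattern, replacement) pairs.
CLEANERS = [
    (r"^(\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s)+\S+\.\S+\s+", r"\1"),
    (r"^(\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s)+"
     r"(\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s)+", r"\1"),
    (r"^\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s\S+\s(\S+\s)", r"\1"),
]

def clean(raw):
    """Apply the three substitutions in listed order (an assumed ordering)."""
    for pattern, replacement in CLEANERS:
        raw = re.sub(pattern, replacement, raw)
    return raw

# Made-up sample events: one with a dotted hostname, one with a short hostname.
fqdn_event = "Apr 14 10:15:32 pfsense.localdomain openvpn[1234]: user connected"
short_event = "Apr 14 10:15:32 pfsense openvpn[1234]: user connected"

print(clean(fqdn_event))   # user connected
print(clean(short_event))  # openvpn[1234]: user connected
```

With the dotted-hostname sample, the first rule drops the hostname and the third rule then eats the process name along with the timestamp. With the short-hostname sample the process name survives, but the timestamp is gone. If the cleaners run before the sourcetyping transform (not confirmed here, though it would be consistent with what I saw), either outcome leaves a regex that expects a leading timestamp with nothing to match.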

I commented them out and bam, the sourcetypes filling the Splunk index started to increase in diversity. I'm not sure if there is a negative long-term effect to this; I wish the dev were reachable to ask.
