Friday, April 24, 2020

FUSE file system to join several file systems together into one larger one

I had a unique situation: I wanted a larger JBOD, but on a Pi. You shouldn't use LVM to accomplish that, and RAID on a Pi 3 is not really an option.  Found mhddfs.

https://romanrm.net/mhddfs
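
For future me, the gist from that article looks something like this (mount points are placeholders; the fstab line is the part I actually care about):

```sh
# install mhddfs and pool two already-mounted drives into one virtual filesystem
# (assumption: the mhddfs package is still in the Raspbian repos)
sudo apt-get install mhddfs
sudo mkdir -p /mnt/pool
sudo mhddfs /mnt/disk1,/mnt/disk2 /mnt/pool -o allow_other

# or make it stick across reboots with an /etc/fstab line like:
# mhddfs#/mnt/disk1,/mnt/disk2 /mnt/pool fuse defaults,allow_other 0 0
```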

Monday, April 20, 2020

BASH commands explained

Have to save a place for a really cool website that explains the syntax of a bash command.  A great example:

https://explainshell.com/explain?cmd=rsync+-avz+--progress+--partial+-e

Just wow.

Sunday, April 19, 2020

Raspberry Pi 3 as an rsync backup server

So in my other post I started down the road of editing /conf/etc files in FreeNAS in the pursuit of getting SMB logs sent via syslog-ng.  This is a scary endeavor: a bricked FreeNAS and I lose... all of my data, including VM storage served via NFS.

I've had a Raspberry Pi 3 sitting on my table for what feels like a year now.  Its only good use case for me was as a hass.io home-assistant server, but I have since moved that to a VM.

Though a ZFS replication backup would be the ideal setup, I do not have the hardware (nor want to pay more on my power bill) to spin up another ZFS build just for this backup.

Enter the Raspberry Pi 3 and a big external HDD.

I searched around for a good tutorial to hold my hand along the way, since I want something to refresh my memory later.  The plan:

- New Pi headless install on an SD card
  - Used a GUI disk manager to get the old SD card back into one volume
- Not only set up SSH, but with a key (not password auth)
- Set up a FreeNAS account that can do this backup, but only this backup
- Not just mount, but fstab the external HDD into the Pi
- First time playing with rsync; set it up to grab all of my SMB shares and VM NFS

First things first, install Raspbian Lite (no need for a GUI), but in typical fashion, off to a broken start.  Etcher was showing an error: file mismatch (what was burned vs. the source image).  Googling later found this:
https://forums.balena.io/t/checksums-do-not-match/36537/48

Which said to go to this post:
https://superuser.com/questions/1199823/how-to-prevent-creation-of-system-volume-information-folder-in-windows-10-for/1199824#1199824

To go into local policy (or if you can't, a registry edit) and change "Do not allow locations on removable drives to be added to libraries".  This still didn't fix the error.  If you have a Linux laptop handy, it's a good use case to get this done.  But before blowing the dust off the *nix box I tried Raspberry Pi's very own Imager v1.2.  That worked.

Next, set up SSH.  The smart thing is to create the ssh file on the boot partition before booting; if you already booted up, you need a screen and keyboard to set up SSH.  The goal here is to do everything headless (hence using the MAC to figure out the IP via ARP on your router).

https://www.raspberrypi.org/documentation/remote-access/ssh/README.md#3-enable-ssh-on-a-headless-raspberry-pi-add-file-to-sd-card-on-another-machine
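
For reference, the headless trick from that page is just an empty file named "ssh" on the boot partition, done from another machine before first boot (the mount path below is a guess at a typical automount location):

```sh
# with the freshly-burned SD card's boot partition mounted on another Linux box
# (path will vary; /media/$USER/boot is a common automount location)
touch /media/$USER/boot/ssh
sync
```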

Taken from memory of a person's headless server build: you can take memory back from the GPU for the system, since this headless Lite build will not output to a monitor or have a GUI.

https://www.raspberrypi.org/forums/viewtopic.php?t=181262

According to that thread, 16 MB is fine.  This can be done in "sudo raspi-config".
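
If you'd rather skip the menus, the same split can be set by appending to /boot/config.txt (gpu_mem is the standard setting; 16 is the value from that thread):

```sh
# give the GPU the minimum memory split on a headless box, then reboot to apply
echo "gpu_mem=16" | sudo tee -a /boot/config.txt
sudo reboot
```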

Confirmed by https://www.pestmeester.nl/ : a very dated tutorial, but one of the first I ever used for my Pi, and amazingly written to set up a Pi LAMP stack with security in mind.  Sadly it's dated and a lot of the packages and even some of the commands are now gone/not functional.

Apparently Windows 10 just touching the SD card adds a file.  Not good for this situation.  Strangely it's not an issue when I create Linux bootable USBs, but I find anything non-x64-Linux to be a pain like this: be it FreeBSD or ARM, there is always some nuance.

My ARP table is lame and not showing a hostname, so this was a useful post to get the MAC ranges (OUIs) of Pis:
https://raspberrypi.stackexchange.com/questions/28365/what-are-the-possible-ouis-for-the-ethernet-mac-address
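
From any box on the LAN, the same hunt looks roughly like this (the subnet is a placeholder; the OUI list is from that post, with b8:27:eb covering the Pi 3 era):

```sh
# ping-sweep the subnet to populate the ARP table, then look for Raspberry Pi OUIs
nmap -sn 192.168.1.0/24 > /dev/null
arp -a | grep -i -E 'b8:27:eb|dc:a6:32'
```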

Now do your standard apt-get update and apt-get upgrade -y.

Also passwd to change the default password.  It's a good time to make a different user too, specifically for the backup job.
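
Roughly (the backup user's name is just a placeholder):

```sh
sudo apt-get update && sudo apt-get upgrade -y

# change the default pi password, then add a dedicated user for the backup job
passwd
sudo adduser backupuser
```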

----  Finally Pi is setup, time to attack the list to make it an rsync server -----

A very, very good place to pick up after getting Raspbian Lite installed is the Linux Automated Backup series by carpie.net on YouTube:

https://www.youtube.com/watch?v=iPK1TYOwzXI

Command notes:
- dmesg to get drive info (sda)

This gets blurry for me: how is the external drive sda and not the SD card?  And in his vid he labels the drive sda1, but if you use blkid, you will see sda1 is already the SD card and sda2 is the external drive.

- blkid to get more detailed info (/dev/sda2:  LABEL="Seagate Backup Plus Drive"  UUID="blah blah")

-  I'm going to stick with sda2.

- I was hard pressed: do I keep the NTFS format, or do I format to ext4?  If I keep NTFS then in theory I can connect it to Windows, but if the files within are compressed by rsync, is there any advantage?  Maybe better to use a Linux-native format like ext4, because reasons.  Then I found this guy's blog and for sure will ditch NTFS, but it has me doubting ext4 too!  I have lots of files that likely have the bad-character issue.

http://therandymon.com/index.php?/archives/285-Backing-Up-FreeNAS-to-an-external-hard-drive.html

His issues might be specific to the fact he plugged the HDD directly into the NAS though, vs running through a server like the Pi.

This now has me doubting whether to even use the Pi.  It's kind of nice to think of just putting this external HDD in the server rack plugged into my FreeNAS: set it up and forget it.  It would work a lot faster, not load my network, be out of sight and out of mind, and free up the Pi for any other project... but read down to the end of that post: my current version of FreeNAS probably doesn't support the UFS file system anyhow.  Maybe rsync is the most reliable setup given how frequently FreeNAS updates, and how huge the changes are when it does update.

TL;DR, just use ext4.  I can always spin up a *nix box to view the contents if the Pi dies or I need it to be faster.

Back to the youtube vid:

- sudo parted -a optimal -s /dev/sda2 mklabel gpt mkpart backup ext4 0% 100%

    - didn't work so used fdisk to format, device is now /dev/sda2pl
    - weird, now sda1 does not show up...

- sudo mkfs.ext4 -F /dev/sda2

- sudo parted /dev/sda2 print

- sudo mkdir /opt/backup

- sudo blkid (to get the UUID for fstab)

- sudo nano /etc/fstab     input the UUID and other stanza info (see the example entry after this list)

- sudo mount /opt/backup
- ls /opt/backup
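
For my own reference, the fstab entry ends up looking roughly like this (the UUID is a placeholder for whatever blkid reports; nofail is my addition so a missing drive doesn't hang the boot):

```sh
# append the mount entry (UUID is a placeholder - use the value from "sudo blkid"),
# then mount it and confirm
echo 'UUID=1234abcd-56ef-7890-abcd-1234567890ab /opt/backup ext4 defaults,noatime,nofail 0 2' | sudo tee -a /etc/fstab
sudo mount /opt/backup
ls /opt/backup
```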

This whole thing is hosed, which reminds me why I hate Raspbian so much.  Every single tutorial is different; every result is different from the tutorials.  Is this why it's a beginner's device?  It forces one to RTFM and re-do operations all day that would take only 5 minutes on a legit Linux build?

Linux laptop + gparted = 2 minutes to un-f#ck the raspi's sh*t show of trying to format over the old format and its inability to combine two partitions into one.  (I say inability because if you can't google how to do it, it's as good as not able to, especially on something so "beginner friendly".)

Note: because of the fstab change made earlier, the SD card install of Raspbian Lite is now hosed.  But I wanted to re-do it anyhow (I lost confidence after the loss of the sda1 partition on the SD card) and this time add the ssh file (in hindsight maybe it was simply due to the reboot, and on that reboot Raspbian expands the primary partition to take up the whole SD card).

--------- starting over ------------

Fresh Raspbian Lite on the SD card via the Raspberry Pi Imager
- added "ssh.txt" to the SD card after the burn, re-inserting it via USB
- booted up, SSH worked, yay.
- raspi-config for the GPU memory mod and timezone setup
- started following this tutorial, as ext4 coverage seems rare:
https://gordonlesti.com/mount-ext4-usb-flash-drive-to-raspberry-pi/
- sudo chown -R pi: /opt/backup

Honorable mention: great site on some of the options you can use with rsync:  https://www.tecmint.com/rsync-local-remote-file-synchronization-commands/
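
The general shape I keep coming back to (host and paths are placeholders):

```sh
# archive mode, compression, progress, keep partial transfers, all over SSH
rsync -avz --progress --partial -e ssh freenas.local:/mnt/tank/share/ /opt/backup/share/
```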

A part of me wants to make a different user on the Pi, but this Pi will stay behind my NAT and only use an SSH key to access FreeNAS.



When you go to google this, most results are about making the FreeNAS the rsync server, not the client.  Or lots of results for rsync from one FreeNAS to another.  It's a little harder to find info on an external non-FreeNAS system being the rsync server, with FreeNAS as the client.  The FreeNAS documentation is bad for this goal; it just addresses FreeNAS as the server, BUT a good read to understand the high-level process of rsync is in 8.3.1.

So off to watch a bunch of vids, tutorials and RTFM'ing, as it's confusing how to set up the FreeNAS to be just a client and the Pi the server.  If I understand correctly, the Pi's private SSH key will be on the FreeNAS, the FreeNAS will have the cron job to run rsync, and the Pi is simply listening.

It looks as simple as going into FreeNAS > Tasks > Rsync Tasks > Add > select PUSH.

FreeNAS will only allow using a user with a private key set up in the home dir.  This will be an account on the Pi that has rights to the backup path specified above and has had a private/public key pair generated; export the private key to FreeNAS.  The part 3 video shows how to make that key only work for rsync tasks as added security.
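
I haven't re-watched part 3 to copy its exact line, but the general shape of a locked-down key entry on the Pi looks something like this (the forced command, options, and paths here are my assumptions, not the video's):

```sh
# Sketch of a restricted entry in the Pi backup user's ~/.ssh/authorized_keys.
# "rrsync" is a restricted-rsync wrapper that ships with the rsync package (on
# Raspbian it may need to be copied out of /usr/share/doc/rsync/scripts/ and
# gunzip'd); the key below is a placeholder for FreeNAS's public key.
command="/usr/local/bin/rrsync /opt/backup",no-pty,no-agent-forwarding,no-port-forwarding,no-X11-forwarding ssh-rsa AAAAB3Nza...rest-of-key... backup@freenas
```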

Back to the YouTube tutorial part 3: rsync seems to already be installed on the current version of Raspbian Lite.

The tutorial has the YouTuber's Linux laptop (client/PUSH) set up an ssh keygen and move the private key to the Pi...  The FreeNAS guide in 7.3.2 is more complicated, with the private/public pair generated on the PUSH side (FreeNAS), and both the Pi and FreeNAS needing to exchange public keys for MITM mitigation.

Had to follow a guide to enable root login via SSH on the Pi.
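
For my notes, the gist of that guide boiled down to something like this (a hedged sketch; on Raspbian the sshd config path and "ssh" service name should be right, but double-check):

```sh
# allow root logins over SSH on the Pi, then restart the SSH service
sudo sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin yes/' /etc/ssh/sshd_config
sudo systemctl restart ssh
# root also needs a password (or an authorized key) before it can actually log in
sudo passwd root
```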

Then I needed to perform this setup from FreeNAS, except there is no need to create the ~/.ssh folder, since it already exists on the Pi after generating keys (so the mkdir was not necessary).

cat ~/.ssh/id_rsa.pub | ssh <USERNAME>@<IP-ADDRESS> 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'



Set up a script on the Pi and a cron job to run rsync once a week (first do a manual rsync, as the first run might take longer than a week; I do not want to start a new sync while the original is still running).
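
Something like this in the crontab of whichever box kicks off the job (the script path, schedule, and log file are placeholders; flock is there so a still-running sync blocks the next one from starting):

```sh
# weekly rsync, Sunday at 02:00; flock -n skips this run if the previous one is still going
0 2 * * 0 flock -n /tmp/rsync-backup.lock /home/pi/backup.sh >> /home/pi/backup.log 2>&1
```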

- Edit: with backups running, the 5TB drive will soon reach its limit.  Enter mhddfs, a really neat way of avoiding making an LVM partition spanning two drives while still having more of a JBOD-like setup.

* Edit: removed "--delay-updates" from the rsync syntax, as it seems to lock my files (at least on a very large and long backup) into the ~tmp~ folder, in purgatory.  Hoping there is a script out there that can move the already-backed-up files out of this ~tmp~ folder and into their respective folders recursively (a rough sketch of the idea follows).
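
I haven't found that script, but a rough, untested sketch of the idea (assuming the stranded files live in per-directory ".~tmp~" folders, which is what --delay-updates uses by default; verify the actual folder name before trusting this):

```sh
#!/bin/sh
# Untested sketch: walk the backup tree, find rsync's leftover ".~tmp~" folders,
# move their contents up into the real destination folder, then remove the empties.
find /opt/backup -type d -name '.~tmp~' | while read -r tmpdir; do
    parent=$(dirname "$tmpdir")
    mv "$tmpdir"/* "$parent"/ && rmdir "$tmpdir"
done
```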

* Need to create a cron job to unmount the HDDs between rsyncs and spin down the drives; otherwise they are always spinning and humming away.
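
A rough sketch of that idea for root's crontab (whether hdparm standby actually works depends on the USB-SATA bridge in the enclosure, so this is an assumption to test):

```sh
# a few hours after the weekly backup window: unmount the backup drive and ask it to spin down
0 6 * * 0 umount /opt/backup && hdparm -y /dev/sda
```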

Wednesday, April 15, 2020

Idiot's attempt at learning how to use Splunk's SimData

Splunk training has awesome 'fake' data to make it look like you are drilling down into a real network's pile of logs.

In the past I heard it was a program called EventGen that the Splunk team used, but an in-house version.

Now there has been a shift, and the Splunk dev website lists SimData.

https://dev.splunk.com/enterprise/docs/dataapps/simdata

One needs to install the Java Runtime Environment (JRE).

And the SimData Jar file:  They link to this:
https://dev.splunk.com/enterprise/downloads

Scroll down to the bottom to find the SimData Jar download.

The command to run SimData is java -jar simdata-<version>.jar -s <simulation_file> -c <scene_file>

https://dev.splunk.com/enterprise/docs/dataapps/simdata/runsimdatasimulation

So... we need a simulation file and a scene file.  Also, we need to set up HEC (the HTTP Event Collector) to get the data ingested into Splunk.

The simulation file (.simulation file extension) uses a Domain-Specific Language (yeah, right?  Like intro CS, brah):

https://dev.splunk.com/enterprise/docs/dataapps/simdata/simdatareference/simdatasimfileref/

And the scene file (JSON format):

https://dev.splunk.com/enterprise/docs/dataapps/simdata/simdatareference/simdatascenefileref/

I...

Have ...

No...

Idea ...  Where to start.  But thankfully, there is an example file:

Example simulation and scene files here:
https://dev.splunk.com/enterprise/examples


DON'T FORGET to set up HEC:
https://docs.splunk.com/Documentation/Splunk/8.0.3/Data/UsetheHTTPEventCollector
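
Once HEC is enabled and a token is created, a quick sanity check from the command line looks something like this (host and token are placeholders; 8088 is the default HEC port):

```sh
# send a test event to the HTTP Event Collector (replace host and token with yours)
curl -k https://splunk.example.local:8088/services/collector/event \
  -H "Authorization: Splunk 00000000-0000-0000-0000-000000000000" \
  -d '{"event": "hello from simdata testing", "sourcetype": "simdata:test"}'
```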

From the Splunk SimData Examples README.md file:

# Splunk SimData Examples

This project is a collection of SimData example scenes and simulation files.
Each example has its own corresponding README file.

SimData is a tool that generates event data from a simulation of a user-defined scenario. Instead of using a sample set of data that is repetitive and unrealistic, SimData allows you to generate a rich and robust set of events from real-world situations by mimicking how multiple systems work together and affect the performance of your system.

## Get started

For details about installing, configuring, and running SimData, see the [Splunk Developer Portal](https://dev.splunk.com/enterprise/docs/dataapps/simdata/).

### Requirements

* Java 8+
* Download the SimData JAR file: https://dev.splunk.com/enterprise/downloads

### Example usage

This example shows how to execute the SimData CLI:

```sh
java -jar <SimData JAR file> --simulation <path to simulation file> --scene <path to scene file>
```

## Contact
If you have questions, reach out to us on [Slack](https://splunkdevplatform.slack.com) in the **#simdata** channel or email us at _devinfo@splunk.com_.

At first you read, "this project is a collection of SimData example scenes and simulation files" and you think, "awesome!  Maybe there are Windows event log examples, RHEL, Cisco!"

Nope.  There is one example set thus far.  "hello".

Hello's README.md

# Hello SimData example

To run this example, run:

```sh
java -jar <SimData JAR file> --simulation hello.simulation --scene hello.json
```

### Expected output

```sh
Starting simulation
"eventType"="Greeting" "text"="Hello, World!"
"eventType"="Greeting" "text"="Hello, World!"
"eventType"="Greeting" "text"="Hello, World!"
"eventType"="Greeting" "text"="Hello, World!"
"eventType"="Greeting" "text"="Hello, World!"

That's no fun.  But it looks like on this page we can start to edit the files to make a webserver example:

https://dev.splunk.com/enterprise/docs/dataapps/simdata/examplesimulation

## quick note, don't follow the README's example of running simdata, follow this:
java -jar simdata-<version>.jar -s <simulation_file> -c <scene_file>
## another quick note, the output for options:

Usage: simdata [options]
  Options:
    --enable-debug, --debug
      Enables debug logging.
      Default: false
    -h, --help
      Show help information
    --no-rest, --no-web
      Disable the rest endpoints and web server.
      Default: false
    -p, --port
      The port to use for the REST endpoints and web server
      Default: 11013
  * -c, --scene
      The scene file
  * -s, --simulation
      The simulation file
    --start-time
      The absolute (ex: '2017-12-25 00:00:00') or relative (ex: '-1d', '-2h',
      '-3m') start time to backfill data from. Overrides the value set in the
      scene file.
    --validate
      Only validate the simulation and scene files, do not run the simulation.
      Default: false

## End note

The example link gets right to business: how to point this at our Splunk instance, input some stuff, and get going.



``` break till next time ```




This is of some interest for a noob like me:

Use the simulation control UI

SimData provides a web-based user interface for you to update the value of variables at runtime at http://localhost:11013 or the port you specify using the SimData CLI. You can disable this web server by passing the --no-web flag. For more, see the SimData CLI reference.
This UI exposes controls for bots of entity types with runtime variable controls. Each entity type has a set of controls to modify the state of all bots of that type. Additionally, each bot has its own set of controls to modify the state of only that bot. The simulation control UI refreshes the state of bots every second.

Tuesday, April 14, 2020

Fixing the Splunk TA-pfsense stanza for sourcetype extraction

I'm not sure if it's a setting due to me, be it the way my pfSense is set up or some other PEBCAK (problem exists between chair and keyboard) issue, but in the pursuit of maximum value extracted from pfSense logs I dare not tackle it myself; instead I taste test from people far smarter than me.

There are a few good sites that help explain what is found in pfSense filter logs:
Here is Netgate's guide:

Here is an attempt by a regex ninja at extracting fields:


I work with Splunk and like to use it whenever possible to keep my practice up.  There are a number of Splunk apps that tackle pfsense logs:

- homemonitor
- A3sec
- TA-pfsense
- Technology add-on for pfSense filterlog (looking at the download count and last date updated I passed over testing this add-on).

Looking 'under the hood', homemonitor is a one-size-fits-all app.  It's simply amazing: the installation GUI, the attempt to make it as compatible as possible across a broad range of brands of gear, the thought put into the various dashboards and panels.  If you are not running pfSense but some other consumer router, I highly, highly recommend running this app.

That said, it can only commit so much effort to extraction of the pfSense logs.  So on to another taste test.

The TA and App for pfSense by A3Sec is another full-featured app.  It has lookup tables, tags, all kinds of higher-level Splunking to make sense of and even add value to the data.  But when I would look at the raw log streams (be it the pfSense webUI or clog -f on a log in the terminal) I was finding discrepancies.  I believe what has happened is that as pfSense updates or changes the log format slightly, it throws off the extractions.  Or again PEBCAK, and I'm failing to see (but it's likely) that A3Sec made a clever way of cranking down the noise and showing events that have more value/importance.

But while looking into the A3Sec app I found other posts online of people trying their hand at regex, trying to extract every field possible from event logs that can end up being very different in format depending on whether the log comes from IPv4 or IPv6 traffic, OpenVPN traffic, webUI logs, etc.

Add to the hair pulling: adding further data sources such as Snort, or even Windows logs from one of my local machines, would cause Splunk to simply stop ingesting.  What I found in this case is that a lot of these apps do not explicitly call out a method of time extraction, so Splunk attempts it with its defaults.  I theorize that when you have an assortment of apps and data feeds coming in where this is not specified, it breaks my free-license Splunk instance's ability to extract fields.  When investigating the internal logs you will see line-break errors due to size, and some bogus dates being extracted; this was my sign.

So going over to this thread, someone provides a nice little regex to get the pfSense time extracted:

With that set, you can have Snort logs, pfSense logs and other logs coming in.  For Snort I used a Splunk stanza that simply stamps events with the current time at ingestion.  For line breaking (as many apps use timestamps as the way of knowing where one event ends and the next begins), ensure the app you use this time setting on has a stanza in props.conf that performs the line breaking.

Splunk configure timestamp recognition:

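I lost the exact regex from that thread, but the stanza I ended up with was along these lines (my own approximation in local/props.conf, not the thread's; paths and the sourcetype names are assumptions, and the Snort stanza shows the "just use ingest time" trick I mentioned):

```sh
# Approximation only - not the regex from that thread. Drop into a local/props.conf
# on the box doing the parsing.
mkdir -p /opt/splunk/etc/apps/TA-pfsense/local
cat >> /opt/splunk/etc/apps/TA-pfsense/local/props.conf <<'EOF'
[pfsense]
# syslog-style timestamps like "Apr 14 12:34:56"
TIME_FORMAT = %b %d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 16
# break events on newlines instead of guessing from timestamps
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)

[snort]
# for Snort, just stamp events with the time they were ingested
DATETIME_CONFIG = CURRENT
EOF
```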

So with different data inputs working at the same time, back to testing the various pfSense extractions.  On to the TA-pfsense add-on.

As an add-on, it's simply (though not simple in and of itself) the field-extraction element, and you are required to make your own dashboards.  I found porting over A3Sec's dashboards a great start.

Prepare yourself: the extractions are amazing.  I was in awe of what was coming in and getting caught.  Just wow.  A big part of this effectiveness is a great example of using a transforms.conf to look at the incoming data and, with some regex, output that data as one sourcetype or another, so that the subsequent props.conf stanzas can apply regex appropriate to that log format.

See the example of this transform using regex to get past the timestamp, find "pfsense", and output the word right after that to give it a sourcetype (a "sourcetyper").



And this is where I found an issue.  In Splunk there are logs coming in without the timestamp in the raw data, so it breaks TA-pfSense's ability to apply a sourcetype other than the default "pfsense".  Something that should be sourcetype "pfsense:openvpn" and then accurately field-extracted is instead coming out as simply sourcetype "pfsense", and the openvpn props.conf field extractions are not applied.

When SSHing into the pfSense box and running clog -f (pfSense is FreeBSD, so clog -f is its tail -f) on the openvpn logs, there are dates and times.  Same in pfSense's webUI log viewer.

WHAT THE HECK?!

Example of raw and splunk, good sourcetype extraction



Example of raw without stamp, so sourcetyped only as "pfsense" but terminal has normal pfsense time format:


Issue with sourcetyping:

This was on a Splunk instance that also had the A3Sec app installed, so just to be sure there wasn't some issue I was not finding with app interaction, little time-extraction changes I had put in, etc., I built a vanilla Splunk server just for the TA-pfSense add-on.  There was some delay doing this as I was also moving over from ESXi free to XCP-ng in an attempt to get more hypervisor features.  That's a whole 'nother blog post (and learning curve).

So now, with a fresh vanilla CentOS 7 box with Splunk 7.x installed, off we go.  Same issue with a vanilla setup (the time extraction from that earlier Splunk post not added; this is using all defaults).

What's interesting is the lack of a time in the raw logs in Splunk itself.  One thing to do is to spin up yet another server with the A3Sec app and see if those same log types still come in raw like that.  Another observation: the A3Sec app's openvpn dashboard used to work, but seemed to stop.  I really suspect pfSense's syslog output may have changed.  Heck, maybe a bug has been found?  It's not streaming out as it's displayed in the terminal.

If I can't get pfSense to output correctly, or if this is an at-ingestion issue, a band-aid might be to completely migrate away from the current transforms.conf method and use some other regex that just finds the first reference to "pfsense", takes the word after the next whitespace, and uses that for sourcetyping.  I'm very weak on regex, but this is a reason to try.
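
To remind future me what that would even look like, here's an untested sketch using Splunk's standard sourcetype-override plumbing (the regex and stanza names are my guesses, not anything from TA-pfsense):

```sh
# Untested sketch of the "lazier sourcetyper" idea - standard sourcetype-override
# plumbing, with a regex that just grabs the word after the first "pfsense" token.
mkdir -p /opt/splunk/etc/apps/TA-pfsense/local
cat >> /opt/splunk/etc/apps/TA-pfsense/local/transforms.conf <<'EOF'
[pfsense_lazy_sourcetyper]
REGEX = pfsense\S*\s+([A-Za-z0-9_-]+)
FORMAT = sourcetype::pfsense:$1
DEST_KEY = MetaData:Sourcetype
EOF

cat >> /opt/splunk/etc/apps/TA-pfsense/local/props.conf <<'EOF'
[pfsense]
TRANSFORMS-lazy_sourcetyper = pfsense_lazy_sourcetyper
EOF
```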

UPDATE 7/18/2020
With fresh eyes I looked at apps/TA-pfsense/default/props.conf

I saw right at the top:
 SEDCMD-event_cleaner = s/^(\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s)+\S+\.\S+\s+/\1/g
 SEDCMD-event_cleaner2 = s/^(\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s)+(\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s)+/\1/g
 SEDCMD-event_cleaner3 = s/^\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s\S+\s(\S+\s)/\1/g

So has this dropping of the timestamps of certain sourcetypes (and some of the message itself) been a deliberate function?  Intentional, to reduce sourcetype noise?  Not sure, as the props has extractions for openvpn, but those were not working because the openvpn logs were not getting sourcetyped.

I commented them out and bam, the variety of sourcetypes filling the Splunk index started to increase.  Not sure if there is a negative long-term effect from this; I wish the dev were reachable to ask.

Thursday, April 9, 2020

Let's Encrypt and NGINX for hass.io with Docker

The goal was to use linuxserver.io's letsencrypt container that has both NGINX and Let's Encrypt combined together, and use it to provide a reverse proxy service to a homeassistant container on a CentOS machine.

TL;DR: I didn't get this to work, so I used Lawrence Systems' YouTube vid on pfSense HAProxy and acme certs to accomplish the same goal, all within a GUI.
https://www.youtube.com/watch?v=gVOEdt-BHDY&t=1320s

It's more steps and I didn't get it right on the first go, but it's easy to troubleshoot via the GUI and by re-reviewing Lawrence's video.

Onto the failed attempt(s) using LinuxServer.io:

Homeassistant has an NGINX plugin and a Letsencrypt plugin, but the documentation is very lacking (really, homeassistant documentation that is outdated and/or not written for noobs?  No way!)

So after reading a homeassistant user's tutorial thread:
https://community.home-assistant.io/t/nginx-reverse-proxy-set-up-guide-docker/54802

And then later, while troubleshooting per the linuxserver.io team's Discord and the linuxserver.io instructions:
https://blog.linuxserver.io/2019/04/25/letsencrypt-nginx-starter-guide/#authorizationmethod

- looks easy as cake right?

Firstly, I transferred my domain from Bluehost to Cloudflare.  This was overdue, mainly because Cloudflare's API is supported by a great many projects, this and acme (pfSense) being some of them.

I created a CNAME record for the subdomain I wanted to use as well.  I'm fairly decent with firewall rules, port forwarding and NAT settings on pfSense too, thanks to Lawrence Systems on YouTube and having already done a number of self-hosting projects, such as a WordPress server built on a LAMP stack facing the internet.

What followed was what feels like countless docker-compose builds and nukes, playing with config files, going 100% per the linuxserver.io instructions, going 100% per the homeassistant forum instructions, and mixtures of the two.
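
For posterity, one of those attempts looked roughly like this, from memory (not a known-good config; the env var names are what I remember from the linuxserver.io docs at the time, and domain, email, IDs and paths are placeholders):

```sh
# Rough reconstruction of one attempt - double-check env var names against
# linuxserver.io's current docs; everything else here is a placeholder.
docker run -d --name=letsencrypt \
  --cap-add=NET_ADMIN \
  -e PUID=1000 -e PGID=1000 -e TZ=America/New_York \
  -e URL=example.com \
  -e SUBDOMAINS=hass \
  -e VALIDATION=dns \
  -e DNSPLUGIN=cloudflare \
  -e EMAIL=me@example.com \
  -p 80:80 -p 443:443 \
  -v /opt/letsencrypt/config:/config \
  --restart unless-stopped \
  linuxserver/letsencrypt
```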

The linuxserver.io team was sure their default setup using the renamed homeassistant proxy conf would work (contrary to the homeassistant forum... and most other product-specific forums running their container for things like Plex or Nextcloud on things like Unraid, etc.).

But this was a great exercise in learning Docker, and a reminder that there are still plenty of Linux communities where, if you are not at a dev's / grey beard's level, you are not welcome.  But that is human nature and not going away anytime soon.

Big thanks to Lawrence Systems' channel and Discord for not being one of those places, but instead fostering learning and community acceptance/training (go figure, they are associated with the "Learn Linux" channel, which is also amazing).  Together with places like Level1Techs, we noobs aspiring to RTFM but needing help along the way have great people and resources.

Useful docker links:

Docker Compose install, basic use:

Correcting my machine's timezone
https://linuxize.com/post/how-to-set-or-change-timezone-on-ubuntu-18-04/

Level1techs:
https://level1techs.com/
click on their forums, and check out their merch at their store