Learn everything about every internet connected device with Python


Warning: do not run these scripts unless you understand the repercussions

There are 2 sections here, defensive and offensive.

Why did I do this?

Taking my own network security seriously, I started monitoring all incoming and unknown outgoing packets using a passive tap (a SharkTap), a network packet capture device that intercepts and duplicates traffic. It sits behind my firewall and in front of the VLAN where my devices and WiFi are placed.


To know the anomalous you first need to understand the normal, so I turned off all my devices and turned them back on one by one while collecting MAC addresses, setting static IP addresses, and noting which ports and protocols each device communicates over.
With this spreadsheet of information I created a network topology diagram and several UML diagrams for my connected devices and applications in operation.
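
A baseline like this can also be checked in code. Here is a minimal sketch of the idea, assuming the spreadsheet has been reduced to a lookup table; the KNOWN_DEVICES table, its MAC addresses, IPs and ports, and the find_anomalies helper are all hypothetical and only there for illustration.

```python
# Hypothetical inventory: MAC address -> (static IP, set of expected ports).
KNOWN_DEVICES = {
    "28:aa:bb:cc:dd:bf": ("192.168.1.10", {443, 5223}),
    "f4:aa:bb:cc:dd:54": ("192.168.1.20", {53, 80, 443}),
}


def find_anomalies(observations, known=KNOWN_DEVICES):
    """Return (mac, ip, port, reason) tuples for traffic that deviates from the baseline."""
    anomalies = []
    for mac, ip, port in observations:
        mac = mac.lower()
        if mac not in known:
            anomalies.append((mac, ip, port, "unknown device"))
            continue
        expected_ip, expected_ports = known[mac]
        if ip != expected_ip:
            anomalies.append((mac, ip, port, "unexpected ip"))
        elif port not in expected_ports:
            anomalies.append((mac, ip, port, "unexpected port"))
    return anomalies
```

Anything returned by find_anomalies is only a candidate for a closer look; a new phone joining the WiFi would show up here just as readily as something hostile.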

Collection of data

We're going to first need some data for our exercise.


For speed, and because it is a language I know well, we'll use Bash with tshark (the command-line utility of Wireshark):

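#!/usr/bin/env bash
# usage: ./capture.sh <duration in seconds> <interface>
DATE=`date "+%Y%m%d-%H%M"`
OUTPUT_DIR=<change me>
# The definitions below are missing from the original listing. The names come
# from how they are used further down; the values are assumptions.
# Directories need a trailing slash because paths are joined without one.
WORKDIR=<change me>
LOG_DIR=<change me>
FILENAME=${DATE}
LOGNAME=${DATE}.log
EXT=.csv
C_EXT=.tar.gz
DURATION=$1
INTERFACE=$2

if [[ -z ${DURATION} ]]; then
    DURATION=60    # assumed default
fi
if [[ -z ${INTERFACE} ]]; then
    INTERFACE=all  # assumed default
fi
# On Linux, tshark's pseudo-interface for every interface is "any" (assumed mapping)
if [[ ${INTERFACE} == "all" ]]; then
    INTERFACE=any
fi

echo -e "Capturing interface [${INTERFACE}] for [${DURATION}] seconds"
echo -e "Output: ${OUTPUT_DIR}${FILENAME}${EXT}"
echo -e "Log: ${LOG_DIR}${LOGNAME}"
mkdir -p ${OUTPUT_DIR} ${WORKDIR} ${LOG_DIR}
tshark -a duration:${DURATION} \
    -i ${INTERFACE} \
    -u s \
    -E separator=, \
    -E quote=d \
    -E occurrence=f \
    -T fields \
    -e frame.time_epoch \
    -e ip.src \
    -e ip.dst \
    -e http.host \
    -e http.request.uri \
    -e dns.qry.name \
    -e tcp.srcport \
    -e tcp.dstport \
    -e udp.srcport \
    -e udp.dstport \
    -e _ws.col.Protocol \
    -e _ws.col.Info \
    -e arp.src.hw_mac \
    -e arp.dst.hw_mac \
    -e http.user_agent \
    -e eth.src \
    -e eth.dst \
    -e eth.src_resolved \
    -e eth.dst_resolved \
    > ${WORKDIR}${FILENAME}${EXT} 2>> ${LOG_DIR}${LOGNAME}

# Compress the capture, remove the raw CSV, then move the archive into place.
# The final move is an assumption; the original command is cut off at this point.
env GZIP=-9 tar cvzf ${WORKDIR}${FILENAME}${C_EXT} ${WORKDIR}${FILENAME}${EXT} && \
    rm ${WORKDIR}${FILENAME}${EXT} && \
    mv ${WORKDIR}${FILENAME}${C_EXT} ${OUTPUT_DIR}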

Complete the self-explanatory variables with relevant values and call the script with ./capture.sh 60 all to capture on all interfaces for 1 minute.

The resulting file will look like this

"1503238803.452442307","","",,,,"59435","20008",,,"TCP","59435 → 20008 [ACK] Seq=1 Ack=1 Win=4094 Len=0 TSval=1005594454 TSecr=2175465899",,,,"28:xx:xx:xx:xx:bf","f4:xx:xx:xx:xx:54","Apple_17:02:bf","Elitegro_6a:7f:54"
"1503238803.455425666","","",,,,"17548","51413",,,"TCP","17548 → 51413 [ACK] Seq=1 Ack=1 Win=16685 Len=0 SLE=1421 SRE=5681",,,,"e0:xx:xx:xx:xx:14","90:xx:xx:xx:xx:b9","Technico_d3:6a:14","AsustekC_c8:f5:b9"

The first row is internal traffic, and the second is incoming from the Internet.
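
To work with rows like these outside of tshark, each one can be turned into a dictionary keyed by field name. The sketch below assumes the column order matches the -e options in the capture script; the FIELDS list, parse_rows and direction are illustrative names rather than part of any tool, and the internal/external label relies only on whether the source address is in a private range.

```python
import csv
import io
import ipaddress

# Column names, in the order of the -e options passed to tshark.
FIELDS = [
    "frame.time_epoch", "ip.src", "ip.dst", "http.host", "http.request.uri",
    "dns.qry.name", "tcp.srcport", "tcp.dstport", "udp.srcport", "udp.dstport",
    "_ws.col.Protocol", "_ws.col.Info", "arp.src.hw_mac", "arp.dst.hw_mac",
    "http.user_agent", "eth.src", "eth.dst", "eth.src_resolved",
    "eth.dst_resolved",
]


def parse_rows(text):
    """Yield one dict per capture row, keyed by tshark field name."""
    for row in csv.reader(io.StringIO(text)):
        # Skip malformed rows that do not have one value per field.
        if len(row) == len(FIELDS):
            yield dict(zip(FIELDS, row))


def direction(row):
    """Label a row by whether its source IP is in a private range."""
    if not row["ip.src"]:
        return "no-ip"
    src = ipaddress.ip_address(row["ip.src"])
    return "internal" if src.is_private else "external"
```

Rows without an IP layer, such as ARP, come back as "no-ip" rather than raising an error.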

Backup to S3

This is completely optional, but I like to back up my files to S3. Here is my hourly script to do that:

#!/usr/bin/env bash
AWS=`which aws`
NOW=`date "+%Y%m%d-%H"`
BUCKET=<change me>
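AWS_PROFILE=<change me>
# The variables below are used further down but are not defined in the original listing.
BACKUPDIR=<change me>   # directory holding the capture archives, with a trailing slash
TMPDIR=<change me>      # scratch directory; it is deleted at the end of the script
LOG_DIR=<change me>
LOGNAME=<change me>
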
echo -e "Creating ${TMPDIR}"
mkdir -p ${TMPDIR} && \
    echo -e "ok"

echo -e "Copy from NAS ${BACKUPDIR}"
cp ${BACKUPDIR}`date -d "1 hour ago" "+%Y%m%d-%H"`*.tar.gz ${TMPDIR} && \
    echo -e "ok"
echo -e "AWS S3 upload starting"
$AWS s3 cp \
    ${TMPDIR} \
    s3://${BUCKET} \
    --profile=${AWS_PROFILE} \
    --exclude "*" \
    --include "*.tar.gz" \
    --recursive \
        >> ${LOG_DIR}/${LOGNAME} && \
    echo -e "ok"
echo -e "Cleanup ${TMPDIR}"
rm -rf ${TMPDIR} && \
    echo -e "done"

Again you just need to change the variables.

If your local source has the capacity to keep a copy of every file that is backed up, you can also use the aws CLI to sync a full directory to an S3 bucket folder: just replace cp with sync.

Analyse with Python Pandas

I mostly use Pandas for my analysis. It's not the fastest or most efficient tool for large datasets, but it serves me well until I hit those limitations, at which point I pivot to the right tool for what I am trying to do. To get you started, here is my script to download the CSV files from S3 and compile them into a Pandas DataFrame.

import sys
import threading
import pandas as pd
import boto3
import ntpath

BUCKET = '< change me >'
PROFILE = '< change me >'

class ProgressPercentage(object):
    def __init__(self, filename):
        self._filename = filename
        self._seen_so_far = 0
        self._lock = threading.Lock()
    def __call__(self, bytes_amount):
        # To simplify we'll assume this is hooked up
        # to a single filename.
        with self._lock:
            self._seen_so_far += bytes_amount
            sys.stdout.write(
                "\r%s --> %s bytes transferred" % (
                    self._filename, self._seen_so_far))
            sys.stdout.flush()

def download(s3_object):
    # Download one S3 object into the current directory and return the local filename
    key = s3_object.get('Key')
    filename = ntpath.basename(key)
    client.download_file(BUCKET, key, filename, Callback=ProgressPercentage(filename))
    return filename

dev = boto3.session.Session(profile_name=PROFILE)
client = dev.client('s3')

chunksize = 10 ** 8
objects = client.list_objects(Bucket=BUCKET)
chunks = []
for s3_object in objects.get('Contents', []):
    filename = download(s3_object)
    # The capture script produces .tar.gz archives, which pandas 1.5+ reads with compression='tar'.
    # header=None because tshark wrote no header row; on_bad_lines='skip' replaces error_bad_lines=False.
    for chunk in pd.read_csv(filename, header=None, na_values=['nan'], keep_default_na=False, compression='tar', on_bad_lines='skip', chunksize=chunksize):
        chunks.append(chunk)

# DataFrame.append never modified df in place, so collect the chunks and concatenate once
df = pd.concat(chunks, ignore_index=True) if chunks else pd.DataFrame()

You can now work with the df DataFrame, or save it to an HDF5 file with to_hdf().

Typically I just interrogate pcap data using tshark alone, but the above becomes more useful once you go beyond the data in the pcap, which is what we'll be gathering next.
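
As an example of the kind of question a DataFrame answers quickly, here is a minimal sketch that counts packets per source MAC and destination port. It assumes the DataFrame's columns have been named after the tshark fields (eth.src, tcp.dstport); the top_talkers helper is a hypothetical name, not something from Pandas.

```python
import pandas as pd


def top_talkers(df, n=5):
    """Count packets per (source MAC, destination TCP port), most frequent first."""
    # Keep only rows that carry a TCP destination port (non-TCP rows are empty or NaN).
    has_port = df["tcp.dstport"].notna() & (df["tcp.dstport"].astype(str) != "")
    counts = (
        df[has_port]
        .groupby(["eth.src", "tcp.dstport"])
        .size()
        .sort_values(ascending=False)
        .head(n)
    )
    return counts.reset_index(name="packets")
```

Calling top_talkers(df, n=10) on a day of captures gives a quick view of which devices talk the most and on which ports, which is a reasonable starting point for spotting something that doesn't belong.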

Enrich your data

When it comes to an IP address, the first thing that comes to mind is WHOIS. You won't always get useful data here because registrants can pay extra to protect it, but criminals are mostly looking to get paid, not to pay to protect themselves, so I find this particularly useful for investigation:

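# pip install ipwhois
from ipwhois import IPWhois
from ipwhois.utils import get_countries

def whois(ip):
  # Mapping of two-letter country codes to country names
  countries = get_countries()
  obj = IPWhois(ip)
  result = obj.lookup_rdap(depth=1, asn_methods=['dns', 'whois', 'http'])
  code = result['asn_country_code']
  # Fall back to the raw code if it is missing from the mapping
  country = countries.get(code, code)
  net_type = result['network']['type']
  name = result['network']['name']
  description = result['asn_description']
  registry = result['asn_registry']
  # 'entities' can be empty or None for some networks
  entities = ', '.join(result['entities'] or [])

  return country, net_type, name, description, registry, entities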

There are also some free services that take an IP address and return geolocation data:

import requests

def get_coords(ip):
  # Ask the geolocation service for the JSON record of this IP
  url = "https://freegeoip.net/json/%s" % ip
  r = requests.get(url)

  return r.json()

This will produce something like this

{"ip":"xxx.xxx.xxx.xxx","country_code":"AU","country_name":"Australia","region_code":"VIC","region_name":"Victoria","city":"Brunswick East","zip_code":"3057","time_zone":"Australia/Melbourne","latitude":-37.7725,"longitude":144.9724,"metro_code":0}

We can now use Google Maps to find out even more useful information using the latitude and longitude (you'll need your own API key to perform this query)

API_KEY = '< change me >'

def get_geo(lat, lon):
  # Nearby Search within a 1 metre radius of the coordinates
  url = "https://maps.googleapis.com/maps/api/place/nearbysearch/json?key={}&radius=1&location={},{}".format(API_KEY, lat, lon)
  r = requests.get(url)

  return r.json()

Google responds with a bunch of info, but you can get even more by taking the Google Place ID and performing another query;

def get_place(id):
  url = "https://maps.googleapis.com/maps/api/place/details/json?key={}&placeid={}".format(API_KEY, id)
  r = requests.get(url)

  return r.json()

Which gives you a rich dataset for the location.
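
The three lookups chain together naturally. The sketch below is one way to do that, not a definitive implementation: enrich_ip is a hypothetical helper that receives the lookup functions as arguments, and it assumes the Nearby Search response carries a results list whose items have a place_id, and that the Place Details response wraps its data under result.

```python
def enrich_ip(ip, get_coords, get_geo, get_place):
    """Chain the lookups: IP -> coordinates -> nearby places -> place details."""
    coords = get_coords(ip)
    nearby = get_geo(coords["latitude"], coords["longitude"])
    places = []
    for result in nearby.get("results", []):
        # One Place Details query per place returned by the nearby search.
        details = get_place(result["place_id"])
        places.append(details.get("result", {}))
    return {"ip": ip, "coords": coords, "places": places}
```

Passing the functions in keeps the helper easy to try out with canned responses before spending API quota, for example enrich_ip(ip, get_coords, get_geo, get_place) once the three functions above are defined.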


On the offensive side, you may discover some nefarious activity, and the evidence may show it comes from somewhere law enforcement can't help you, so you decide to take matters into your own hands.

I recommend you report to the authorities rather than go on the offensive.

The first clue to look for is whether there is a website at this IP address. You can easily identify a hostname from an IP using python -c "import socket; print(socket.gethostbyaddr('xxx.xxx.xxx.xxx'))".
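
Many addresses have no reverse DNS record, in which case gethostbyaddr raises an error instead of returning a name. A minimal sketch that handles this, with reverse_lookup being a hypothetical helper name:

```python
import socket


def reverse_lookup(ip):
    """Return the primary hostname for an IP, or None when nothing resolves."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return None
```

This makes it safe to run over a whole column of addresses without one missing record stopping the loop.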

To test whether a web server responds on an alternative port, you'll first need to identify all exposed ports:

#!/usr/bin/env python
# pip install python-nmap
# usage:
#    ./scanner.py -i 59.x.x.x
import nmap
import argparse
import termios, fcntl, sys, os
import time
from datetime import datetime

parser = argparse.ArgumentParser()
parser.add_argument('-i', '--ip', help='host ip to scan', dest='ip')
args = parser.parse_args()
ip = args.ip
if not ip:
  print('host ip must be provided')
  sys.exit(1)

def get_items(dict_object):
  for key in dict_object:
    yield key, dict_object[key]

def scan_cb(host, result):
  # result is None when nmap failed; hosts that are down are absent from result['scan']
  if not result or host not in result['scan']:
    print("[down] Host: %s" % host)
    return
  info = result['scan'][host]
  state = info['status']['state']
  hostnames = info.get('hostnames') or [{'name': ''}]
  hostname = hostnames[0]['name']
  print("[%s] Host: %s (%s)" % (state, hostname, host))

  for port, p in get_items(info.get('tcp', {})):
    port_state = p['state']
    port_name = p['product']
    port_info = "%s %s" % (p['reason'], p['extrainfo'])
    print('[%s] Port %d: %s (%s)' % (port_state, port, port_name, port_info))

fd = sys.stdin.fileno()

# Switch the terminal to non-canonical, no-echo mode so single key presses can be read
oldterm = termios.tcgetattr(fd)
newattr = termios.tcgetattr(fd)
newattr[3] = newattr[3] & ~termios.ICANON & ~termios.ECHO
termios.tcsetattr(fd, termios.TCSANOW, newattr)

# Make reads from stdin non-blocking so the loop keeps running without a key press
oldflags = fcntl.fcntl(fd, fcntl.F_GETFL)
fcntl.fcntl(fd, fcntl.F_SETFL, oldflags | os.O_NONBLOCK)

try:
  nm = nmap.PortScannerAsync()
  nm.scan(ip, '20-23,25,53,57,67-69,80,81,82,107-113,115,118-119,135,137-139,143,153,156,170,177,179,194,209,213,218,220,300,311,366,369-371,383,384,387,389,399,427,433,434,443-445,464,465,475,491,514,515,517,518,520,521,524,530-533,540,546-548,556,560,561,563-585,587,591,593,601,604,623,625,631,635,636,639,641,646-648,653-655,657,660,666,674,688,690,691,706,711,712,749-754,760,782,783,808,832,843,847,848,873,953-61000', callback=scan_cb)
  start_time = datetime.now()
  while nm.still_scanning():
    elapsed = datetime.now() - start_time
    try:
      c = os.read(fd, 1)
      if c == b's':
        print("elapsed %ds" % elapsed.seconds)
    except IOError:
      # No key was pressed since the last check
      pass
    time.sleep(0.1)
finally:
  # Always restore the terminal, even if the scan is interrupted
  termios.tcsetattr(fd, termios.TCSAFLUSH, oldterm)
  fcntl.fcntl(fd, fcntl.F_SETFL, oldflags)

This will take a while, press s on the keyboard to see the elapsed time (and know it's still running the scan).

Once you have some ports to test, you can use an HTTP HEAD request in Python to see if a web server responds: python -c "from requests import request; print(request('head', 'http://xxx.xxx.xxx.xxx:8080').headers)".

Knowing that there is a web server you can use some tools like sqlmap to attempt SQL Injection, or ZAP to identify XSS or other known un-patched web vulnerabilities.

If you are really keen, or the attacker was an internet-connected device rather than a server, you could go one step further: take the list of open ports and cross-reference or Google each port number to try to identify which program is listening on it.
Knowing the application is just the first step; you'll have to do your own research to find out how to exploit anything you find.

For the researchers and really bad guys

It's kind of terrifying how easy all of the above was to figure out; I was always under the impression this stuff was difficult. It really worried me when I realised just how easy it would be to take these defensive and retaliatory techniques and use them purely offensively with just 16 lines of code.

#!/usr/bin/env python
# pip install netaddr
from netaddr import IPAddress, IPNetwork

file_out = "./internet-connected.csv"
with open(file_out, mode='a+') as f:
    # range(0, 256) covers octets 0-255; range(0, 255) would skip every x.255 block
    for class_a in range(0, 256):
        a = IPAddress("%d.0.0.0" % class_a)
        if not a.is_reserved():
            for class_b in range(0, 256):
                b = IPAddress("%d.%d.0.0" % (class_a, class_b))
                if not b.is_reserved():
                    for class_c in range(0, 256):
                        c = IPAddress("%d.%d.%d.0" % (class_a, class_b, class_c))
                        if not c.is_reserved() and not c.is_private():
                            # Walk every usable host address in this /24
                            ip = "%s/24" % c
                            for host_ip in IPNetwork(ip).iter_hosts():
                                if host_ip.is_unicast():
                                    f.write(str(host_ip) + '\n')

The above snippet compiles a list of every IPv4 address that is potentially an internet-connected device.

With such a list you avoid raising too much suspicion by hitting reserved addresses. With constant scanning, a researcher might be able to identify new hosts using a diff on past scans, learn which hosts are Tor exit nodes, or compile analysis on the usage of certain applications such as web servers or operating systems. The possibilities are pretty endless.
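
The diff on past scans is the simplest of these to illustrate. Here is a minimal sketch, assuming each scan has been saved as a text file with one responding host per line; the file layout and the new_hosts helper are assumptions for illustration.

```python
def new_hosts(previous_path, current_path):
    """Return hosts listed in the current scan file but absent from the previous one."""
    with open(previous_path) as f:
        previous = {line.strip() for line in f if line.strip()}
    with open(current_path) as f:
        current = {line.strip() for line in f if line.strip()}
    # Set difference: everything seen now that was not seen before.
    return sorted(current - previous)
```

The result is sorted as plain strings, which is enough for eyeballing but not a numeric ordering of the addresses.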

Thank you for reading and don't forget to share this if you found it interesting.