The Problem with AI and ML today

I have to admit that I’m not an expert in AI or machine learning (ML), but I think I understand it well enough at a high level. After all, I did some work with big data and Hadoop and have been reading quite a bit about AI and ML. From the start I had this uneasy feeling that the current state of AI, even with deep learning, is not really intelligent. Yes, it works up to a certain level: you can see this in the current progress in automated driving, or in IIoT use cases like visual inspection or material checks that are based on AI models and deep learning.

But what always struck me is that the system that delivers all this great functionality is really dumb and has no idea what it has learned. Nobody can look at the “mental” model of the AI and explain why it can detect an object or recognise a pattern. It just works based on pure data. That is exactly the point: AI today detects something interesting based purely on the input data it has been trained with.

So a couple of weeks ago I bought a book that I stumbled over on Amazon. This week I started reading “The Book of Why” by Judea Pearl and Dana Mackenzie. The book is about the theory of causal relations and the need for causation in artificial intelligence.

Already the first chapter struck me like a lightning bolt. Judea explains exactly what I always felt: current AI is on level 1 of the ladder of causation. Level 1 means that learning is based on associations the algorithm finds in the data. The mechanism behind this is, in the end, statistics and probability, that’s all.


Associations are detected in the data because the AI model has been trained on a similar pattern, and when it sees that pattern again it can detect it. But the pattern needs to be at least similar to something learned, which is why it is so important to have good training data, and tons of it. If there is a completely new pattern in the data that the algorithm hasn’t learned yet, it cannot detect it. That is why the intelligence of such an algorithm is on the level of an animal, while any three-year-old child is more intelligent.

And worse, the model doesn’t really know what it has learned; the representation is just factors, e.g. the weights in a neural network. There is no knowledge representation as such.

Now, for quite some time there has been one topic that is always in my focus: semantic web technology, how knowledge can be represented in a knowledge graph, and how to work with that in real-world technology. Before it was in the space of IT management, now it is in the area of IIoT.

And here is the point that hit me like a lamp post. On the one hand there is classical AI technology with its ability to automatically learn and detect patterns. On the other hand there is semantic technology with its semantic data models and query mechanisms on a formal, machine-readable knowledge representation.

And the difference at the next level, level 2 in Judea’s ladder of causation, is exactly that one has a causal model, not just data. The causal model is represented as a directed graph of causal relations with numerical factors on the edges.
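To make the idea of a directed graph with numerical factors a bit more concrete, here is a minimal sketch in Python. It is only an illustration under assumptions that are not from the book: the variable names and factors are invented, and the model is assumed to be linear and acyclic, so that the total effect of one variable on another is the sum over all directed paths of the product of the edge factors.

```python
from collections import defaultdict

# Hypothetical causal graph: (cause, effect) -> numerical factor.
# Names and factors are invented for illustration only.
CAUSAL_EDGES = {
    ("tool_wear", "vibration"): 0.8,
    ("vibration", "surface_defect"): 0.5,
    ("tool_wear", "surface_defect"): 0.2,
}


def build_graph(edges):
    """Turn the edge dictionary into an adjacency map: cause -> {effect: factor}."""
    graph = defaultdict(dict)
    for (cause, effect), factor in edges.items():
        graph[cause][effect] = factor
    return graph


def total_effect(graph, cause, effect):
    """Sum, over all directed paths from cause to effect, the product of the
    edge factors along each path. Assumes a linear model without cycles."""
    if cause == effect:
        return 1.0
    return sum(
        factor * total_effect(graph, child, effect)
        for child, factor in graph.get(cause, {}).items()
    )


if __name__ == "__main__":
    graph = build_graph(CAUSAL_EDGES)
    # Direct path (0.2) plus the indirect path via vibration (0.8 * 0.5).
    print(total_effect(graph, "tool_wear", "surface_defect"))
```

The point of the sketch is that the direction of the edges matters: asking for the effect of "surface_defect" on "tool_wear" yields zero, which a purely associative model could not tell you.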


Now that sounds very familiar to me: this is easy to represent as a semantic graph in RDF or OWL! Causal relations would be represented as relations in a semantic model, as one of its most important relation types.

Technically there are of course a couple of practical questions about how an AI/ML model can work with a semantic graph model. Probably one needs to transform the knowledge graph into an ANN first. It would be interesting to speak with an AI expert about this.

I would even go so far as to say that it would be a benefit to represent learned associations in such a model as well. Knowledge, in the end, comes in different types: factual knowledge, rules, causal relations, associations and other relations that are not causal. If we represent all of these in a semantic model, we come closer to how we see the human brain. As human beings we record these relations as well, we are aware of them, and we can search and access them, just like a knowledge graph!
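As a rough sketch of what this could look like, the following Python snippet keeps facts, a causal relation and a learned association side by side as subject-predicate-object triples and queries them by type. It deliberately uses plain tuples instead of a real RDF library, and every "ex:" name is a made-up vocabulary for illustration, not an existing ontology. Each relation is modelled as its own node so that it can carry a numerical factor.

```python
# Hypothetical vocabulary: every "ex:" name below is invented for illustration.
TRIPLES = [
    # A plain fact.
    ("ex:Spindle1", "rdf:type", "ex:Machine"),
    # A causal relation, modelled as its own node so it can carry a factor.
    ("ex:rel1", "rdf:type", "ex:CausalRelation"),
    ("ex:rel1", "ex:from", "ex:ToolWear"),
    ("ex:rel1", "ex:to", "ex:Vibration"),
    ("ex:rel1", "ex:factor", 0.8),
    # A learned association: observed together, no causal claim.
    ("ex:rel2", "rdf:type", "ex:Association"),
    ("ex:rel2", "ex:from", "ex:Vibration"),
    ("ex:rel2", "ex:to", "ex:NoiseLevel"),
    ("ex:rel2", "ex:factor", 0.6),
]


def match(triples, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def value(triples, s, p):
    """Return the object of the first triple with subject s and predicate p."""
    return match(triples, s=s, p=p)[0][2]


def relations(triples, kind):
    """List (source, target, factor) for every relation node of the given type."""
    return [
        (value(triples, rel, "ex:from"),
         value(triples, rel, "ex:to"),
         value(triples, rel, "ex:factor"))
        for rel, _, _ in match(triples, p="rdf:type", o=kind)
    ]


if __name__ == "__main__":
    print(relations(TRIPLES, "ex:CausalRelation"))
    print(relations(TRIPLES, "ex:Association"))
```

Unlike the weights in a neural network, every relation here can be looked up, typed and explained, which is the property the paragraph above is after.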

Maybe this is, in the end, the way we can bring computers to at least level 2 of the ladder of causation, and do the same for our applications in IIoT.

 

IoT Hackers Handbook

Somewhere, I don’t remember where, I became aware of the book “IoT Hackers Handbook” by Aditya Gupta. Well, I bought it and read it. That didn’t take long, as the font size is a bit larger than normal. There are two reasons to do this: either you want to spare older readers (me?) their glasses, or there is not much content but you still want the book to look like an in-depth treatment of the topic.

It was indeed a bit different than expected. Not bad, but different, which also tells you something. I’m a software guy; hardware-near topics like BLE sniffing are interesting but not my home turf, so to speak. But after an introductory chapter on penetration testing IoT devices, this book really started with hardware hacking: UART communication, JTAG debugging. Then it slowly moved up towards software, via firmware hacking, mobile apps (Android) and software-defined radio (SDR) to Zigbee and BLE sniffing and resending packets. It didn’t get any higher than this. That’s OK, as these were topics I hadn’t touched so far, except for BLE sniffing. The SDR part especially was quite interesting and encouraged me to maybe dig a bit into this topic. Understanding the communication of garage door openers and the like sounds interesting overall.

Don’t get me wrong: for consumer IoT devices, this is all important stuff to understand, test and hack. But IoT is a bit more than hardware, firmware and communication, at least in my mind. IoT lives on software, and not just hardware-near software. That is what brings the value and the new business models for IoT. Sure, the book touched on mobile apps as an important part of an IoT solution, but it is the cloud connectivity and the software stack on the IoT device that I find the interesting part. And nothing beyond ZigBee and BLE was covered. So: not bad, and helpful, but surprising regarding the direction IoT pentesting should take, and maybe telling about how IoT is still regarded today.

To be fair, the book did dig into some use cases of what you can do once you have access to a device and can manipulate it at will, which wasn’t really difficult with the examples provided by the author: weather stations, door openers, garage openers, the usual smart light bulb and beacons. I still learned a lot about tools and techniques for these low-end IoT devices, and about how easy it is to break them just by knowing a few tools and reading specifications. Unfortunately, you can transfer this experience to more complex “IoT” devices like PLCs in IIoT or gateways. The specifications are just a bit thicker and more complex. But the door is equally wide open for white hats and black hats alike.

Dualcomm 10/100/1000Base-T Gigabit Ethernet Network TAP

As I only have an unmanaged switch (Netgear FS116) at home, I don’t have a SPAN port for network sniffing on the home LAN. While building up an NSM (network security monitoring) setup for my home network, I needed a way to tap the wired LAN. So I looked at network taps, which tend to be extremely expensive for home use. Finally I found a recommendation and bought a Dualcomm 10/100/1000Base-T Gigabit Ethernet Network TAP. It’s not cheap, but it is better quality than a throwing star tap and offers full-duplex passive sniffing of network traffic at an affordable price.

Setup is absolutely seamless, as there is no setup: just put the tap between the home router and the switch to see all traffic between the internal network and the outside, run a LAN cable to the sniffing Ethernet interface, and that’s it. The little box is powered via USB, so I just plugged the USB cable into a monitor’s USB port, and that worked fine.

Currently there are two options. Either I use my RPI 3B’s LAN interface as the sniffing interface and the RPI’s WLAN as the management interface, or I attach the tap to an old laptop that I use as a monitoring collection station running the SELKS distribution. I use SELKS instead of Security Onion (SO) because the laptop is just too old and SO freezes on this hardware. SELKS also has the ELK stack and Suricata installed and runs decently. Performance is not optimal, but it works for testing. Here, too, WLAN is the management interface, as the laptop also has only one wired LAN interface. Sniffing interfaces are not managed and don’t get an IP address, as they are input only.

Long-term it could be interesting to replace the unmanaged switch with a managed one, so that the tap can be moved elsewhere and the managed switch’s SPAN port used for, e.g., the RPI. The new RPI 4 Model B has true gigabit LAN, which should handle all the traffic the switch provides without problems in such a home setup.

The packet-foo blog contains probably the best article series you can find on network packet capture and analysis, including network taps.

Trying to build packetbeat for Raspberry PI (arm64)

After my previous article on building filebeat for the Raspberry PI 3 B+ (arm64), I now wanted to get a binary for packetbeat, the second most interesting module of Elastic Beats. I tried the same approach of cross-compiling with GOARCH=arm64, but it fails, while a straight compile for amd64 works. It fails with a message that all Go files are excluded due to build constraints. The issue is probably that native C code is involved, so you cannot simply cross-compile this beat. I searched posts and tried all options for two hours; it does not work.

I tried again on the PI directly. The build runs, but when you do a “go install” it eventually runs out of memory (“cannot allocate memory”). The problem is that the PI 3 has only 1 GB of memory, and that does not seem to be enough. I tried all kinds of tricks, like setting GOMAXPROCS=1 and GOGC=70, but no luck. The problem also seems to be related to the C build using gcc. You need to install “libpcap-dev” with “apt-get install” for the “pcap.h” header file, otherwise you get a compile error earlier. When using “go build -v -x” directly, you get the “cannot allocate memory” message from gcc. When using “make”, the build gets killed instead. Still, this is odd, as there are reports from people who compile Kubernetes on an arm64 RPI 3B like mine. Probably K8s does not contain native C parts like the libpcap in packetbeat. So I finally gave up, because …

But … there is good news ahead! A few days ago, the new Raspberry PI 4 Model B was released! With up to 4 GB of memory, the build will hopefully work. It now also has true gigabit LAN, which is not a bad thing for network sniffing either when it is attached to a real network tap. So that is a clear purchase planned for July!

Produkte digital-first denken

Barbara Hoisl is a freelance business and strategy consultant and a long-time friend from the old days, when I worked at Hewlett-Packard (HP OpenView Software, a division that no longer exists in this form). Barbara is a, well, visionary expert in software product management, startup financing and the software business. I had the positive experience of having Barbara as my boss at HP for a short time.

Last year Barbara actually wrote a book of her own, “Produkte digital-first denken”, in German. I consider myself lucky to be among those who received a (free) copy of her new book at the beginning of the year. So I wanted to report here on how the book turned out and what I learned from it.

At first you are puzzled: does one really have to write a German book about digitalization? But I have noticed often enough at work that I quickly forget that I worked for an American company for years, and that using English as the working language has become a matter of course for me, whereas for many people who do not come from the software industry it is still rather a problem. And her book is clearly aimed at German mid-sized companies, where German is still the professional language. Until a few years ago that was still the case at my employer (Bosch) too.

That is already one of the interesting points why this book fills a gap in the range of books on digitalization: it is really written for the group of people who have to carry out digitalization and the introduction of software products, IoT and IIoT in order to become fit for the future. And Barbara has hit the writing style appropriate for this target group in a fascinating way. On the one hand there are the many English expressions that are so natural for us software people but are hollow phrases for the target audience. But behind the “phrases” are essential concepts of the software world, which made today’s big IT players (GAFA = Google, Apple, Facebook, Amazon) successful and which will endanger the business of the established manufacturing companies in Germany too if those companies do not adapt. That is, if they do not take digitalization and the introduction of software products seriously.

And that is exactly what Barbara explains in understandable words: she explains phrases such as “Software is eating the world”, the “winner takes it all” effect in platform business models, “Think big, start small” and the “sell the future” strategy. What is interesting is that I, who have already dealt intensively with software platform business models and who take all these principles of the software world as given and clear, can still learn something. You become even more aware of the differences between the successful German manufacturing companies and the (mostly American) IT companies, and you recognize the need for action to recreate products digitally.

Bosch is such a company, with hundreds of production plants and incredible knowledge of manufacturing and logistics, and a company that is clearly on its way to becoming a software company. My business unit “Bosch Connected Industry” is at the forefront. But I have also noticed, at trade fairs and in conversations, that this is by no means true for the mass of smaller mid-sized companies, especially in Baden-Württemberg. Yet there are many of today’s world market leaders here, in hundreds of technical niche markets. And it is exactly these companies that need to be reached by knowledge of what the buzzwords really mean. Barbara’s book is unique in that it can hopefully achieve exactly that.

What fascinated me was realizing again, through the presentation in the book, how important it is to acquire (to learn?) the right way of thinking (“digital mindset”): to understand how the new, big, innovative IT players think compared with the traditional, established but slow companies. Barbara explains many models along the way, such as the product lifecycle, Moore’s Law and exponential growth, the three horizons of innovation, the Innovator’s Dilemma, the 10 Types of Innovation and the 6D model. The last two, for example, were new to me as well, and I immediately got hold of the corresponding literature.

The nice thing about her book is that she always explains the abstract models with practical examples from B2B and B2C markets. Bosch Software Innovations (my first business unit at Bosch) is mentioned in it too, by the way (sic!). Her favourite example is Tesla, where there was still something for me to learn as well.

Finally, at the end of the book she also gives some recommendations on how to organize the transition to a company that thinks “digital-first”. Not every company approaches it this way, and you can see the resulting implementation problems in your own business unit. All in all an enriching book that I can warmly recommend to everyone who is, or should be, active in the IIoT field, and that means all traditional manufacturing companies.

Wishing you many interesting insights while reading!

Peter

Installing filebeat on Raspberry PI 3 (arm64)

Currently I’m experimenting with using a Raspberry PI 3 B+ as a network security monitoring (NSM) sensor node. So I have Bro and Suricata installed on that little guy, which runs Kali Linux for arm64. But I need a modern way to transport the logs to the log monitoring station. So not syslog-ng or rsyslog but the best log shipper for the Elastic Stack, and that is Beats, or more precisely Filebeat.

Problem: despite desperate inquiries from users in the forums, Elastic unfortunately does not provide arm64 binaries or an arm64 .deb package for Beats. After trying some other paths I came across some recipes for installing Beats on arm64 by manually compiling the binary with Go. I have to say, Go is marvelous! On the PI itself I had bad luck, because the “go build” quickly ended with an out-of-memory error. So unfortunately that didn’t work on that little pal.

But because Go is so cool, I just “cross-compiled” it on a bigger laptop, also running Kali Linux. And that’s so easy that I have to tell the world, because the other recipes are sometimes too specific and lack parts needed for a fully working manual installation, which is more than just the filebeat binary.

Step one: on the other Debian-based system, the laptop, you of course need Go installed as well.

# export GOPATH=$PWD/go
# mkdir -p go/src/github.com/elastic
# cd go/src/github.com/elastic
# git clone https://github.com/elastic/beats.git

You could also get the sources with “go get”, but that doesn’t matter; the result is the same.
Now the important step, watch out:

# export GOARCH=arm64
# cd beats/filebeat
# go build -v -x

The flags are just there to show what’s happening, as go build is very quiet otherwise. Magic: in a few seconds, you have a “filebeat” binary in this directory!
Try:

# file filebeat
filebeat: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), statically linked, Go BuildID=svVi8LJGhqXEjRJveTrA/7cOYouMPn1VzyeJqwq3W/TXZ3DZ8Wa_QYdKnsR8cm/8bg35yoawYw18mAJ30oX, not stripped

Remember, on the laptop we are on amd64, not arm64!
Now just copy the file over to the PI using ssh and test it there:

# ./filebeat --help

Works! But when you try

# ./filebeat modules list

It does not show any, because we are missing something: all the module configuration and Kibana dashboards that are normally also contained in the .deb package. So just install filebeat on the laptop, as there is of course a package for amd64:

# cd ~/Downloads
# curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.1.1-amd64.deb
# dpkg -i filebeat-7.1.1-amd64.deb
# filebeat modules list

Here you get the modules, of course.
Now let’s see what’s in the Debian package:

# dpkg --listfiles filebeat | more

As you can see, apart from the binary (for amd64) there are no other binaries in the .deb, just lots of YAML and JSON files. That’s of course good news.
So to get a fully functional installation, I just copied the files over from the laptop to the PI using SSH: /etc/init.d/filebeat, /etc/filebeat/*, /usr/share/filebeat, /lib/systemd/system and /usr/bin/filebeat (a script). Then place the compiled arm64 binary in “/usr/share/filebeat/bin/filebeat” and we’re good to go on the PI:

# filebeat modules list

And here we get the list.
Of course this is not a package that will be managed by apt-get. Maybe (I didn’t try it) one could force-install the official amd64 .deb package and only replace the binary with the compiled one.

Hope this helps, Peter