Artificial Intelligence (AI) is seemingly everywhere these days, with massive amounts of money being thrown at projects that are little more than API calls to ChatGPT. Sigh… I really should scam some investors during one of these silly gold rushes; it seems I’m poor only because I want to be. AI on the blockchain, anyone?
Anyway, that’s not to undermine the incredible achievement that is ChatGPT or other Large Language Models (LLMs), nor other neural network-based solutions for image or audio generation and such. They’re game changing pieces of tech, no doubt about it.
But I cannot help but be sad that the focus is all-in on neural networks; they’re not the only approach to AI. Not only are there many other forms of Machine Learning (ML)1, but there is also Machine Reasoning (MR), the focus of this post.
Machine Reasoning? What’s that?
If you’ve never heard the term Machine Reasoning, I don’t blame you; I’m convinced it’s something a former manager of mine came up with to distinguish what we were doing from Machine Learning (which was already getting hyped at the time).
You won’t find the term on Wikipedia. Instead, you’ll find the idea split across many different pages like “automated reasoning”, “knowledge representation and reasoning”, “constraint satisfaction problem”, and more. A complete mess to follow.
To make it clear what I mean: a Machine Learning system is a piece of software that derives rules from facts (an inductive process). In contrast, a Machine Reasoning system is a piece of software that derives facts from rules (a deductive process).
Expert Systems you mean?
When I mention Machine Reasoning to people, the first thing that usually comes to mind for them are Expert Systems — piles of “IF-THEN” rules painstakingly hand-written by domain experts. It’s no surprise those things fell out of favor against machine learning approaches that just figure things out from a big pile of data.
Expert systems are a form of Machine Reasoning, yes, but one of the least interesting ones IMO. Just as linear regression is not the first thing that comes to mind when talking about Machine Learning, neither should expert systems be the first thing that comes to mind when talking about Machine Reasoning.
If a path-finding algorithm like A* is what first came to your mind, great, that’s more like it!
For me, the truly interesting Machine Reasoning systems are model generators, for which the constraint satisfaction problem page is the closest match, albeit an incomplete one.
What’s a model generator you ask? It’s a piece of software that given a set of variables and a set of constraints on those variables, produces an assignment of those variables (a model) that is consistent with the constraints.
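As a toy illustration (brute force over a tiny search space — real solvers are vastly smarter than this), a model generator can be sketched in a few lines; the variable names and constraints here are made up for the example:

```python
from itertools import product

def generate_model(domains, constraints):
    """Brute-force model generator: try every assignment of values
    to variables and return the first one consistent with all the
    constraints, or None if the problem is unsatisfiable."""
    names = list(domains)
    for values in product(*domains.values()):
        model = dict(zip(names, values))
        if all(c(model) for c in constraints):
            return model
    return None

# x, y, z each range over 0..4; constraints: x < y, y < z, x + z == 5.
model = generate_model(
    {"x": range(5), "y": range(5), "z": range(5)},
    [lambda m: m["x"] < m["y"],
     lambda m: m["y"] < m["z"],
     lambda m: m["x"] + m["z"] == 5],
)
print(model)  # a model: x=1, y=2, z=4
```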
On top of that basic idea you can add:
Objective functions — where we’re interested in finding an optimal model (in regards to the objective) rather than any arbitrary model.
Soft constraints — where we’re interested in finding a model that satisfies as many of the constraints as possible.
When the hard constraints have one or more conflicts, it’s also possible to compute Minimal Unsatisfiable Subsets (MUS): minimal sets of conflicting constraints, such that removing any one constraint from the subset resolves that particular conflict.
The search for an optimal solution can be done incrementally depending on the algorithm used (producing better and better solutions over time), and we can also ask some systems to produce all possible models for the problem, rather than just one.
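To make the objective-function idea concrete, here is a toy optimizing model generator (again plain enumeration, purely illustrative — the problem and names are invented for the example): among all assignments satisfying the hard constraints, keep the best one seen.

```python
from itertools import product

def generate_optimal_model(domains, constraints, objective):
    """Brute-force optimizing model generator: among all assignments
    satisfying the hard constraints, return one that maximizes the
    objective, or None if the problem is unsatisfiable."""
    best = None
    for values in product(*domains.values()):
        model = dict(zip(domains, values))
        if all(c(model) for c in constraints):
            if best is None or objective(model) > objective(best):
                best = model
    return best

# Maximize x * y subject to x + y <= 6, with x, y in 0..5.
best = generate_optimal_model(
    {"x": range(6), "y": range(6)},
    [lambda m: m["x"] + m["y"] <= 6],
    objective=lambda m: m["x"] * m["y"],
)
print(best)  # x = 3, y = 3 maximizes the product (9)
```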
That’s nice and all, but why not ML instead?
Machine Learning (ML) and Machine Reasoning (MR) overlap for some problems but you’d normally use them for different purposes.
There are two properties of ML that make it a poor fit for some domains:
ML requires a massive amount of training data to work well. Some domains don’t have such a quantity of data available.
ML is statistical in nature and ultimately a stochastic process. The results aren’t very predictable, hence “hallucinations” and hands with 7 fingers.
ML excels at problems without a clear set of rules and where the solution does not have to be 100% accurate. For example, I have no idea what rules I would give to some system for it to generate a picture of a dog, and it is usually ok if the picture looks more like a drawing or 3D render than a perfectly lit photo of a dog.
MR, on the other hand, is a better choice for well defined problems that require precise solutions. Examples include: electronic design automation (EDA), software verification, test generation, network optimization, routing, resource allocation, scheduling, supply chain management, etc.
One example of what MR can do is Simcenter Studio, which is able to generate all topological variants of a physical system, simulate them, and then rank them, given a simple high-level specification of the system’s components, their ports, and any relevant connection constraints.
It was used, for example, to find all 4-speed 2-stage transmission designs that can possibly exist (it turns out there are only 12), proving in around 8 hours that no other solutions exist. For comparison, it took decades after 2-stage transmissions were first introduced for the first 4-speed design to be patented.
An ML system trained on pre-existing 3-speed 2-stage transmission designs might be able to accidentally produce a 4-speed 2-stage transmission design, but there’s no guarantee. It certainly cannot prove there are only 12 solutions.
Many package managers use MR tech to handle package dependency resolution. A friend of mine is using MR for planning wedding seats.
Machine Reasoning Technologies
There are many different technologies that fit under the umbrella of “Machine Reasoning” (or model generators, as I’ve defined it above). Here I’ll briefly cover the ones I find most interesting or relevant: Boolean Satisfiability (SAT), Satisfiability Modulo Theories (SMT), Constraint Programming (CP), (Mixed) Integer Programming (MIP), and Answer Set Programming (ASP).
Boolean Satisfiability (SAT)
Boolean Satisfiability (SAT) is sort of the assembly language of model generation. SAT is restricted to boolean variables and boolean constraints (clauses) in Conjunctive Normal Form (CNF). This may sound limiting at first, but then you remember that computers work in 0s and 1s. You can model logic gates in CNF using a Tseytin transformation, and with logic gates you can model arithmetic.
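For instance, the Tseytin clauses for an AND gate c = a ∧ b are (¬a ∨ ¬b ∨ c), (a ∨ ¬c), and (b ∨ ¬c). A quick brute-force sanity check that these clauses hold exactly when c really equals a AND b:

```python
from itertools import product

# Tseytin clauses for the gate c = a AND b. A literal is a pair
# (variable_name, polarity): ("a", False) means the literal ¬a.
clauses = [
    [("a", False), ("b", False), ("c", True)],  # ¬a ∨ ¬b ∨ c
    [("a", True), ("c", False)],                # a ∨ ¬c
    [("b", True), ("c", False)],                # b ∨ ¬c
]

def satisfies(assignment, clauses):
    """True if every clause has at least one literal matching the assignment."""
    return all(any(assignment[v] == pol for v, pol in clause)
               for clause in clauses)

# The clauses are satisfied exactly when c == (a and b).
for a, b, c in product([False, True], repeat=3):
    assert satisfies({"a": a, "b": b, "c": c}, clauses) == (c == (a and b))
print("Tseytin encoding of AND verified over all 8 assignments")
```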
The algorithm at the heart of nearly every SAT solver is called Conflict-Driven Clause Learning (CDCL). CDCL is a backtracking algorithm that records extra conflict clauses every time it hits a dead-end branch in the search space, preventing similar paths from being explored in the future.
CDCL (plus many fancy heuristics and preprocessing tricks) makes SAT solvers absurdly efficient. Modern SAT solvers can handle problems with millions of variables and millions of constraints. They’re a core piece of tech in the EDA domain.
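CDCL itself (clause learning, restarts, activity heuristics) is too much for a blog snippet, but its ancestor DPLL — the same backtracking-plus-unit-propagation core, minus the learned clauses — fits in a few lines. A toy sketch, not production code:

```python
def dpll(clauses, assignment=frozenset()):
    """Plain DPLL: backtracking SAT search with unit propagation.
    Clauses are lists of DIMACS-style integer literals (positive int
    = variable true, negative = variable false). Returns a satisfying
    set of literals, or None if the clauses are unsatisfiable."""
    simplified = []
    for clause in clauses:
        if any(lit in assignment for lit in clause):
            continue  # clause already satisfied, drop it
        remaining = [lit for lit in clause if -lit not in assignment]
        if not remaining:
            return None  # clause falsified: conflict, backtrack
        simplified.append(remaining)
    if not simplified:
        return assignment  # every clause satisfied: model found
    for clause in simplified:
        if len(clause) == 1:  # unit clause: its literal is forced
            return dpll(simplified, assignment | {clause[0]})
    lit = simplified[0][0]  # branch: try lit, then its negation
    return (dpll(simplified, assignment | {lit})
            or dpll(simplified, assignment | {-lit}))

# (x1 ∨ x2) ∧ (¬x1 ∨ x3) ∧ (¬x2 ∨ ¬x3)
model = dpll([[1, 2], [-1, 3], [-2, -3]])
print(sorted(model))
unsat = dpll([[1], [-1]])
print(unsat)  # None: x1 and ¬x1 cannot both hold
```

CDCL's contribution on top of this skeleton is that, at each conflict, it analyzes *why* the conflict happened and records a new clause that blocks that whole family of dead ends, instead of blindly backtracking one decision.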
There’s a yearly SAT competition where many different SAT solvers compete to solve as many problems as possible in as little time as possible. The standard input format for SAT solvers is DIMACS. Example SAT solvers include Kissat and CryptoMiniSAT.
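For a flavor of DIMACS: the formula (x1 ∨ x2) ∧ (¬x1 ∨ x3) ∧ (¬x2 ∨ ¬x3) is written as below — the `p cnf` header gives the variable and clause counts, negative numbers are negated literals, and each clause ends with 0:

```
c example CNF: 3 variables, 3 clauses
p cnf 3 3
1 2 0
-1 3 0
-2 -3 0
```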
Satisfiability Modulo Theories (SMT)
Satisfiability Modulo Theories (SMT) builds on top of SAT by combining a SAT solver with one or more theory solvers. Examples of theories include integer arithmetic, real arithmetic, uninterpreted functions, strings, etc. SMT solvers are an indispensable piece of technology in the domains of software verification and formal proofs.
Most SMT solvers use the CDCL(T) algorithm, but there are other approaches like translating the problem entirely into SAT (only possible for some theories like the theory of bit-vectors or floating point numbers).
Like with SAT, there’s a yearly competition where many different SMT solvers compete to solve as many problems as possible in as little time as possible. The standard input format for SMT solvers is SMT-LIB 2. SMT solvers include: cvc5, Z3, Yices 2, MathSAT 5, Bitwuzla, etc.
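A small taste of SMT-LIB 2, asking whether integers x and y exist with x + y = 10 and x > y (a conforming solver such as Z3 answers `sat` and can print a model):

```
(set-logic QF_LIA)
(declare-const x Int)
(declare-const y Int)
(assert (= (+ x y) 10))
(assert (> x y))
(check-sat)
(get-model)
```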
Constraint Programming (CP)
Unlike SMT which is a direct descendant of SAT, Constraint Programming (CP) developed in parallel, though these days many CP solvers employ SAT solvers internally and work in a similar manner to SMT solvers.
In CP the focus is on finite-domain integer variables2 and global constraints: powerful high-level constraints that can, through specialized algorithms, effectively and efficiently reduce the domains of the variables involved. The most famous global constraint is alldifferent, which constrains every variable in its input array to take a different value. The naive encoding of alldifferent would require pairwise inequalities between every pair of constrained variables, which is inefficient to solve. There are substantially more efficient ways of decomposing or propagating alldifferent: the global constraint carries more information than an equivalent set of local constraints, and because of this it can be handled more efficiently. That’s the basic idea of CP.
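A concrete illustration of why the global view is stronger than the pairwise one (a toy sketch; real CP solvers use dedicated propagation algorithms such as Régin's matching-based filter for alldifferent): give three variables the two-value domain {1, 2}. Every pairwise "x != y" constraint is satisfiable on its own, yet the global constraint is obviously infeasible by pigeonhole.

```python
from itertools import product

# Three variables, each with domain {1, 2}: alldifferent is impossible
# (pigeonhole), but no single pairwise constraint can see that.
domains = {"x": {1, 2}, "y": {1, 2}, "z": {1, 2}}

# Each pairwise inequality is individually satisfiable...
for a, b in [("x", "y"), ("x", "z"), ("y", "z")]:
    assert any(va != vb for va in domains[a] for vb in domains[b])

# ...but the global view spots the conflict immediately: 3 variables
# cannot take all-different values from a union of only 2 values.
union = set().union(*domains.values())
print(len(domains) > len(union))  # True: alldifferent is infeasible

# Brute force confirms no assignment uses three distinct values.
assert not any(len({vx, vy, vz}) == 3
               for vx, vy, vz in product(*domains.values()))
```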
CP solvers are an indispensable piece of technology in operations research, excelling at solving complex scheduling and routing problems.
There are two main algorithms used these days for solving CP problems: Maintaining Arc-Consistency (MAC) and Lazy Clause Generation (LCG). MAC works by employing specialized algorithms for each global constraint alongside an arc-consistency algorithm like AC-3, whereas LCG works similarly to an SMT-solver, with each global constraint behaving as a sort of theory solver that produces SAT clauses representing reductions in the domain of the variables.
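Here is a minimal AC-3 sketch for binary constraints (an illustration of the arc-consistency idea only, far from a real CP engine; the problem at the bottom is made up):

```python
from collections import deque

def ac3(domains, constraints):
    """AC-3 arc consistency for binary constraints.
    domains: {var: set of values}; constraints: {(x, y): relation},
    where relation(vx, vy) says whether the value pair is allowed.
    Prunes values with no support; returns (consistent?, new domains)."""
    domains = {v: set(d) for v, d in domains.items()}
    # View every constraint in both directions as directed arcs.
    arcs = {}
    for (x, y), rel in constraints.items():
        arcs[(x, y)] = rel
        arcs[(y, x)] = lambda vy, vx, rel=rel: rel(vx, vy)
    queue = deque(arcs)
    while queue:
        x, y = queue.popleft()
        rel = arcs[(x, y)]
        # Remove values of x with no supporting value in y's domain.
        unsupported = {vx for vx in domains[x]
                       if not any(rel(vx, vy) for vy in domains[y])}
        if unsupported:
            domains[x] -= unsupported
            if not domains[x]:
                return False, domains  # domain wiped out: inconsistent
            # x's domain shrank: re-check arcs pointing at x.
            queue.extend(arc for arc in arcs if arc[1] == x)
    return True, domains

# x < y with x, y in 1..3: AC-3 prunes x=3 and y=1.
ok, pruned = ac3({"x": {1, 2, 3}, "y": {1, 2, 3}},
                 {("x", "y"): lambda vx, vy: vx < vy})
print(ok, pruned)
```

In a MAC solver, propagation like this alternates with search: each global constraint runs its own (much smarter) filtering routine, and the solver branches whenever propagation alone stops making progress.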
Like SAT and SMT, CP has yearly competitions, one for each of the two standard input formats: MiniZinc and XCSP3. CP solvers include: CP-SAT, Gecode, Choco, etc.
(Mixed) Integer Programming (MIP)
This section technically covers a whole family of related technologies.
Linear Programming (LP) focuses on optimization problems involving continuous variables, linear inequality constraints on those variables, and a linear objective function. Integer Linear Programming (ILP) is the same but with integer variables. If the solver supports both continuous and integer variables, it’s called Mixed Integer Programming (MIP). With integer variables and non-linear constraints you get Integer Non-Linear Programming (INLP), and with mixed variables and non-linear constraints, Mixed-Integer Non-Linear Programming (MINLP). And then there are special cases of non-linear programming, like Quadratic Programming (QP), that have dedicated algorithms. Phew… got all that?
(Mixed) Integer Programming (MIP) in particular has the same “computational power” as the other approaches above: they can all handle NP-complete problems.
The focus here is clearly on optimization. If you need to find the best solution to a problem that is mostly continuous plus some discrete variables, this is the tech to go for. Some SMT solvers and most CP solvers can also handle optimization, but MIP solvers tend to be much better at it. So much better in fact that commercial MIP solvers can sell for thousands of dollars.
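To make the shape of such a problem concrete, here is a tiny integer program solved by exhaustive enumeration (the numbers are invented for the example; real MIP solvers do not enumerate):

```python
from itertools import product

# Maximize 5x + 4y subject to 6x + 4y <= 24 and x + 2y <= 6,
# with x and y non-negative integers. Small enough to enumerate.
best_value, best_point = None, None
for x, y in product(range(5), range(4)):  # bounds implied by constraints
    if 6 * x + 4 * y <= 24 and x + 2 * y <= 6:
        value = 5 * x + 4 * y
        if best_value is None or value > best_value:
            best_value, best_point = value, (x, y)

print(best_point, best_value)  # (4, 0) 20
```

Note that the continuous (LP) relaxation of this problem peaks at 21 with x = 3, y = 1.5, and naively rounding that point is either infeasible or suboptimal; that gap between the relaxation and the true integer optimum is exactly what makes integrality hard and what branch & bound exploits.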
The main algorithm used by most modern MIP solvers is called Branch & Cut, which combines the simplex algorithm from linear programming with branch-and-bound backtracking search and cutting planes that tighten the linear relaxation and prune the search space.
Unlike with SAT, SMT and CP, I’m not aware of any official competition for MIP, but there is a benchmark that compares some open source and commercial solvers. There isn’t a standard input format for MIP, but nl, LP and MPS are the most commonly supported formats. MIP solvers include: CBC, SCIP, HiGHS, Gurobi, CPLEX, etc.
Answer Set Programming (ASP)
One way to think of Answer Set Programming (ASP) is as a middle ground between Datalog and Prolog: not Turing-complete like Prolog, but more expressive than Datalog (Datalog cannot handle NP-complete problems, but ASP can).
Rules in ASP (unlike the constraints of the other approaches) are mainly universally quantified rather than existentially quantified. That is, the rules:
ancestor(A, B) :- parent(A, B).
ancestor(A, C) :- parent(A, B), ancestor(B, C).
are equivalent to the first-order logic formulas
∀a, b. Parent(a, b) → Ancestor(a, b)
∀a, b, c. Parent(a, b) ∧ Ancestor(b, c) → Ancestor(a, c)
ASP solvers cannot handle universally quantified formulas like that directly; the rules first need to be grounded (i.e., every variable expanded with every possible value). That’s handled by a separate component called a grounder, so ASP software is usually a pair of a solver and a grounder.
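To see what grounding buys you, here is the ancestor program above evaluated over a tiny made-up parent relation as a naive least-fixpoint computation in Python (a toy Datalog-style evaluator, not how a real grounder like gringo works internally):

```python
# parent facts: alice -> bob -> carol
parent = {("alice", "bob"), ("bob", "carol")}

# Ground/evaluate the two ancestor rules by iterating to a fixpoint:
#   ancestor(A, B) :- parent(A, B).
#   ancestor(A, C) :- parent(A, B), ancestor(B, C).
ancestor = set(parent)
while True:
    derived = {(a, c)
               for (a, b) in parent
               for (b2, c) in ancestor
               if b == b2}
    if derived <= ancestor:
        break  # fixpoint reached: no new facts derivable
    ancestor |= derived

print(sorted(ancestor))
# [('alice', 'bob'), ('alice', 'carol'), ('bob', 'carol')]
```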
Modern ASP solvers use an adaptation of CDCL from SAT as their core algorithm and can often invoke the grounder component incrementally.
Like with SAT, SMT and CP, there is a competition, but it does not happen as often. The standard input formats for ASP solvers are AnsProlog and ASP-Core-2. ASP solvers include: clingo (clasp + gringo) and DLV (wasp + I-DLV).
Concluding Remarks
That was a long but very incomplete look at a field that is dear to my heart. It saddens me to see it relegated to the sidelines while deep learning steals the spotlight. There is so much more to this field; I didn’t even touch on local-search-based approaches.
It’s not like Machine Reasoning and Machine Learning are at odds either; they can be combined! You can, for example, train a ReLU-based neural network on the operational data of some real-world hardware system to create a simulation model, translate that neural network into a set of linear constraints, and then use a MIP solver to optimize some aspect of that system.
Or you could have an LLM generate constraint satisfaction problems and invoke an existing solver instead of trying to solve arithmetic problems directly. Or combine the texture synthesis capabilities of wave function collapse with the image generation capabilities of diffusion models.
Don’t leave everything to megacorps with massive datasets.
Decision Trees, Support Vector Machines, Inductive Logic Programming, Bayesian Networks, Genetic algorithms… I recommend reading Pedro Domingos’ book “The Master Algorithm” for an overview.
Some CP solvers support real variables using interval arithmetic.