A Boolean expression to polynomial generator
Note:
- I’ve been looking for a python implementation that converts a Boolean expression stored as a string, to a polynomial, also returned as a string. The hardest part of this conversion was writing a parser for arbitrarily nested expressions. Here is my implementation. Doesn’t handle
xor
yet! - This document uses the noweb syntax. Please tangle the source org file to generate parser.py
Outline
Objective: Given a nested Boolean expression, parse and construct a nested syntax tree, and convert the tree into a polynomial
Approach:
- Parse expression and represent as tree
- Use tree as input to construct the polynomial
Challenges:
- What format to use to represent the tree: Nested dictionary, but this might become clumsy very quickly. Another option is definine node classes that can point to another node object as an attribute.
- The two level problem of constructing a nested expression, as well as storing the LEFT and RIGHT arguments to a Boolean operator.
Solution:
- Define a class
Node
which stores the operator, the left and the right arguments - The left and right arguments will be other
Node
objects if they are not simple strings - Two special cases:
not
only has a right argument. The left argument in this case is an empty string.- A Boolean rule with a single input node assigns the node as the ‘operator’. In this case, the
Node
object will have empty strings as left and right arguments, and the input as the operator.
- The
constructPolynomial()
function handles the conversion to the polynomial.
import numpy as np
import pandas as pd
import sys
Get delimiter positions of nested expressions
Operate on a string s
-
Split on whitespace, get positions of separators:
tokenlist = s.split() separatordict = {'open':[],'close':[]} for i, t in enumerate(s): if t == '(': separatordict['open'].append(i) elif t == ')': separatordict['close'].append(i)
-
Get paren positions
separatordict = {'open':[],'close':[]} for i, t in enumerate(tokenlist): if t == '(': separatordict['open'].append(i) elif t == ')': separatordict['close'].append(i)
-
Check validity: Length of open should be equal to close, otherwise imbalanced
if len(separatordict['open']) != len(separatordict['close']): print(separatordict, tokenlist) print('Imbalanced expession!') sys.exit()
-
Get tuples for the ranges spanning separators. The logic here is that you find the first closing position
c
, and then find the largest opening positiono
less thanc
. The ordered pair (o
,c
) corresponds to one nested expression. Pop these, and repeatexplocations = [] separatordict['close'].reverse() while len(separatordict['open']) > 0: c = separatordict['close'].pop() o = max([l for l in separatordict['open'] if l < c]) separatordict['open'].remove(o) explocations.append((o,c))
-
Print all subexpressions
for o, c in explocations: print(o,c,tokenlist[o:c+1])
Simplify subexpressions in placed
How I would do it manually:
- First clean up expression so everything is space separated, then split the expression on the whitespace
-
Accomplish this by add a pre- and post- space for every delimiter.
s = s.replace('(', ' ( ') s = s.replace(')', ' ) ') # remove trailing whitespace s = s.strip() tokenlist = [t for t in s.split(' ') if t != '']
-
- Read the expression from left to right to create a map of the nesting levels.
-
Explanation:
- Get delimiters
- If the outermost delimiters span the token list, remove them, test again
- return tokenlist
delimiters = self.getDelimiterPositions(tokenlist) if len(delimiters) > 0: for o, c in delimiters: if o == 0: break if o == 0 and len(tokenlist) == c + 1 + o: tokenlist = list(tokenlist[1:-1]) delimiters = self.getDelimiterPositions(tokenlist) # Call itself again tokenlist = self.removeDelimiters(list(tokenlist)) return(tokenlist)
-
Get to the first non-nested operator
if len(delimiters) > 0: for o, c in delimiters: if o == 0: break if o == 0: tokenind = c + 1
-
Format the left and right arguments
argumentdict = {} for k, v in zip(['left','right'],[left,right]): if type(v) is str: argumentdict[k] = v argumentdict[k+'str'] = '' elif type(v) is list and len(v) == 1: argumentdict[k] = v[0] argumentdict[k+'str'] = '' else: argumentdict[k+'str'] = '(' + ' '.join(v) + ')' argumentdict[k] = self.createBoolTree(v)
-
Ignore the nesting problem.Identify the location of the first operator, create a left and right argument, repeat this step on the right hand argumentif tokenlist is None: tokenlist = self.tokenlist tokenlist = self.removeDelimiters(tokenlist) delimiters = self.getDelimiterPositions(tokenlist) tokenind = 0 while tokenind < len(tokenlist): print(tokenlist, tokenind) if len(delimiters) > 0: for o, c in delimiters: if o == 0: break if o == 0: tokenind = c + 1 if tokenlist[tokenind] in self.operators: break else: tokenind += 1 if tokenind == len(tokenlist): return Node('', tokenlist[-1], '') t = tokenlist[tokenind] if t == 'not': left = '' right = tokenlist[tokenind+1:] else: left = tokenlist[:tokenind] right = tokenlist[tokenind+1:] argumentdict = {} for k, v in zip(['left','right'],[left,right]): if type(v) is str: argumentdict[k] = v argumentdict[k+'str'] = '' elif type(v) is list and len(v) == 1: argumentdict[k] = v[0] argumentdict[k+'str'] = '' else: argumentdict[k+'str'] = '(' + ' '.join(v) + ')' argumentdict[k] = self.createBoolTree(v) return Node(argumentdict['left'], t, argumentdict['right'], leftstr=argumentdict['leftstr'], rightstr=argumentdict['rightstr'])
-
-
Utility to get number of parentheses
parencount = 0 for t in tokenlist: if t == '(': parencount += 1 if t == ')': parencount -= 1 return parencount
Putting it all together
import numpy as np
import pandas as pd
import sys
class Node():
def __init__(self, left, op, right, leftstr='', rightstr=''):
self.left = left
self.right = right
self.op = op
self.rightstr = rightstr
self.leftstr = leftstr
class BoolParser():
def __init__(self, expr):
self.expr = expr
self.operators = ['and', 'or', 'not']
self.tokenlist = self.preprocessExpression()
def preprocessExpression(self):
s = self.expr
s = s.replace('(', ' ( ')
s = s.replace(')', ' ) ')
# remove trailing whitespace
s = s.strip()
tokenlist = [t for t in s.split(' ') if t != '']
return tokenlist
def createBoolTree(self, tokenlist=None):
if tokenlist is None:
tokenlist = self.tokenlist
tokenlist = self.removeDelimiters(tokenlist)
delimiters = self.getDelimiterPositions(tokenlist)
tokenind = 0
while tokenind < len(tokenlist):
print(tokenlist, tokenind)
if len(delimiters) > 0:
for o, c in delimiters:
if o == 0:
break
if o == 0:
tokenind = c + 1
if tokenlist[tokenind] in self.operators:
break
else:
tokenind += 1
if tokenind == len(tokenlist):
return Node('', tokenlist[-1], '')
t = tokenlist[tokenind]
if t == 'not':
left = ''
right = tokenlist[tokenind+1:]
else:
left = tokenlist[:tokenind]
right = tokenlist[tokenind+1:]
argumentdict = {}
for k, v in zip(['left','right'],[left,right]):
if type(v) is str:
argumentdict[k] = v
argumentdict[k+'str'] = ''
elif type(v) is list and len(v) == 1:
argumentdict[k] = v[0]
argumentdict[k+'str'] = ''
else:
argumentdict[k+'str'] = '(' + ' '.join(v) + ')'
argumentdict[k] = self.createBoolTree(v)
return Node(argumentdict['left'], t, argumentdict['right'],
leftstr=argumentdict['leftstr'],
rightstr=argumentdict['rightstr'])
def getParenCount(self, tokenlist):
parencount = 0
for t in tokenlist:
if t == '(':
parencount += 1
if t == ')':
parencount -= 1
return parencount
def removeDelimiters(self, tokenlist):
delimiters = self.getDelimiterPositions(tokenlist)
if len(delimiters) > 0:
for o, c in delimiters:
if o == 0:
break
if o == 0 and len(tokenlist) == c + 1 + o:
tokenlist = list(tokenlist[1:-1])
delimiters = self.getDelimiterPositions(tokenlist)
# Call itself again
tokenlist = self.removeDelimiters(list(tokenlist))
return(tokenlist)
def printBoolTree(self, currnode):
l = currnode.left
r = currnode.right
if type(currnode.left) is not str:
l = currnode.leftstr
self.printBoolTree(currnode.left)
if type(currnode.right) is not str:
r = currnode.rightstr
self.printBoolTree(currnode.right)
print(l, currnode.op, r)
def constructPolynomial(self, currnode):
l = currnode.left
r = currnode.right
if type(currnode.left) is not str:
l = self.constructPolynomial(currnode.left)
if type(currnode.right) is not str:
r= self.constructPolynomial(currnode.right)
expr = ''
if currnode.op == 'or':
expr = '(1. - (1. - ' +l + ')*(1. - ' + r + '))'
elif currnode.op == 'and':
expr = '(' + l +'*' + r + ')'
elif currnode.op == 'not':
expr = '(1. - ' + r + ')'
else:
expr = currnode.op
return expr
def getDelimiterPositions(self, tokenlist):
separatordict = {'open':[],'close':[]}
for i, t in enumerate(tokenlist):
if t == '(':
separatordict['open'].append(i)
elif t == ')':
separatordict['close'].append(i)
if len(separatordict['open']) != len(separatordict['close']):
print(separatordict, tokenlist)
print('Imbalanced expession!')
sys.exit()
explocations = []
separatordict['close'].reverse()
while len(separatordict['open']) > 0:
c = separatordict['close'].pop()
o = max([l for l in separatordict['open'] if l < c])
separatordict['open'].remove(o)
explocations.append((o,c))
return explocations
testlist = [
# 'a or b or c',
# 'not a',
# 'a and not b',
# '(a or b)',
# '(a or b) or (c or d)',
# '((a or b))'
# 'UGR and not (NR5A1 or WNT4)',
# '(WNT4 and CTNNB1) and not (DMRT1 or SOX9)'
'not (g7)',
'( g1 )',
'( g2 )',
'( g3 )',
'( g4 )',
'( g5 )',
'( g6 or g7)',
]
for t in testlist:
print('testing:', t)
parser = BoolParser(t)
tree = parser.createBoolTree()
#parser.printBoolTree(tree)
print(parser.constructPolynomial(tree))
print('------------')
Test
testlist = [
# 'a or b or c',
# 'not a',
# 'a and not b',
# '(a or b)',
# '(a or b) or (c or d)',
# '((a or b))'
# 'UGR and not (NR5A1 or WNT4)',
# '(WNT4 and CTNNB1) and not (DMRT1 or SOX9)'
'not (g7)',
'( g1 )',
'( g2 )',
'( g3 )',
'( g4 )',
'( g5 )',
'( g6 or g7)',
]
for t in testlist:
print('testing:', t)
parser = BoolParser(t)
tree = parser.createBoolTree()
#parser.printBoolTree(tree)
print(parser.constructPolynomial(tree))
print('------------')