Skip to content

Flatter representation of deeply nested AND / OR Expressions #23264

Description

@alamb

Is your feature request related to a problem or challenge?

DataFusion represents arbitrary binary Operator trees like

(a OR (b OR (c OR (d OR ...)))

Using a binary tree structure called BinaryExpr:

/// Binary expression for [`Expr::BinaryExpr`]
#[derive(Clone, PartialEq, Eq, PartialOrd, Hash, Debug)]
pub struct BinaryExpr {
/// Left-hand side of the expression
pub left: Box<Expr>,
/// The comparison operator
pub op: Operator,
/// Right-hand side of the expression
pub right: Box<Expr>,
}

There are also many places in the codebase where recursive algorithms are used to walk over Expr trees,

An example of this is Treenode::apply:

fn apply<'n, F: FnMut(&'n Self) -> Result<TreeNodeRecursion>>(
&'n self,
mut f: F,
) -> Result<TreeNodeRecursion> {
#[cfg_attr(feature = "recursive_protection", recursive::recursive)]
fn apply_impl<'n, N: TreeNode, F: FnMut(&'n N) -> Result<TreeNodeRecursion>>(
node: &'n N,
f: &mut F,
) -> Result<TreeNodeRecursion> {
f(node)?.visit_children(|| node.apply_children(|c| apply_impl(c, f)))
}
apply_impl(self, &mut f)
}

The stack depth is a function of the nesting depth of the expression, which means that

Important

In some cases processing deeply nested expressions can lean to a stack overflow (and thus a process abort)

This shows up in practice from Machine-generated SQL/filters with thousands of OR/AND terms (e.g. IN lists rewritten to OR, or large predicate pushdown

This representation has served us quite well, but we do have several mitigations

  1. recursive crate + recursive_protection feature (segmented stack growth)

The primary mitigation. Recursive TreeNode drivers are annotated #[cfg_attr(feature = "recursive_protection", recursive::recursive)], which uses the recursive crate to grow the stack on the heap when it runs low, instead of overflowing. See the TreeNode walkers (6 sites — apply/transform_down/transform_up/rewrite/etc.):

fn apply<'n, F: FnMut(&'n Self) -> Result<TreeNodeRecursion>>(
&'n self,
mut f: F,
) -> Result<TreeNodeRecursion> {
#[cfg_attr(feature = "recursive_protection", recursive::recursive)]
fn apply_impl<'n, N: TreeNode, F: FnMut(&'n N) -> Result<TreeNodeRecursion>>(
node: &'n N,
f: &mut F,
) -> Result<TreeNodeRecursion> {
f(node)?.visit_children(|| node.apply_children(|c| apply_impl(c, f)))
}
apply_impl(self, &mut f)
}

  1. SQL parser recursion limit (bounded recursion → error, not crash)

The sqlparser crate itself caps nesting depth (default 50) and returns RecursionLimitExceeded rather than overflowing the stack.

  1. Explicit-stack (iterative) SQL→Expr conversion

We also hit this a bunch when converting a sqlparser binary-operator tree into Expr and have a stack based conversion approach (see

) -> Result<Expr> {
enum StackEntry {
SQLExpr(Box<SQLExpr>),
Operator(BinaryOperator),
}
// Virtual stack machine to convert SQLExpr to Expr
// This allows visiting the expr tree in a depth-first manner which
// produces expressions in postfix notations, i.e. `a + b` => `a b +`.
// See https://github.com/apache/datafusion/issues/1444
let mut stack = vec![StackEntry::SQLExpr(Box::new(sql))];
let mut eval_stack = vec![];
while let Some(entry) = stack.pop() {
match entry {
StackEntry::SQLExpr(sql_expr) => {
match *sql_expr {
SQLExpr::BinaryOp { left, op, right } => {
// Note the order that we push the entries to the stack
// is important. We want to visit the left node first.
stack.push(StackEntry::Operator(op));
stack.push(StackEntry::SQLExpr(right));
stack.push(StackEntry::SQLExpr(left));
}
_ => {
let expr = self.sql_expr_to_logical_expr_internal(
*sql_expr,
schema,
planner_context,
)?;
eval_stack.push(expr);
}
}
}
StackEntry::Operator(op) => {
let right = eval_stack.pop().unwrap();
let left = eval_stack.pop().unwrap();
let expr = self.build_logical_expr(op, left, right, schema)?;
eval_stack.push(expr);
}
}
}
assert_eq!(1, eval_stack.len());
let expr = eval_stack.pop().unwrap();
)

  1. Out-of-line match arms in the expression simplifier

Large match arms are extracted into separate functions specifically to shrink per-frame stack usage and avoid overflow during simplification.

Recently people seem to be hitting recursion limits again and trying to put in patches, such as

Describe the solution you'd like

I would like a more fool proof way to avoid stack overflow issues

Describe alternatives you've considered

Option 1: n-ary expressions

The approach taken by DuckDb Mysql and Postgres is to represent AND and OR as n-ary rather than binary (2)

So that would be something like this (could also have an enum for And/Or)

enum ConjunctionType { And, Or }
/// AND/OR of some number of expressions
struct Conjunction {
  conjunction_type: ConjunctionType,
  /// Logically `exprs[0] AND (exprs[1] AND (exprs[2] AND ...`
  exprs : Vec<Expr>.
}

And then

/// extend the Expr enum
enum Expr { 
  ...
  // AND
  Conjunction(Conjunction),
 ...
}

Examples:

DuckDB uses ConjunctionExpression: https://github.com/duckdb/duckdb/blob/b8a06e4a22672e254cd0baa68a3dbed2eb51c56e/src/include/duckdb/parser/expression/conjunction_expression.hpp#L17-L27

Postrgres uses BoolExpr https://github.com/postgres/postgres/blob/REL_17_0/src/include/nodes/primnodes.h#L929-L942

Mysql uses Item_cond_and / Item_cond_or (based on Item_cond base which stores args as a List list;

Cons

This would be a major downstream API change
Also unless we removed Operator::And / Operator::Or we would potentially have multiple ways to express the same logical expression

Keep Binary Expr, have some sort of "balance" operation

Another approach taken by spark is to use Binary expressions: https://github.com/apache/spark/blob/fa33ea000a0bda9e5a3fa1af98e8e85b8cc5e4d4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala#L828

buildBalancedPredicate rebuilds a predicate list into a height-balanced tree in

Additional context

Recent PRs

🎯 Umbrella / strategic (open)

🐛 Open stack-overflow bugs (motivating cases)

📜 Closed — the history that motivated the current mitigations

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Fields

    No fields configured for Feature.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions