
Also, just in general, the idea of being at least worst-case optimal: for a fixed query, a multi-way join-project can run polynomially faster, as a function of the number of input rows, than binary join plans when the input rows are pathologically connected to the join; the achievable exponents are given by concepts like the "fractional hypertree width" of the query.

The simplest example is a two-column table of a directed graph's edges, e(from, to). Then: Triangle(a, b, c) := e(a, b), e(b, c), e(c, a).

#Triangle <= c * (#e)^1.5, for a small c. (By the AGM / fractional edge cover bound, with weight 1/2 on each of the three atoms, c = 1 already works.)

Binary joins, however, can take Θ((#e)^2) on pathological input, because the first pairwise join can already materialize that many intermediate rows. If you instead do a row-granularity lookup against an auxiliary count index (actually two of them in the simplified version, though they can be merged by pushing the post-lookup choice down into index creation), along with indexing e twice (once for each of the two columns), you can do it in O((#e)^1.5):
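To make the pathological case concrete, here is a minimal sketch (Python, assuming a plain list of (from, to) tuples as the edge table; names are made up for illustration): a single hub node with n in-edges and n out-edges forces the first pairwise join of a binary plan to materialize roughly n^2 intermediate rows, even though the graph has no triangles at all.

    # Hypothetical edge table: hub node 0 with n in-edges and n out-edges.
    n = 1000
    e = [(i, 0) for i in range(1, n + 1)] + [(0, j) for j in range(1, n + 1)]

    # First pairwise join e(a, b) JOIN e(b, c), materialized naively.
    by_from = {}
    for (f, t) in e:
        by_from.setdefault(f, []).append(t)

    intermediate = [(a, b, c) for (a, b) in e for c in by_from.get(b, [])]
    print(len(e), len(intermediate))  # 2000 edges, ~1,000,000 intermediate rows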

Create auxiliary count indices eaux1 and eaux2: eaux1(from, count) stores, for each from-value, the number of e(from, to) rows starting there (its out-degree); eaux2(to, count) stores, for each to-value, the number of e(from, to) rows ending there (its in-degree).
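A minimal sketch of building those two count indices in Python (same hypothetical list-of-tuples edge table as above; eaux1/eaux2 are just the names carried over from the description):

    from collections import Counter

    def build_aux_indices(e):
        eaux1 = Counter(f for (f, t) in e)  # out-degree: #e(b, ?c) for a given b
        eaux2 = Counter(t for (f, t) in e)  # in-degree:  #e(?c, a) for a given a
        return eaux1, eaux2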

Iterate over e(a, b); for each row:

Look up eaux1(b, ?countbc) and eaux2(a, ?countca).

If countbc < countca: expand through point lookups e(b, ?c), then filter the resulting (a, b, ?c) candidates with a point-membership probe against an index of e suitable for checking e(c, a).

Else: expand through point lookups e(?c, a), then filter the resulting (a, b, ?c) candidates with a point-membership probe against an index of e suitable for checking e(b, c).

Done.
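Putting it together, a minimal sketch in Python (same hypothetical edge table; here the counts are simply read off as the sizes of the two hash indices on e rather than stored separately, which doesn't change the idea):

    def triangles(e):
        # Index e twice, once per column (hash index -> set of matching values).
        fwd, rev = {}, {}
        for (f, t) in e:
            fwd.setdefault(f, set()).add(t)   # fwd[x] = {y : e(x, y)}
            rev.setdefault(t, set()).add(f)   # rev[y] = {x : e(x, y)}
        empty = set()

        out = []
        for (a, b) in e:
            bc = fwd.get(b, empty)   # candidates c with e(b, c); len(bc) is the eaux1 count
            ca = rev.get(a, empty)   # candidates c with e(c, a); len(ca) is the eaux2 count
            if len(bc) < len(ca):
                # Expand e(b, ?c), filter with a membership probe for e(c, a).
                out.extend((a, b, c) for c in bc if a in fwd.get(c, empty))
            else:
                # Expand e(?c, a), filter with a membership probe for e(b, c).
                out.extend((a, b, c) for c in ca if c in bc)
        return out

On the hub graph from the first sketch this does O(#e) total work, since the cheaper side of every per-row choice has at most one candidate; in general, the per-row min(count) is what the standard O((#e)^1.5) analysis bounds.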

As can be seen, the trick is that at each nesting level of the (generally nested-loop-join structured) query execution, you choose which table to join the current partial row against next, using premade indices that reveal in O(1) the iteration count of the upcoming inner loop.


