Usually, in order to discover inherent parallelism, it’s useful to consider a machine that has an unbounded number of processors, and that is capable of free control overhead and communication. Inherent parallelism as defined above loosely defines an algorithm’s span, and serves as a theoretical lower bound on all real machines.
Span for unary or binary operations are $O(1)$ since there are unbounded number of processors, Span for reductions and broadcasts are cost $O(log(n))$ using a tree of processors.
You can multiply matrices of dimension order $n$ in $O(log(n))$ time if you use $n^3$ processors, each which does constant work, then reducing with tree structure:
Not realistic, but fun!