<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="generator" content="Asciidoctor 2.0.26">
<title>ShareZeroVec</title>
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Open+Sans:300,300italic,400,400italic,600,600italic%7CNoto+Serif:400,400italic,700,700italic%7CDroid+Sans+Mono:400,700">
<link rel="stylesheet" href="./asciidoctor.css">
<link rel="stylesheet" href="./mlton.css">

</head>
<body class="article">
<div id="mlton-header">
<div id="mlton-header-text">
<h2>
<a href="./Home">
MLton
20241230+git20251029+dfsg-5
</a>
</h2>
</div>
</div>
<div id="header">
<h1>ShareZeroVec</h1>
</div>
<div id="content">
<div id="preamble">
<div class="sectionbody">
<div class="paragraph">
<p><a href="#">ShareZeroVec</a> is an optimization pass for the <a href="SSA">SSA</a>
<a href="IntermediateLanguage">IntermediateLanguage</a>, invoked from <a href="SSASimplify">SSASimplify</a>.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_description">Description</h2>
<div class="sectionbody">
<div class="paragraph">
<p>An SSA optimization to share zero-length vectors.</p>
</div>
<div class="paragraph">
<p>From <a href="https://github.com/MLton/mlton/commit/be8c5f576"><code>be8c5f576</code></a>, which replaced the use of the
<code>Array_array0Const</code> primitive in the Basis Library implementation with a
(nullary) <code>Vector_vector</code> primitive:</p>
</div>
<div class="quoteblock">
<blockquote>
<div class="paragraph">
<p>The original motivation for the <code>Array_array0Const</code> primitive was to share the
heap space required for zero-length vectors among all vectors (of a given type).
It was claimed that this optimization is important, e.g., in a self-compile,
where vectors are used for lots of syntax tree elements and many of those
vectors are empty. See:
<a href="http://www.mlton.org/pipermail/mlton-devel/2002-February/021523.html" class="bare">http://www.mlton.org/pipermail/mlton-devel/2002-February/021523.html</a></p>
</div>
<div class="paragraph">
<p>Curiously, the full effect of this optimization has been missing for quite some
time (perhaps since the port of <a href="ConstantPropagation">ConstantPropagation</a> to the SSA IL).  While
<a href="ConstantPropagation">ConstantPropagation</a> has "globalized" the nullary application of the
<code>Array_array0Const</code> primitive, it also simultaneously transformed it to an
application of the <code>Array_uninit</code> (previously, the <code>Array_array</code>) primitive to
the zero constant.  The hash-consing of globals, meant to create exactly one
global for each distinct constant, treats <code>Array_uninit</code> primitives as unequal
(appropriately, since <code>Array_uninit</code> allocates an array with identity (though
the identity may be supressed by a subsequent <code>Array_toVector</code>)), hence each
distinct <code>Array_array0Const</code> primitive in the program remained as distinct
globals.  The limited amount of inlining prior to <a href="ConstantPropagation">ConstantPropagation</a> meant
that there were typically fewer than a dozen "copies" of the same empty vector
in a program for a given type.</p>
</div>
<div class="paragraph">
<p>As a "functional" primitive, a nullary <code>Vector_vector</code> is globalized by
ClosureConvert, but is further recognized by ConstantPropagation and hash-consed
into a unique instance for each type.</p>
</div>
</blockquote>
</div>
<div class="paragraph">
<p>However, a single, shared, global <code>Vector_vector ()</code> inhibits the
coercion-based optimizations of <code>Useless</code>.  For example, consider the
following program:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="rouge highlight"><code data-lang="sml">    val n = valOf (Int.fromString (hd (CommandLine.arguments ())))

    val v1 = Vector.tabulate (n, fn i =&gt;
                              let val w = Word16.fromInt i
                              in (w - 0wx1, w, w + 0wx1 + w)
                              end)
    val v2 = Vector.map (fn (w1, w2, w3) =&gt; (w1, 0wx2 * w2, 0wx3 * w3)) v1
    val v3 = VectorSlice.vector (VectorSlice.slice (v1, 1, SOME (n - 2)))
    val ans1 = Vector.foldl (fn ((w1,w2,w3),w) =&gt; w + w1 + w2 + w3) 0wx0 v1
    val ans2 = Vector.foldl (fn ((_,w2,_),w) =&gt; w + w2) 0wx0 v2
    val ans3 = Vector.foldl (fn ((_,w2,_),w) =&gt; w + w2) 0wx0 v3

    val _ = print (concat ["ans1 = ", Word16.toString ans1, "  ",
                           "ans2 = ", Word16.toString ans2, "  ",
                           "ans3 = ", Word16.toString ans3, "\n"])</code></pre>
</div>
</div>
<div class="paragraph">
<p>We would like <code>v2</code> and <code>v3</code> to be optimized from
<code>(word16 * word16 * word16) vector</code> to <code>word16 vector</code> because only
the 2nd component of the elements is needed to compute the answer.</p>
</div>
<div class="paragraph">
<p>With <code>Array_array0Const</code>, each distinct occurrence of
<code>Array_array0Constword16 * word16 * word16</code> arising from
polyvariance and inlining remained a distinct
<code>Array_uninitword16 * word16 * word16 (0x0)</code> global, which
resulted in distinct occurrences for the
<code>val v1 = Vector.tabulate &#8230;&#8203;</code> and for the
<code>val v2 = Vector.map &#8230;&#8203;</code>. The latter could be optimized to
<code>Array_uninit(word16) (0x0)</code> by <code>Useless</code>, because its result only
flows to places requiring the 2nd component of the elements.</p>
</div>
<div class="paragraph">
<p>With <code>Vector_vector ()</code>, the distinct occurrences of
<code>Vector_vectorword16 * word16 * word16 ()</code> arising from
polyvariance are globalized during <code>ClosureConvert</code>, those global
references may be further duplicated by inlining, but the distinct
occurrences of <code>Vector_vectorword16 * word16 * word16 ()</code> are
merged to a single occurrence.  Because this result flows to places
requiring all three components of the elements, it remains
<code>Vector_vectorword16 * word16 * word16 ()</code> after
<code>Useless</code>. Furthermore, because one cannot (in constant time) coerce a
<code>(word16 * word16 * word16) vector</code> to a <code>word16 vector</code>, the <code>v2</code>
value remains of type <code>(word16 * word16 * word16) vector</code>.</p>
</div>
<div class="paragraph">
<p>One option would be to drop the 0-element vector "optimization"
entirely.  This costs some space (no sharing of empty vectors) and
some time (allocation and garbage collection of empty vectors).</p>
</div>
<div class="paragraph">
<p>Another option would be to reinstate the <code>Array_array0Const</code> primitive
and associated <code>ConstantPropagation</code> treatment.  But, the semantics
and purpose of <code>Array_array0Const</code> was poorly understood, resulting in
this break.</p>
</div>
<div class="paragraph">
<p>The <a href="#">ShareZeroVec</a> pass pursues a different approach: perform the 0-element
vector "optimization" as a separate optimization, after
<code>ConstantPropagation</code> and <code>Useless</code>.  A trivial static analysis is
used to match <code>val v: t vector = Array_toVector(t) (a)</code> with
corresponding <code>val a: array = Array_uninit(t) (l)</code> and the later are
expanded to
<code>val a: t array = if 0 = l then zeroArr_[t] else Array_uninit(t) (l)</code>
with a single global <code>val zeroArr_[t] = Array_uninit(t) (0)</code> created
for each distinct type (after coercion-based optimizations).</p>
</div>
<div class="paragraph">
<p>One disadvantage of this approach, compared to the <code>Vector_vector(t) ()</code>
approach, is that <code>Array_toVector</code> is applied each time a vector
is created, even if it is being applied to the <code>zeroArr_[t]</code>
zero-length array.  (Although, this was the behavior of the
<code>Array_array0Const</code> approach.)  This updates the object header each
time, whereas the <code>Vector_vector(t) ()</code> approach would have updated
the object header once, when the global was created, and the
<code>zeroVec_[t]</code> global and the <code>Array_toVector</code> result would flow to the
join point.</p>
</div>
<div class="paragraph">
<p>It would be possible to properly share zero-length vectors, but doing
so is a more sophisticated analysis and transformation, because there
can be arbitrary code between the
<code>val a: t array = Array_uninit(t) (l)</code> and the corresponding
<code>val v: v vector = Array_toVector(t) (a)</code>, although, in practice,
nothing happens when a zero-length vector is created.  It may be best
to pursue a more general "array to vector" optimization that
transforms creations of static-length vectors (e.g., all the
<code>Vector.new&lt;N&gt;</code> functions) into <code>Vector_vector</code> primitives (some of
which could be globalized).</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_implementation">Implementation</h2>
<div class="sectionbody">
<div class="ulist">
<ul>
<li>
<p><a href="https://github.com/MLton/mlton/blob/master/mlton/ssa/share-zero-vec.fun"><code>share-zero-vec.fun</code></a></p>
</li>
</ul>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_details_and_notes">Details and Notes</h2>
<div class="sectionbody">
<div class="paragraph">
<p></p>
</div>
</div>
</div>
</div>
</body>
</html>