scalar_refs_to_array_ref: A stupid Perl trick

In Perl, we have some funny (and often contraindicated) features that are just useful enough not to excise them from the language. One of them is that the sort and grep operators (which reorder and filter lists, respectively) return aliases, rather than copies, of the original array elements whenever possible. Because of this, the original array can be modified through the sorted/filtered result.

my @a = qw/w o r d/;
say sort @a; # 'dorw'
 
# Note that perl sort is not in-place.
say @a; # 'word'
 
# Change index 3 of the sort result, which is 'w', to a 'z'
(sub { $_[3] = 'z' })->(sort @a);
 
# The change is reflected in the original
say @a; # 'zord'

Getting this functionality into a non-builtin sub is not straightforward—I haven’t determined any way to return an array directly that retains its linkage to the original. However, it’s pretty easy to do something almost as good (or possibly better): Return an array reference that can do the same thing. The trick to this is that @_, the parameters array for a function, consists of aliases to its parameters, but a reference to it can be taken and returned.

sub parameter_alias_array_ref { \@_ }
 
sub dumb_alternative_sort {
	parameter_alias_array_ref(sort @_);
}
 
my @a = qw/w o r d/;
my $s = dumb_alternative_sort(@a);
say @a; # 'word'
say @$s; # 'dorw'
$s->[3] = 'z';
say @a; # 'zord'
say @$s; # 'dorz'

One place this might be useful is in tandem with the so-called “Schwartzian transform”, a Perl idiom for sorting a list based on some projection of the elements rather than the elements themselves. The premise is that, for some function foo(), we would like the result of

sort { foo($a) cmp foo($b) } @array

but, foo() being somewhat expensive, we would rather compute its value for each element only once rather than the multiple times that would be done this way. So, each row is mapped into a 2-tuple containing the value itself and its projection. The sort is performed ordering by the latter value, then the projection is stripped, leaving only the original value. That looks like the following:

# Note that the order is the reverse of what is described above
map { $_->[0] } sort { $a->[1] cmp $b->[1] } map { [$_, foo($_)] } @array
# Intuitive order using temp variables
# Append projection
my @tmp_appended = map { [$_, foo($_)] } @array;
# Sort by projection
my @tmp_sorted = sort { $a->[1] cmp $b->[1] } @tmp_appended;
# Strip projection
my @result = map { $_->[0] } @tmp_sorted;

Note, however, that this result is a copy, not an alias to original elements.

One workaround would be to have the transform return an array of scalar refs to the original array elements. That would look like this:

my @refs =
	map { $_->[0] }
	sort { $a->[1] cmp $b->[1] }
	map { [\$_, foo($_)] }
	@array;
 
# Assign to whatever the 4th element of the result was
${$refs[3]} = 'idk';
 
# Get the results as a plain array copy
my @result = map { ${$_} } @refs;

Ugly and unnatural. But what if we had some way to transform an array of scalar refs into an array aliased to those scalars?

my $ref_result = scalar_refs_to_array_ref(
	map { $_->[0] }
	sort { $a->[1] cmp $b->[1] }
	map { [\$_, foo($_)] }
	@array);
 
# Assign to whatever the 4th element of the result was
$ref_result->[3] = 'idk';
 
# Get the results as a plain array copy
my @result = @$ref_result;
 
# OR, just use @$ref_result directly!

The implementation follows. The fact that @_ aliases the parameters is heavily abused here. The parameters are aliased a few at a time because the dereferenced scalars must be passed directly and explicitly; the obvious shortcut (using map) removes the magic we’re trying to exploit. An alternative would be to construct an explicit list in the form of a string and use eval to alias all of the elements at once. This isn’t an altogether bad idea, but it’s avoided here.

sub parameter_alias_array_ref { \@_ }
 
sub scalar_refs_to_array_ref {
	my $result = parameter_alias_array_ref();
 
	my $count = 0;
 
	# The seemingly straightforward way to do this would be map { $$_ } @_, but
	# that doesn't preserve the aliasing magic that we need.
 
	while(@_) {
		my @part = splice(@_, 0, 16);
		$count += @part;
		my $pa = parameter_alias_array_ref(
			${$part[0]}, ${$part[1]}, ${$part[2]}, ${$part[3]},
			${$part[4]}, ${$part[5]}, ${$part[6]}, ${$part[7]},
			${$part[8]}, ${$part[9]}, ${$part[10]}, ${$part[11]},
			${$part[12]}, ${$part[13]}, ${$part[14]}, ${$part[15]},
			);
		$result = parameter_alias_array_ref(@$result, @$pa);
	}
 
	# Trim excess introduced when the length is not a multiple of the part
	# length above.
	splice(@$result, $count);
 
	$result;
}

EDIT: You might have noticed that up to 15 undefined array elements (specifically, the remaining undefined elements when there are fewer than 16 in the last part) are dereferenced as scalars when the last part of the input is processed. Yet the code still works perfectly. Why?

It’s a semi-obscure bit involving what’s known in Perl as autovivification: If an undefined scalar variable (say, $somevar) is dereferenced as if it were a scalar ref ($$somevar), an anonymous scalar is automatically created and a reference to it is assigned to the variable (something like { my $tmp = undef; $somevar = \$tmp }), and only then is it dereferenced.

Note that this only works if what is dereferenced is a variable; something like ${(undef)} always breaks. In our case, the variable would be $part[n] for any n greater than or equal to the size of the last part; an array element counts as a scalar variable.

Unless you find that the autovivification of up to 15 scalars is consequential performance-wise (it probably isn’t), the code can be left as is. Otherwise, a workaround would be straightforward.

Comments are closed.