Chapel Arrays: How To Handle Distribution-Generic Formals?
Hey everyone! Let's dive into a tricky but important aspect of Chapel: how we represent distribution-generic array formals. Currently, array type expressions in formals are distribution-generic, which is super cool and flexible. But, there are some edge cases where this behavior gets a bit wonky, particularly with out formals and compiler-generated initializers.
Distribution-Generic Array Formals in Chapel
In Chapel today, array type expressions in formals are distribution-generic: a single procedure declaration accepts arrays of any distribution, with no explicit coercion at the call site. This makes code more reusable and simplifies working with distributed data structures. Let's illustrate this with an example. Consider the following Chapel code snippet:
use BlockDist;
var A: [1..10] int;
var B = blockDist.createArray(1..10, int);
proc foo(arr: [] int) {
compilerWarning(arr.type : string);
}
foo(A);
foo(B);
In this example, the procedure foo accepts an array arr of type [] int. When we call foo with both a default-rectangular array A and a block-distributed array B, Chapel doesn't coerce B to a default-rectangular array. Instead, it creates two instantiations of foo: one for the default-rectangular array and another for the block-distributed array. The compiler output confirms this:
distgen-1.chpl:1: In module 'distgen-1':
distgen-1.chpl:10: warning: [domain(1,int(64),one)] int(64)
distgen-1.chpl:11: warning: [BlockDom(1,int(64),one,unmanaged DefaultDist)] int(64)
This behavior remains consistent even if you provide a fully concrete-looking type expression for the array formal, like [1..10] int. The procedure will still accept arrays with different distributions.
proc foo(arr: [1..10] int) { /* I behave the same */ }
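To see this at run time rather than through compilerWarning, here is a minimal sketch. The helper name show is hypothetical, and the exact type strings printed will depend on your Chapel version; targetLocales() is used only to make the difference between the two instantiations visible.

```chapel
use BlockDist;

// Hypothetical helper: the formal's type expression looks concrete,
// yet it is instantiated separately for each distribution passed in.
proc show(arr: [1..10] int) {
  writeln(arr.type:string, " on ", arr.targetLocales().size, " locale(s)");
}

var A: [1..10] int;                         // default-rectangular
var B = blockDist.createArray(1..10, int);  // block-distributed over Locales

show(A);
show(B);
```

Run on more than one locale, the second call should report a block-distributed type and a larger locale count, while the first stays on a single locale.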
Generally, this implicit genericity works well, promoting a sense of uniformity across different array types. However, there are a couple of scenarios where this implicit behavior can lead to unexpected results, specifically when using out formals and compiler-generated initializers.
The out Formal Conundrum
out formals in Chapel let a procedure hand a value back to the caller: the incoming value of the actual argument is ignored, and the value the procedure gives the formal is transferred to the actual when the call returns. In most cases, array expressions used with out formals behave as expected, maintaining distribution genericity. However, there are situations where the type expression provided for the out formal seems to dictate a concrete, non-distributed array type. Let's break this down with examples.
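Before getting to arrays, it may help to see the out intent on its own. This is a minimal sketch with a scalar; the procedure name setAnswer is made up for illustration.

```chapel
// Hypothetical helper illustrating the out intent with a scalar.
proc setAnswer(out x: int) {
  x = 42;        // the value written here is handed back to the caller
}

var n = 5;       // this incoming value is ignored by the out formal
setAnswer(n);
writeln(n);      // prints 42
```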
Consider the following code:
use BlockDist;
var A;
proc foo(out arr: [] int) {
arr = blockDist.createArray(1..10, int);
}
foo(A);
compilerWarning(A.type:string, 0); // [BlockDist...] int
Here, the out formal arr in procedure foo accepts an array without a specified domain. When we assign a block-distributed array to arr, the variable A is successfully initialized as a block-distributed array, as confirmed by the compiler warning. However, if we specify a domain expression in the out formal, the behavior changes:
use BlockDist;
var A;
proc foo(out arr: [1..10] int) {
arr = blockDist.createArray(1..10, int);
}
foo(A);
compilerWarning(A.type:string, 0); // [domain(...)] int, aka default-rectangular
In this case, even though we are assigning a block-distributed array to arr, the variable A ends up being a default-rectangular array. This suggests that the type expression [1..10] int, with its explicit domain, is forcing a conversion to a concrete, non-distributed array type. The same type expression on a non-out formal would have stayed distribution-generic, so the inconsistency can be confusing and lead to unexpected behavior.
To make things even more interesting, consider this scenario where out and non-out formals are mixed:
use BlockDist;
proc bar(out A: [1..10] int, B: [1..10] int) { A = B; }
var A;
bar(A, blockDist.createArray(1..10, int));
In this example, both A and B have identical type expressions ([1..10] int). However, B happily accepts a block-distributed array, while A remains a default-rectangular array. The assignment A = B results in a many-to-one copy, effectively converting the block-distributed array to a default-rectangular one. Bradcray suggests that this behavior is aligned with how return type expressions are treated, where type expressions are considered concrete. Because out formals conceptually represent another way for a function to return data, they exhibit similar behavior.
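The return-type analogy can be sketched as follows. This assumes, as the discussion above suggests, that a declared array return type is treated as concrete; the function name makeArr is hypothetical, and the printed type is worth checking against your own compiler rather than taking on faith.

```chapel
use BlockDist;

// Hypothetical function: the declared return type uses a domain expression,
// while the returned value is block-distributed.
proc makeArr(): [1..10] int {
  return blockDist.createArray(1..10, int);
}

var A = makeArr();
writeln(A.type:string);  // compare with the out-formal case above
```

If return types are indeed concrete, A should come back as a default-rectangular array, which is the same result the out formal with [1..10] int produced.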
Compiler-Generated Initializers and Array Fields
Another area where distribution-generic array formals can cause unexpected behavior is when dealing with records that have array fields. Consider the following record definition:
record R {
var A: [1..10] int;
}
At first glance, this seems straightforward. R is a record with a field A that is a default-rectangular array. However, the compiler-generated initializer for R can lead to issues when you try to initialize A with a block-distributed array.
The compiler-generated initializer notionally looks like this:
proc R.init(A: [1..10] int) { this.A = A; } /* compiler-generated */
Even though the A formal in the initializer has a seemingly generic type expression, it is not treated as distribution-generic. As a result, you'll encounter a compiler error if you attempt to initialize an R instance with a block-distributed array:
blockarraytorecordfield.chpl:1: In module 'blockarraytorecordfield':
blockarraytorecordfield.chpl:10: error: unresolved call 'R.init([BlockDom(1,int(64),one,unmanaged DefaultDist)] int(64))'
blockarraytorecordfield.chpl:5: note: this candidate did not match: init(this: R, A)
blockarraytorecordfield.chpl:10: note: because actual argument #1 with type '[BlockDom(1,int(64),one,unmanaged DefaultDist)] int(64)'
blockarraytorecordfield.chpl:7: note: is passed to formal 'in A: [domain(1,int(64),one)] int(64)'
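The program behind that error is not shown, so the following is only a reconstruction under that caveat. It leaves the failing call commented out and shows one way to satisfy the compiler-generated initializer by converting explicitly first; the variable name tmp is made up.

```chapel
use BlockDist;

record R {
  var A: [1..10] int;
}

// Presumably the kind of call that is rejected, since the compiler-generated
// initializer's formal is not distribution-generic:
// var bad = new R(blockDist.createArray(1..10, int));

// Converting to a default-rectangular array first gives the initializer
// an actual of the type it expects.
var tmp: [1..10] int = blockDist.createArray(1..10, int);
var r = new R(tmp);
writeln(r.A.type:string);
```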
This behavior might seem reasonable, as it could be interpreted as the user intentionally forcing a conversion as part of the initialization process. However, if the user explicitly defines the initializer, the behavior changes:
proc R.init(A: [1..10] int) { this.A = A; } /* user-written */
Now, the A formal is treated as distribution-generic. This allows you to initialize this.A with a block-distributed array, resulting in a many-to-one copy.
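Put together as a complete program, the user-written case might look like the sketch below. It reuses the initializer from the text; the variable names B and r and the fill value are arbitrary.

```chapel
use BlockDist;

record R {
  var A: [1..10] int;
}

// User-written initializer: per the text, this formal is distribution-generic.
proc R.init(A: [1..10] int) { this.A = A; }

var B = blockDist.createArray(1..10, int);
B = 7;                      // fill the block-distributed array

var r = new R(B);           // accepted; elements are copied into the field
writeln(r.A.type:string);   // the field keeps its declared type
writeln(r.A);
```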
Explicit Genericity: A Path Forward?
Given these inconsistencies, it's worth considering whether array formals that are generic over their distribution should be explicitly marked as such. This aligns with the recent trend towards explicit genericity in Chapel, as seen with the use of '?' for generic types in formals and field type expressions. Adopting this approach for arrays would be a breaking change, so it should be carefully considered for a future edition or major release.
One potential syntax could involve an explicit dmapped ? annotation in the formal type expression:
proc foo(A: [{1..1} dmapped ?] int) {} /* accepts generics */
proc foo(A: [{1..1}] int) {} /* accepts default-rectangular arrays */
This syntax would clearly distinguish between formals that accept distribution-generic arrays and those that expect default-rectangular arrays.
However, a counter-argument exists: making array expressions implicitly generic contributes to Chapel's ease of use and promotes the idea of distributed arrays as first-class citizens. Forcing explicit genericity might diminish the advantages of a unified array type and make the language feel more complex. Finding the right balance between explicitness and ease of use is crucial for Chapel's continued evolution.
Conclusion
The current behavior of distribution-generic array formals in Chapel offers flexibility but can lead to inconsistencies, especially with out formals and compiler-generated initializers. While implicit genericity simplifies code in many cases, the lack of explicit control can cause confusion. The Chapel community needs to weigh the pros and cons of explicit genericity for array formals to determine the best path forward for the language.
What do you guys think? Should we make array genericity more explicit, or stick with the current implicit approach? Let's discuss!