Wednesday, June 22, 2011

Specifying alignment with ARM NEON instructions and the iOS toolchain

On ARM Cortex CPUs with the NEON SIMD extension, a single instruction can perform a fairly complex load or store, such as loading 32 bytes and de-interleaving them into four registers:

vld4.u8 {d0,d1,d2,d3}, [r0@128]

where "@128" declares that the address in r0 is 128-bit aligned. But the iPhone (iOS) SDK uses gas, the GNU assembler. On ARM, gas treats '@' as a comment character, so the gas syntax uses a colon instead ([r0:128]). Or at least it should. The version of gas distributed with the latest SDK (4.3) is broken and rejects the alignment specifier entirely. You'll get an error like:

']' expected -- `vld4.u8 {d0,d1,d2,d3},[r3:64]

This is bad because aligned transfers are faster, and without the specifier there is no way to tell the CPU that an address is aligned.
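To make the two ideas above concrete, here is a minimal C sketch (not part of the toolchain fix). It assumes a POSIX system for posix_memalign, and the helper names alloc_aligned128, is_aligned_bits and deinterleave4_u8 are made up for illustration. The first two show how to obtain and check a buffer that is safe to use with a ":128" specifier; the third is a scalar model of what the plain vld4.u8 load does with 32 bytes.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: return a buffer whose address is a multiple of
   16 bytes (128 bits), or NULL on failure. Release it with free(). */
static uint8_t *alloc_aligned128(size_t size)
{
    void *p = NULL;
    if (posix_memalign(&p, 16, size) != 0)
        return NULL;
    return p;
}

/* Nonzero if p satisfies an alignment given in bits, the same unit the
   assembler syntax uses (64, 128, 256). */
static int is_aligned_bits(const void *p, unsigned bits)
{
    assert(bits != 0 && bits % 8 == 0);
    return ((uintptr_t)p % (bits / 8)) == 0;
}

/* Scalar model of "vld4.u8 {d0,d1,d2,d3}, [r0]": read 32 interleaved
   bytes and split them across four 8-byte registers, so that byte k of
   each 4-byte group lands in register k. */
static void deinterleave4_u8(const uint8_t *src, uint8_t d[4][8])
{
    for (int lane = 0; lane < 8; lane++)
        for (int reg = 0; reg < 4; reg++)
            d[reg][lane] = src[4 * lane + reg];
}
```

If the register holds an address that does not actually meet the alignment named in the specifier, the instruction can fault, so a check like is_aligned_bits(buf, 128) is worth keeping in debug builds.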

The ARM assembler binary is located in '/Developer/Platforms/iPhoneOS.platform/Developer/usr/libexec/gcc/darwin/arm'. This is what needs to be replaced with a fixed build.


According to the gcc mailing list, the problem was fixed in binutils 2.21. A quick diff of 2.20 and 2.21 shows the fix (along with a troublingly large number of other fixes). Here is the diff:

--- arm-original.c 1970-01-01 09:00:00.000000000 +0900
+++ arm.c 2011-06-22 15:28:34.000000000 +0900
@@ -3509,6 +3509,33 @@
/* Never reached. */
}

+/* Parse a Neon alignment expression. Information is written to
+ inst.operands[i]. We assume the initial ':' has been skipped.
+
+ align .imm = align << 8, .immisalign=1, .preind=0 */
+static parse_operand_result
+parse_neon_alignment (char **str, int i)
+{
+ char *p = *str;
+ expressionS exp;
+
+ my_get_expression (&exp, &p, GE_NO_PREFIX);
+
+ if (exp.X_op != O_constant)
+ {
+ inst.error = _("alignment must be constant");
+ return PARSE_OPERAND_FAIL;
+ }
+
+ inst.operands[i].imm = exp.X_add_number << 8;
+ inst.operands[i].immisalign = 1;
+ /* Alignments are not pre-indexes. */
+ inst.operands[i].preind = 0;
+
+ *str = p;
+ return PARSE_OPERAND_SUCCESS;
+}
+
/* Parse all forms of an ARM address expression. Information is written
to inst.operands[i] and/or inst.reloc.

@@ -3593,20 +3620,13 @@
}
else if (skip_past_char (&p, ':') == SUCCESS)
{
- /* FIXME: '@' should be used here, but it's filtered out by generic
- code before we get to see it here. This may be subject to
- change. */
- expressionS exp;
- my_get_expression (&exp, &p, GE_NO_PREFIX);
- if (exp.X_op != O_constant)
- {
- inst.error = _("alignment must be constant");
- return PARSE_OPERAND_FAIL;
- }
- inst.operands[i].imm = exp.X_add_number << 8;
- inst.operands[i].immisalign = 1;
- /* Alignments are not pre-indexes. */
- inst.operands[i].preind = 0;
+ /* FIXME: '@' should be used here, but it's filtered out by generic
+ code before we get to see it here. This may be subject to
+ change. */
+ parse_operand_result result = parse_neon_alignment (&p, i);
+
+ if (result != PARSE_OPERAND_SUCCESS)
+ return result;
}
else
{
@@ -3672,6 +3692,15 @@
return PARSE_OPERAND_FAIL;
}
}
+ else if (skip_past_char (&p, ':') == SUCCESS)
+ {
+ /* FIXME: '@' should be used here, but it's filtered out by generic code
+ before we get to see it here. This may be subject to change. */
+ parse_operand_result result = parse_neon_alignment (&p, i);
+
+ if (result != PARSE_OPERAND_SUCCESS)
+ return result;
+ }

if (skip_past_char (&p, ']') == FAIL)
{
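What the new parse_neon_alignment function stores can be modelled in a few lines of standalone C. This is only a sketch under loud assumptions: struct operand_model and parse_alignment_model are hypothetical stand-ins, and strtol replaces gas's real expression parser (my_get_expression), so it only understands plain decimal constants.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the three operand fields the patch touches. */
struct operand_model {
    long imm;
    unsigned immisalign;
    unsigned preind;
};

/* Model of parse_neon_alignment. *str must point just past the ':'.
   Returns 0 on success, -1 if no constant could be read (the case where
   gas reports "alignment must be constant"). */
static int parse_alignment_model(const char **str, struct operand_model *op)
{
    char *end;
    long align = strtol(*str, &end, 10);   /* assumption: decimal only */

    if (end == *str)
        return -1;

    op->imm = align << 8;   /* alignment is kept shifted left by 8 bits */
    op->immisalign = 1;     /* marks .imm as an alignment, not an offset */
    op->preind = 0;         /* alignments are not pre-indexes */

    *str = end;             /* leave the cursor on the closing ']' */
    return 0;
}
```

Feeding it the text after the colon in "[r3:64]", that is "64]", should give imm == 64 << 8 and leave the cursor on ']', which is the character the caller checks for next in the last hunk of the diff.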

Now we just need to apply the fix to the Apple toolchain.

Conveniently, Apple has open-sourced cctools, the package containing the assembler. The version is slightly older than the current 4.3 toolchain, but let's just close our eyes and hope for the best.

Download cctools-795 and, before building, apply the patch above to as/arm.c. I also had to edit the Makefile for libstuff, changing the include path used for lto.o to '-I/Developer/usr/clang-ide/local/include'. The build still fails, but only after as has been built. Now copy aarm_dir/as into the assembler directory given above, and you should be good to go.

