Kevin P Schoedel

540 Mustang Place, Waterloo, Ontario N2K 4B9
(519) 747-3026


Senior Software Engineer, Intel, July 2010 – Present

Intel's Waterloo, Ontario site was established at the end of 2009 through its acquisition of RapidMind, a startup selling a data-parallel JIT programming system.

  • Worked on bringing the Intel Labs Ct project, with some additions from RapidMind, up to product quality as Intel Array Building Blocks.
  • Adapted Intel's internal OpenCL vectorizer for an LLVM-based successor to Array Building Blocks designed around high-level data-parallel intrinsics in LLVM IR.
  • Added Cilk Plus elemental functions to Clang, using an IR interface modelled on OpenCL SPIR, in support of a Clang-based front end for ICC.
  • Addressed problems related to present and future use of LLVM in Android, including fixing Renderscript-related bugs and resolving other issues in building the x86 Android tree with LLVM.
  • Planned and prototyped support for Intel Memory Protection Extensions in LLVM. (MPX is a set of new instructions aimed at allowing pointer bounds checking in C/C++ without breaking existing ABIs and data layouts.)

Senior Software Engineer, Sandvine Incorporated, July 2005 – July 2010

Sandvine develops carrier-grade network equipment providing flexible congestion management, network integrity, and operational support.

During most of my tenure at Sandvine, I was primarily or solely responsible for designing, developing, and maintaining software running on, and related to, the load-balancing network processors crucial to the current product architecture.

  • Optimized the network processor's assembly-language software to 40% of its original length to meet the essential hard real-time target imposed by 10GigE wire speed.
  • Enhanced the software to support IP-based load balancing, MPLS and L2TP tunneling protocols, and new network deployment variations, while maintaining the critical-path time constraint.
  • Ported the assembly code to two new hardware platforms.
  • Worked on FreeBSD kernel drivers for the network processor and associated components.
  • Maintained and ultimately redesigned the controlling C++ host application.
  • Wrote a debugging shell that was ultimately adopted as the system command-line interface.

Senior Software Engineer, Archelon Inc., April 1989 – January 2004

Archelon, previously Bit Slice Software, began by developing microcode compilers for high-performance graphics systems, and later broadened to support other processors outside the general-purpose mainstream, with recent emphasis on DSPs (digital signal processors) and embedded controllers. Clients have ranged from start-ups to the top computer and semiconductor makers.

Except in my earliest years at Archelon, I was normally responsible for all technical aspects of the tasks on which I worked – design, implementation, testing, maintenance, and documentation.

  • Designed, wrote, maintained and enhanced the register allocator for the current compiler, including advances beyond the published state of the art in handling incongruous processors. This project led to Archelon's compiler outperforming GCC on customer benchmarks.
  • Targeted the compiler to several processors: the AOX QT Engine, the Clarkspur CD2458, the CSEM CoolRISC 816, the Oxford A236, and the United Technologies HS1600. These tasks typically included designing the compiler ABI and hand-coding low-level library routines, as well as retargeting the code generator.
  • Contributed to targeting the AOX QT32, the Oasis Cougar, the OnSpec 90C36, the Oxford A436, the 3DSP Waimea, the Atmel AT76C202, the Clarkspur CD2450, the Dspfactory RCORE, the Inicore iniDSP, the Motorola 56000, the ST WDSP and ST Emerald, the Streamachine SMDSP, and the Zucotto Xpresso.
  • Worked, at various times, with a wide variety of other processors, including the David Sarnoff Research Center's Princeton Engine (a 2048-way SIMD architecture designed for real-time video processing), a 4-way SIMD graphics engine by a major semiconductor manufacturer, a microcoded engine designed for a major computer manufacturer's midrange systems, and the NSC HPC (an early 16-bit microcontroller).
  • Added SIMD conditional locking to the retargetable compiler to support the Oxford A236.
  • Designed and wrote an output rewriting layer for the compiler, to allow it to support arbitrary third-party assemblers.
  • Added COFF symbolic debugging to the C compiler.
  • Designed, wrote, benchmarked, and maintained a standalone instruction scheduler for the Intel i960 family of processors. (The i960 CA was the first superscalar microprocessor.) The scheduler operates at the assembly code level, and can deal equally with hand-written and compiler-generated code.
  • Enhanced the microcode compaction tool to better schedule instructions on processors with delayed branches.
  • Worked on instruction scheduling for the pipelined Intel i860.
  • Wrote a FORTRAN 77 front end, the FORTRAN intrinsic libraries, and diagnostics. Interfaced the front end to three local code generators (for the Trancept/Sun TAAC and successor Jet, for TI bit slice parts, and for the TI TMS34082) and to a third-party back end. Later added support for Sun and DEC extensions. Modified the linker to support FORTRAN common blocks.
  • Modified the linker to support libraries, and wrote an object library manipulation tool. Other additions to the linker included support for arbitrary relocation expressions and user-defined debug table formats.
  • Added general code relaxing to the assembler, with user-level directives for conditional assembly by address range. Other additions to the assembler included debugging tables for hand-written assembly code; C-style expressions; and user-defined listing formats.
  • Worked on a custom assembler, with continually evolving specifications, for a major computer manufacturer's VLIW research project.
  • Designed and wrote a parameterized simulator in support of testing an earlier retargetable code generator.
  • Wrote an m4 clone.
  • Built a floating-point compiler support library, derived from the contemporary Cephes library.
  • Ported the GNU assembler, linker, and object tools to 16-bit MS-DOS.

Intern, Control Data Corporation, Fall 1985 and Summer 1986

At CDC, I primarily performed acceptance testing for VX/VE, a UNIX environment for the Cyber 180, and the associated C compiler. I wrote and ran tests and prepared bug reports for the vendor; I also filtered or handled customers' issues.


Bachelor of Mathematics, University of Waterloo, 1989
in Joint Honours Pure Mathematics and Computer Science.
Graduated on the Dean's Honours List.

Related Hobbies

I have long-standing interests in programming languages and systems, and in computer architecture.

I own a variety of interesting hardware, including a DEC PDP-11/45, a Data General Nova 2, and a Xerox 8010, along with various other more common systems.

I am gradually learning more electronics, in part to repair and maintain the older machines, and I hold an amateur radio license.

My other (less closely related) interests include typography and type design, photography, and epistemology.