Codasip L31 and L11 RISC-V cores for AI/ML support TFLite Micro, customization

Codasip has announced the L31 and L11 low-power embedded RISC-V processor cores, optimized for customized AI/ML IoT edge applications with power and size constraints.

The company further explains that the new L31/L11 RISC-V cores can run Google’s TensorFlow Lite for Microcontrollers (TFLite Micro) and can be optimized for specific applications with the Codasip Studio RISC-V design tools. As I understand it, thanks to a full architecture license, customers can do this themselves, as Codasip CTO Zdeněk Přikryl explains:

By licensing the CodAL description of a RISC-V core, Codasip customers receive a full architectural license that allows customization of both the ISA and microarchitecture. The new L11/31 cores make it even easier to add features that our customers have been asking for, such as edge AI, into the smallest embedded processor designs with the lowest power consumption.

The ability to customize cores is important for AI and ML applications because data types, quantization, and performance requirements vary widely from application to application, and off-the-shelf processors may not be optimized for a specific task.

We don’t get many details about the new cores, except that they all come with a 3-stage pipeline. The Codasip L31/L31F (the latter with an FPU) uses the RV32IMC instruction set and offers 32 registers and a parallel multiplier, while the Codasip L11 uses the RV32EMC instruction set and comes with 16 registers and a sequential multiplier. They replace the earlier Codasip L30(F) and L10 cores, which are no longer recommended for new designs.

Codasip demonstrates the benefits of using TFLite Micro and customization in a white paper titled “Embedded AI on L-Series Cores – Neural networks powered by custom instructions” (registration required, but you can use a fake email). They used the “MNIST classification of handwritten digits” as an example and compared different implementations in terms of cycles, power, and area.

Codasip L31 RISC-V Core Tensorflow Lite

The L31 with the FPU (L31F), in the middle, is much faster and consumes significantly less power, but would make a much larger chip. One solution is to stay with the L31 and quantize the neural network parameters and the input data, which TFLite Micro supports. This gives almost the same performance as the hardware FPU solution, even lower power consumption, and the same area as the plain L31 since the chip is not changed. Switching from floating point to integer had a negligible impact on accuracy: 98.91% (fp32) versus 98.89% (int8) over a set of 10,000 images.
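To illustrate what quantization involves here, the following is a minimal C sketch of affine int8 quantization, the scheme TFLite uses for 8-bit tensors, where a real value is approximated as scale × (q − zero_point). The type and function names are hypothetical and are not taken from Codasip's white paper or the TFLite Micro sources.

```c
#include <stdint.h>

/* Hypothetical parameter struct: real_value ~= scale * (q - zero_point). */
typedef struct {
    float scale;        /* size of one quantization step, must be > 0 */
    int32_t zero_point; /* int8 code that represents the real value 0.0 */
} QuantParams;

/* Map a float to int8: scale, round to nearest, shift, then saturate.
 * Assumes x / scale fits comfortably in an int32_t. */
int8_t quantize(float x, QuantParams p) {
    float scaled = x / p.scale;
    int32_t rounded = (int32_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
    int32_t q = rounded + p.zero_point;
    if (q < -128) q = -128;
    if (q > 127) q = 127;
    return (int8_t)q;
}

/* Map an int8 code back to the real value it approximates. */
float dequantize(int8_t q, QuantParams p) {
    return p.scale * (float)((int32_t)q - p.zero_point);
}
```

For example, with the illustrative parameters scale = 0.25 and zero_point = 10, the value 1.0 maps to 14 and back to exactly 1.0, while out-of-range inputs saturate at -128 or 127. Once weights and inputs are stored this way, the inner loops of the network only need integer loads, multiplies, and adds, which is why no FPU is required.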

So the best compromise is to use the L31 with TFLite Micro. To optimize the design further, they profiled the program with Codasip Studio to find the C code and associated instructions that consume the most cycles.

RISC-V profiling

To optimize vector memory loads and the multiply-and-accumulate sequences of the convolution, they added two custom instructions:

  • mac3, which combines multiplication and addition into a single clock cycle (speeds up the fourth line of the profile above)
  • lb.pi, which increments the address immediately after the load instruction (merges lines 2 and 3)
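The effect of the two instructions can be sketched in C as an int8 dot product, the kind of inner loop a quantized convolution spends its time in. This is only a behavioral model under assumptions: the helper names are hypothetical, and the exact operand encoding and semantics of mac3 and lb.pi are not given in the article.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed behavior of a post-incrementing byte load (the role of lb.pi):
 * read one signed byte, then advance the pointer by one. */
int8_t load_byte_post_inc(const int8_t **addr) {
    int8_t value = **addr;
    *addr += 1;
    return value;
}

/* Assumed behavior of a fused multiply-accumulate (the role of mac3):
 * return acc + a * b as a single operation. */
int32_t mul_acc(int32_t acc, int8_t a, int8_t b) {
    return acc + (int32_t)a * (int32_t)b;
}

/* int8 dot product written in terms of the two helpers above. */
int32_t dot_int8(const int8_t *input, const int8_t *weights, size_t n) {
    int32_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        int8_t x = load_byte_post_inc(&input);   /* load + address increment */
        int8_t w = load_byte_post_inc(&weights); /* load + address increment */
        acc = mul_acc(acc, x, w);                /* multiply + add */
    }
    return acc;
}
```

On a plain RV32IMC core, each helper would typically compile to two instructions (a load followed by an address add, and a multiply followed by an add). A core extended with the custom instructions could, in principle, execute each helper as one instruction, which is where the cycle savings in the loop come from.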

L31 user-defined RISC-V instructions

The new instructions show up in the profile, and the whole loop takes far fewer cycles. Specifically, this resulted in 10% fewer cycles and 8% lower power consumption. The new custom instructions increased the area, but only by 0.8%.

Custom L31 TFLite Micro Optimizations

TFLite Micro support is new for Codasip’s RISC-V cores, but it has now been added to all of them.

The cores can be evaluated on the Digilent Nexys A7 FPGA board running either bare-metal code or an RTOS such as FreeRTOS. More details on the L31 and L11 RISC-V cores may be found on the Codasip website and in the press release.
