Mono-live: July 2022

With the arrival of modern computers came the question of how to compile faster and smaller code. Better code optimization can significantly reduce the operational cost of large data center applications. Code size matters most for mobile and embedded systems and the software that runs on them, where the compiled binary must fit within tight size budgets. The remaining headroom has been squeezed heavily with increasingly complicated heuristics, which impedes maintenance and further improvement.

What is MLGO?

MLGO is a machine learning framework for compiler optimization.

Know More About MLGO:

According to recent research, ML offers more opportunities for compiler optimization, such as replacing complicated heuristics with ML policies. However, adopting machine learning in an industrial compiler remains a challenge.

This is why MLGO, the Machine Learning Guided Compiler Optimizations Framework, was created. It is the first industrial-grade general framework for systematically integrating ML techniques into LLVM.

LLVM is an open-source industrial compiler infrastructure used to build mission-critical, high-performance software. MLGO uses reinforcement learning (RL) to train neural networks to make decisions that can replace heuristics in LLVM. Two MLGO optimizations for LLVM are described here. The first reduces code size with inlining. The second improves code performance with register allocation (regalloc). Both are available in the LLVM repository.

How Does MLGO Work?

Inlining helps reduce code size by making decisions that enable the removal of redundant code. Consider an example: the caller function foo() calls the callee function bar(), which in turn calls baz().

Inlining both call sites yields a single, simple foo() function and reduces the code size. In real code, many functions call each other, and together they comprise a call graph.

The compiler traverses the call graph during the inlining phase and decides, for each caller-callee pair, whether to inline it. This is a sequential decision process because earlier inlining decisions change the call graph, which affects later decisions and the final outcome. In the example, the call graph foo() → bar() → baz() needs a "yes" decision on both edges to reduce the code size.

Before MLGO, a heuristic made the inline / no-inline decision. Over time, that heuristic became increasingly hard to improve. MLGO substitutes the heuristic with an ML model.

During the traversal of the call graph, the compiler asks a neural network whether to inline a particular caller-callee pair, feeding in relevant features from the graph. It then executes the decisions sequentially until the entire call graph has been traversed.
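The traversal described above can be sketched in a few lines of Python. This is a toy model, not LLVM's inliner: the body sizes, the `CALL_COST` constant, the feature names, and the `run_inliner` function are all assumptions made up for illustration, and a plain Python callable stands in for the neural network.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical toy program: caller -> callees, mirroring foo() -> bar() -> baz().
CALL_GRAPH: Dict[str, List[str]] = {"foo": ["bar"], "bar": ["baz"], "baz": []}
# Made-up body sizes (arbitrary units) and a made-up size for one call sequence.
BODY_SIZE: Dict[str, int] = {"foo": 10, "bar": 4, "baz": 3}
CALL_COST = 5

Features = Dict[str, int]
Policy = Callable[[Features], bool]


def live_functions(callees: Dict[str, List[str]], root: str) -> Set[str]:
    """Return the functions still reachable from root; the rest can be dropped."""
    seen: Set[str] = set()
    stack = [root]
    while stack:
        fn = stack.pop()
        if fn not in seen:
            seen.add(fn)
            stack.extend(callees[fn])
    return seen


def run_inliner(policy: Policy, order: Tuple[str, ...] = ("bar", "foo")):
    """Visit callers bottom-up and ask the policy about every call site."""
    callees = {fn: list(cs) for fn, cs in CALL_GRAPH.items()}
    size = {fn: BODY_SIZE[fn] + CALL_COST * len(cs) for fn, cs in CALL_GRAPH.items()}
    decisions: List[Tuple[str, str, bool]] = []
    for caller in order:
        worklist = list(callees[caller])
        while worklist:
            callee = worklist.pop()
            features = {
                "caller_size": size[caller],
                "callee_size": size[callee],
                "callee_call_sites": len(callees[callee]),
            }
            do_inline = policy(features)
            decisions.append((caller, callee, do_inline))
            if do_inline:
                # The caller absorbs the callee's body and its call sites,
                # so this decision changes the graph seen by later decisions.
                callees[caller].remove(callee)
                callees[caller].extend(callees[callee])
                worklist.extend(callees[callee])
                size[caller] += size[callee] - CALL_COST
    total = sum(size[fn] for fn in live_functions(callees, "foo"))
    return total, decisions


always_size, always_log = run_inliner(lambda features: True)
never_size, never_log = run_inliner(lambda features: False)
print("size with both edges inlined:", always_size)
print("size with no inlining:", never_size)
```

In this toy model, answering "yes" on both edges leaves only foo() alive, so the total size drops below the no-inlining case. Swapping the lambda for a learned model is the only change the sketch would need to mimic the policy-driven setup.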

MLGO trains the decision network (the policy) with RL, using policy gradient and evolution strategies algorithms. During training, the compiler consults the current policy for each inline / no-inline decision. When the compilation finishes, it produces a log of the sequential decision process: state, action, and reward. The log is then passed to the trainer to update the model. This loop repeats until a satisfactory model is obtained.

The trained policy is then embedded into the compiler to provide inline / no-inline decisions during compilation. Unlike in the training scenario, the policy does not produce a log. The TensorFlow model is embedded with XLA AOT, which converts the model into executable code. This avoids the TensorFlow runtime dependency and overhead, and it minimizes the extra time and memory cost that the ML model adds.

The inlining policy was trained on a large internal software package containing 30k modules. The trained policy generalizes when applied to compile other software, achieving a 3% ~ 7% size reduction. Generalizability across time matters as well as generalizability across software.

Because both the software and the compiler are under active development, the trained policy needs to retain good performance for a reasonable period of time.
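The compile, log, and update loop can be illustrated with a very small policy gradient example. This is a sketch under loud assumptions: it uses a REINFORCE-style update without a baseline on a two-parameter logistic policy, not MLGO's actual trainer or network, and the call-site sizes, `CALL_COST`, learning rate, and all function names are invented for the example.

```python
import math
import random
from typing import List, Tuple

CALL_COST = 5                        # made-up size of one call sequence
CALLEE_SIZES = [1, 2, 3, 8, 9, 12]   # made-up callee sizes seen in one "compilation"

LogEntry = Tuple[float, int, float]  # (state, action, reward)
Params = Tuple[float, float]         # (weight, bias) of a logistic policy


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def inline_probability(params: Params, callee_size: float) -> float:
    w, b = params
    return sigmoid(w * callee_size / 10.0 + b)


def compile_and_log(params: Params, rng: random.Random) -> List[LogEntry]:
    """One 'compilation': sample a decision per call site and log it."""
    log: List[LogEntry] = []
    for callee_size in CALLEE_SIZES:
        state = callee_size / 10.0
        action = 1 if rng.random() < inline_probability(params, callee_size) else 0
        # Reward is the size saved: positive only when the callee is smaller
        # than the call sequence it replaces (a toy cost model).
        reward = float(action * (CALL_COST - callee_size))
        log.append((state, action, reward))
    return log


def update_policy(params: Params, log: List[LogEntry], lr: float = 0.1) -> Params:
    """REINFORCE step: move along reward * grad(log pi(action | state))."""
    w, b = params
    grad_w = grad_b = 0.0
    for state, action, reward in log:
        p = sigmoid(w * state + b)
        grad_w += reward * (action - p) * state
        grad_b += reward * (action - p)
    n = len(log)
    return (w + lr * grad_w / n, b + lr * grad_b / n)


def train(iterations: int = 2000, seed: int = 0) -> Params:
    rng = random.Random(seed)
    params: Params = (0.0, 0.0)
    for _ in range(iterations):
        log = compile_and_log(params, rng)   # the compiler consults the policy
        params = update_policy(params, log)  # the trainer updates the model
    return params


trained = train()
print("P(inline | callee size 1): ", inline_probability(trained, 1))
print("P(inline | callee size 12):", inline_probability(trained, 12))
```

After training, the toy policy should prefer inlining small callees over large ones, which is the behavior the made-up reward encourages. An evolution strategies variant would keep the same log and replace only the update step.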

Register-Allocation (for performance)

MLGO is also used to improve the register allocation pass, which improves code performance in LLVM. Register allocation solves the problem of assigning physical registers to live ranges.

As the code executes, different live ranges complete at different times, freeing up registers for later use. In the example, each "add" and "multiply" instruction requires all operands and the result to be in physical registers. The live range x is allocated to the green register and completes before the live ranges in the blue or yellow registers. Once x completes, the green register becomes available and is assigned to another live range.

When it is time to allocate live range q, no registers are available. The register allocation pass must therefore decide which live range to evict from its register to make room for q. This is known as the "live range eviction" problem, and it is the decision for which the model is trained to replace the original heuristics. In this example, z is evicted from the yellow register, which is then assigned to q and the first half of z.

Now consider the unassigned second half of live range z. There is a conflict again, and this time live range t is evicted and split. The first half of t and the final part of z end up using the green register. The middle part of z corresponds to the instruction q = t * y, where z is not used, so it is not assigned to any register. Its value is stored to the stack from the yellow register and later reloaded into the green register. The same happens to t. This adds extra load and store instructions to the code and degrades performance. The goal of the register allocation algorithm is to reduce such inefficiencies as much as possible, and this is used as the reward to guide RL policy training.

The register allocation policy was trained on a large Google internal software package. It achieved 0.3% ~ 1.5% improvements in queries per second (QPS).
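The eviction decision and its reward can be sketched as follows. The cost model here is a deliberate simplification and an assumption of this example: each eviction is charged one store plus one reload per later use, and a hand-written scoring function stands in for the trained policy. The `LiveRange` class, the use counts, and the function names are hypothetical; only the register colors and live range names come from the text.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class LiveRange:
    name: str
    register: str    # color names follow the example in the text
    later_uses: int  # made-up count of uses after the eviction point


def spill_cost(live_range: LiveRange) -> int:
    """Toy cost: one store to the stack plus one reload per later use."""
    return 1 + live_range.later_uses


def choose_eviction(
    candidates: List[LiveRange],
    score: Callable[[LiveRange], float] = spill_cost,
) -> LiveRange:
    """Pick the live range whose eviction is cheapest under the given score."""
    return min(candidates, key=score)


def reward(evicted: List[LiveRange]) -> int:
    """Negative count of extra loads and stores, so fewer spills score higher."""
    return -sum(spill_cost(live_range) for live_range in evicted)


candidates = [
    LiveRange("y", "blue", later_uses=3),
    LiveRange("z", "yellow", later_uses=1),
    LiveRange("t", "green", later_uses=2),
]
victim = choose_eviction(candidates)
print("evict", victim.name, "from the", victim.register, "register")
print("reward for this decision:", reward([victim]))
```

Passing a different `score` callable is where a learned policy would plug in; the reward function stays the same because it measures the outcome rather than the decision rule.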

The bottom line:

MLGO is a framework for integrating ML techniques into LLVM, an industrial compiler. It can be extended to be both deeper and broader. Going deeper means adding more features and applying better RL algorithms. Going broader means applying it to more optimization heuristics.

Google unveiled its first autonomous cars in 2010. At the time, the spinning cylinder mounted on the car stood out and drew the most attention. It was the car's light detection and ranging (LiDAR) system, which works like light-based radar. Combined with cameras and radar, LiDAR helps cars avoid obstacles and drive safely. Here is a look at solid-state LiDAR.

Since then, affordable chip-based cameras and radar systems have become common, including for autonomous highway driving. LiDAR navigation systems, however, remain mechanical devices that can cost a lot of money.

A new type of high-resolution solid-state LiDAR chip could change that. It was developed by Ming Wu, a professor of electrical engineering and computer sciences and co-director of the Berkeley Sensor and Actuator Center at the University of California, Berkeley. The new design was published in the journal Nature on Wednesday, March 9.

The technology is based on a focal plane switch array (FPSA), a semiconductor-based matrix of micrometer-scale antennas that collects light much like the sensors found in digital cameras. Its resolution of 16,384 pixels may not sound impressive next to the pixel counts of mobile cameras, but it far exceeds the 512 pixels or fewer of earlier FPSAs.

Design of solid-state LiDAR:

The design is scalable to megapixel sizes. According to Wu, it uses the same complementary metal-oxide-semiconductor (CMOS) technology used to make processors. The result could be a new generation of powerful, affordable 3D sensors for drones, autonomous cars, robots, and even mobile phones.

LiDAR barriers:

LiDAR works by capturing reflections of the light its laser emits. By measuring the time the light takes to return, or the change in beam frequency, it maps the environment. It can also clock the speed of objects moving around it.

Mechanical LiDAR systems have strong lasers that can visualize objects hundreds of yards away, even in the dark. They also create high-resolution 3D maps that a car's artificial intelligence can use to distinguish vehicles, bicycles, pedestrians, and other hazards. Wu said the aim is to illuminate a very large area, but spreading the light that widely leaves it too weak to travel a sufficient distance. To maintain light intensity, the area illuminated by the laser has to be reduced, and that is where the FPSA comes in.

The switch array consists of a matrix of tiny optical transmitters, or antennas, and switches that rapidly turn them on and off. This lets it channel all of the available laser power through a single antenna at a time.
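The two measurements mentioned above correspond to two standard relations, sketched here in Python: range from round-trip time, and radial speed from the Doppler shift of the reflected light. The function names and the 1550 nm wavelength in the usage lines are assumptions for illustration and are not taken from the article.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def range_from_time_of_flight(round_trip_seconds: float) -> float:
    """Distance to the target: the light covers the range twice (out and back)."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0


def radial_speed_from_doppler(frequency_shift_hz: float, wavelength_m: float) -> float:
    """Radial speed of a reflecting target, valid for speeds far below light speed.

    Reflection doubles the Doppler shift, so shift = 2 * speed / wavelength.
    """
    return frequency_shift_hz * wavelength_m / 2.0


# Usage with made-up numbers: a target 10 m away, moving at 10 m/s,
# probed with an assumed 1550 nm laser.
round_trip = 2.0 * 10.0 / SPEED_OF_LIGHT_M_PER_S
assumed_wavelength = 1550e-9
shift = 2.0 * 10.0 / assumed_wavelength
print("range in meters:", range_from_time_of_flight(round_trip))
print("radial speed in m/s:", radial_speed_from_doppler(shift, assumed_wavelength))
```

The round trip for a 10 m target is only tens of nanoseconds, which gives a sense of the timing precision a time-of-flight system needs.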

MEMS switches of solid-state LiDAR:

Silicon-based LiDAR systems generally use thermo-optic switches. These rely on large changes in temperature to produce small changes in the refractive index, which bend and redirect laser light from one waveguide to another.

Thermo-optic switches are both large and power-hungry. When too many are jammed onto a chip, they generate too much heat to operate accurately. This is one reason earlier FPSAs were limited to 512 pixels or fewer.

Wu's solution is to replace them with microelectromechanical system (MEMS) switches.

According to him, the construction is like a freeway exchange. Imagine you are a beam of light going from east to west: when a ramp is mechanically lowered, you suddenly turn 90 degrees and end up going from north to south.

MEMS switches are already used to route light in communications networks, and Wu's design applies them to LiDAR. They are smaller than thermo-optic switches, use far less power, and switch faster.

When a switch powers on a pixel, the pixel emits a laser beam and captures the reflected light. Each pixel corresponds to 0.6 degrees of the array's 70-degree field of view. By cycling rapidly through the array, the FPSA builds a 3D picture of the world around it. Mounting several of them in a circular configuration would produce a 360-degree view around a vehicle.
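The scanning geometry can be worked through with a short Python sketch. It assumes the 16,384 pixels form a square grid and that the 70-degree field of view is divided evenly across each axis; neither assumption is stated in the article, and the function names are hypothetical. Under these assumptions the per-pixel pitch comes out somewhat below the 0.6 degrees quoted above, so treat the numbers as a rough consistency check rather than a specification.

```python
import math
from typing import Iterator, Tuple

TOTAL_PIXELS = 16_384       # from the article
FIELD_OF_VIEW_DEG = 70.0    # from the article


def pixels_per_side(total_pixels: int) -> int:
    """Side length of the grid, assuming the array is square."""
    side = math.isqrt(total_pixels)
    if side * side != total_pixels:
        raise ValueError("pixel count is not a perfect square")
    return side


def pixel_direction(row: int, col: int, side: int, fov_deg: float) -> Tuple[float, float]:
    """(azimuth, elevation) in degrees for the center of one pixel."""
    pitch = fov_deg / side
    azimuth = -fov_deg / 2.0 + (col + 0.5) * pitch
    elevation = -fov_deg / 2.0 + (row + 0.5) * pitch
    return azimuth, elevation


def scan_order(side: int) -> Iterator[Tuple[int, int]]:
    """Visit one pixel at a time, row by row, as a single active antenna would."""
    for row in range(side):
        for col in range(side):
            yield row, col


def arrays_for_full_circle(fov_deg: float) -> int:
    """Minimum number of arrays needed to cover 360 degrees without gaps."""
    return math.ceil(360.0 / fov_deg)


side = pixels_per_side(TOTAL_PIXELS)
print("grid side:", side)
print("pitch in degrees per pixel:", FIELD_OF_VIEW_DEG / side)
print("first pixel direction:", pixel_direction(0, 0, side, FIELD_OF_VIEW_DEG))
print("arrays for 360 degrees:", arrays_for_full_circle(FIELD_OF_VIEW_DEG))
```

A real device would pair each visited pixel with a range measurement to build the 3D picture; the sketch covers only the direction each pixel looks in.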

Mobile cameras of solid-state LiDAR:

Wu needs to boost the FPSA's resolution and range before the system is ready for commercialization. He said that the optical antennas are hard to make smaller, but the switches are still large and can be made a lot smaller.

Conclusion:

Wu also needs to extend the system's range, which is currently only 10 meters. He added that it could reach 100 meters or even 300 meters. He pointed out that cameras are already embedded in vehicles, robots, vacuum cleaners, surveillance equipment, biometrics, and doors, and that LiDAR has many more potential applications. The co-authors are Xiaosheng Zhang, Kyungmok Kwon, Johannes Henriksson, and Jianheng Luo of UC Berkeley.

Sunday, 10 July 2022

MLGO: A Machine Learning Framework