For Wang Yu, an associate professor and deputy secretary at Tsinghua University, October 24th was a special day. On that day, in his capacity as a co-founder of DeePhi Tech, he attended the company's first public conference since its founding, at which his former student, DeePhi CEO Yao Song, announced a new round of financing after unveiling several new products.
Second from the right: Wang Yu
This round of financing is undoubtedly an important milestone for DeePhi Tech, whose work has quickly won industry recognition. At this juncture, Wang Yu sat down for an interview with Leiphone.
Joint optimization of algorithm and hardware
At the conference, "DPU" was probably the most technical term to come from DeePhi's CEO.
In fact, the DPU (Deep Learning Processing Unit, a deep learning processor) is the core area DeePhi chose when entering artificial intelligence and deep learning. The choice stems from the co-founders' realization that the GPUs now widely used to run deep learning algorithms cannot simultaneously meet applications' requirements for high performance and low power consumption.
"A deep learning processor must involve three steps, model compression, model fixed-point quantization, and compilation, and it must have an architecture specialized for neural networks." With this in mind, Wang Yu decided to lead the team to build a true deep learning processor, the DPU, through the joint design of algorithms, software, and hardware.
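As a rough illustration of the second step, model fixed-point quantization, the sketch below maps floating-point weights to signed 8-bit integers using a shared scale factor. This is an assumption made for illustration only, not DeePhi's actual toolchain, which works per layer on real networks:

```python
def quantize(weights, bits=8):
    """Map float weights to signed fixed-point integers with a shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Choose the scale so the largest weight maps to the largest representable value.
    scale = (2 ** (bits - 1) - 1) / max_abs
    return [round(w * scale) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate float weights to measure the quantization error."""
    return [v / scale for v in q]

weights = [0.31, -0.12, 0.05, -0.44, 0.27]
q, scale = quantize(weights)
restored = dequantize(q, scale)
err = max(abs(a - b) for a, b in zip(weights, restored))
```

Integer weights like these let the hardware use small fixed-point multipliers instead of floating-point units, which is where much of the power saving comes from.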
Behind DeePhi's DPU lies a core technique: Deep Compression.
In 2016, at the top deep learning conference ICLR, the paper "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding" won the Best Paper Award. Its first author, Song Han, was then a doctoral student at Stanford University and is one of DeePhi's co-founders.
In an interview with Leiphone, Wang Yu explained the principle behind Deep Compression:
We focus on the joint optimization of algorithms and hardware. Where does the algorithm get optimized? A neural network is essentially a set of numerical matrices, because its most critical component, the weights, form matrices. The idea of compression is to turn many entries of those matrices into zero, so that many positions no longer need to be computed, which reduces the amount of computation.
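Turning matrix entries into zero so they need not be computed corresponds to weight pruning. The following is a minimal sketch of the idea, zeroing weights whose magnitude falls below a threshold; it is illustrative only, since Deep Compression as published also retrains the network after pruning to recover accuracy:

```python
def prune(matrix, threshold):
    """Zero out weights whose magnitude falls below `threshold`."""
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in matrix]

def nonzeros(matrix):
    """Count the entries that would still require a multiply."""
    return sum(1 for row in matrix for w in row if w != 0.0)

W = [[0.90, -0.02, 0.00, 0.61],
     [0.03, -0.75, 0.01, 0.00],
     [0.00,  0.44, -0.05, 0.82]]

P = prune(W, threshold=0.1)
# Only the large-magnitude weights survive; every zero is a multiply
# the hardware can skip entirely.
```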
Deep Compression can shrink a neural network by tens of times without hurting the algorithm's accuracy. It also makes it possible to hold the model in on-chip storage, cutting off-chip memory accesses and greatly reducing power consumption. At the same time, this co-optimization places new demands on the hardware itself. Wang Yu said:
Compression of the upper-layer algorithm also compounds the acceleration available from the underlying hardware. When the degree of compression is high, the computation is no longer dense: both the values and the operations become sparse, and memory access turns into an irregular, random pattern. A sparsified neural network simply does not map well onto conventional hardware, and that is why we have to build our own.
A provider of deep learning solutions
The hardware Wang Yu refers to here is the pair of underlying architectures DeePhi has already introduced for its deep learning processors: the Aristotle architecture and the Descartes architecture.
Wang Yu told Leiphone that the Aristotle architecture targets convolutional neural networks (CNNs): since computer vision processing typically relies on CNNs, Aristotle is generally used for image-related intelligence tasks. Descartes, by contrast, targets fully connected networks: speech-related processing depends on fully connected neural networks, and these are mainly accelerated by the Descartes architecture.
At the conference, DeePhi also released several DPU hardware products built on these two architectures.
First, for face recognition, DeePhi launched the DP-1200-F01 face recognition module and the DP-2100-F16 face analysis solution. The former offers a frame rate of 18 fps at 3 watts of power and can be used in front-end products such as face recognition cameras. The latter targets the back end: a single board supports real-time recognition on 16 channels of 1080p video at an overall power consumption below 30 watts.
Alongside these two, DeePhi introduced a video structuring solution, the DP-2100-O16, which performs real-time structural analysis of 16 channels of 1080p HD video and can detect, track, and analyze the attributes of pedestrians, motor vehicles, and non-motor vehicles.
Beyond image applications, DeePhi also launched the DP-S64 speech recognition acceleration solution. It supports fully sparse neural network processing, and a single board can accelerate speech recognition for up to 64 users simultaneously. Thanks to sparse neural networks and model compression, speech recognition latency can be kept low.
It is worth noting that these DPU products, which bundle DeePhi's own algorithms, are built on FPGA chips from Xilinx, the world's largest FPGA manufacturer. DeePhi has made its own contributions to FPGA technology as well: at the FPGA 2017 conference, its paper "ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA" was the sole Best Paper.
At the hardware layer, however, DeePhi is not confining itself to FPGAs; it also has plans for ASICs. On this point, Wang Yu said:
The advantage of an FPGA is that it can be reconfigured at any time. If you need to change the architecture, or iterate on the core architecture, you can iterate quickly inside the FPGA, so it lets you enter the market in a very short time with performance comparable to, or even beyond, a GPU. An ASIC, however, offers the potential to be ten times better than an FPGA, or more than an order of magnitude; that is why in many dedicated fields, such as mobile phone chips, ASICs are the best in power and performance. The trade-off is that they take the longest to go from design specification to finished design.
DeePhi, however, does not see itself as a hardware company; it prefers to position itself as a provider of deep learning solutions. So, on top of the hardware, DeePhi has also developed a deep neural network development kit for the DPU: DNNDK (Deep Neural Network Development Kit). DeePhi CEO Yao Song said the company's benchmark is NVIDIA, one of the world's hottest AI companies, which provides not only hardware but a complete ecosystem.
DNNDK is also the first SDK in China developed specifically for deep learning.
The cloud and the edge must work together
For any technology-driven company, taking that technology to market and commercializing it is a business challenge it must face.
Under current market conditions, DeePhi first chose the security sector, where demand for face recognition is high. In fact, the three DPU products mentioned above, the DP-1200-F01 and DP-2100-F16 face recognition solutions along with the DP-2100-O16 video structuring solution, have already been brought to market as products for this sector.
Obviously, though, DeePhi will not stop at security. At the beginning of 2017, DeePhi closed a Series A round worth tens of millions of US dollars, led by Xilinx, the world's largest FPGA manufacturer. The round brought not only capital and technical support but also a wealth of potential customer resources and overseas market opportunities. Likewise, MediaTek's deep accumulation in smartphones, home devices, automotive electronics, and other fields is of considerable value to DeePhi.
On October 24th, DeePhi announced a $40 million A+ round with participation from Samsung and Ant Financial. DeePhi said Ant Financial will help it explore more applications, including finance, while cooperation with Samsung will focus on storage, among other areas. Asked by Leiphone about the Samsung investment, Wang Yu disclosed:
Samsung's interest started with its storage division; later its multimedia and smartphone divisions all became interested in us, but there is no way to disclose what we are going to do together.
Leiphone took interest in this news because, during the launch, DeePhi showed data comparing its FPGA-based Aristotle accelerator against Apple's A11 and Huawei's Kirin 970. According to the data, across three neural networks (GoogLeNet-V3, ResNet-50, and VGG16), the Aristotle accelerator's utilization rate exceeded 50 percent, higher than that of its rivals.
In addition, at the end of the conference, DeePhi announced an SoC called "Tao": built on TSMC's 28nm process with a DP4096 Aristotle core, it consumes 1.1 watts and reaches a peak performance of 4.1 TOPS. When Leiphone asked whether the chip could be used in mobile devices such as smartphones, Wang Yu said:
It is possible. The "Tao" consumes about 1.1 watts and delivers several TOPS of performance. The chip can be tailored for lower-power scenarios such as IoT. At present, IoT devices sit below 100 milliwatts, smartphones at 100 to 500 milliwatts, and in security we aim for about 1 watt. The compute the chip delivers per unit is consistent and will not change much; for a specific scenario, you can simply cut down the PEs in this "box" to fit a lower power budget.
On the future of deep learning, neural networks, and related technologies in smartphones and other mobile devices, Wang Yu also shared his view:
I think deep learning applications on smartphones will certainly be a major direction; Apple is a trendsetter in the smartphone field, and it surely thought a great deal before acting. But the computing power a smartphone can provide is limited; truly understanding or analyzing a scene often requires more compute, so the industry will often send data to the cloud for more detailed analysis. Ultimately, the cloud and the edge must work together. I don't think this is limited to mobile phones alone.