Interview with big data expert Han Jian: integrating distributed big data technology with the banking industry

2018-07-22

Han Jian is a senior IT expert in big data processing and banking applications. He has worked in bank IT for many years, in areas such as data warehouse construction, bank data analysis and decision support, bank data operations, and the use of big data to prevent and control bank risk. He graduated from Beijing University of Posts and Telecommunications in 2008 with a master’s degree in management science and engineering. He was among the first data experts to apply algorithms to the prevention and control of bank risk, in particular credit risk. He has served as a senior IT consultant at GLG, a senior data storage expert at Postal Savings Bank, and a senior data analysis expert at Minsheng Bank. He has led a large number of projects at national commercial banks, including the Postal Savings Bank savings project, the Postal Savings Bank corporate business project, the Minsheng Bank centralized operations planning and inspection data project, the CBRC EAST on-site inspection project, and the special “check and control” project between Minsheng Bank and the People’s Court. He also holds a number of international IT certifications, including Hitachi Data Systems Certified Storage Manager (Hitachi’s advanced storage certification) and HP-CSE (HP’s advanced certification). He is dedicated to using information technology and distributed big data technology to transform the business processes of modern commercial banks, reduce their operating costs, and improve their risk prevention and control. At present, he is mainly responsible for the planning, development, and operation of commercial bank information systems, large-scale infrastructure operations systems, cloud platforms, and big data analysis platforms. He has rich experience in banking business process optimization, IT product planning, bank application system construction, and big data storage and processing.

Q: Could you tell us about your career? What did you take away from your time at well-known large commercial banks such as Postal Savings Bank and Minsheng Bank?

Han Jian: If I had to sum up my feelings in one word, it would be “gratitude”. Both Postal Savings Bank and Minsheng Bank gave me a very broad platform. I have had the privilege of witnessing the great changes that IT, represented by big data and cloud platforms, has brought to traditional banking. Along the way, I have also developed the ability to work across boundaries and integrate IT with traditional banking business. My career has gone through two main stages.

The first stage was my time at Postal Savings Bank, from 2008 to 2014. As project leader, I led the team that built the core system on minicomputer clusters instead of mainframes and built a distributed big data storage platform. This was the first successful attempt of its kind in domestic banking, and there was no successful precedent anywhere in the world for building a business core system of this scale on an open platform. The second stage is my time at Minsheng Bank, from 2014 to the present, where fruitful results have been achieved in compliance, anti-fraud, and operational process optimization.

Q: You just mentioned the distributed cluster technology that uses minicomputers to replace the mainframe. This technology has been highly recognized by the Ministry of Industry and Information Technology, the People’s Bank of China, the China Banking Regulatory Commission, and other state ministries and regulators. Could you briefly describe the technology and its significance?

Han Jian: Certainly. China’s commercial banks have long built their business systems mainly on large and mid-range hosts with a centralized architecture. This architecture has the advantages of mature technology, reliable systems, and relatively simple applications. However, the core technology is monopolized by foreign suppliers, and the cost of investment is high. With the growing processing capacity of open platforms, the maturing of high-speed network technology, and the rapid development and adoption of new technologies such as cloud computing and distributed storage, building business systems on open platforms and distributed architectures has become the trend in the architectural transformation of commercial banks, thanks to advantages such as low cost, easy expansion, and independent control.

The project was officially launched on June 16, 2011, and lasted three years in total. On May 26, 2013, the system went live successfully in Shaanxi. After that, following pilots in three provinces, five rollout batches, and 27 rehearsals, the rollout was completed in a total of 30 provinces (autonomous regions and municipalities) across the country. Since the rollout began, business processing has been normal: the transaction success rate has always been above 98%, and the system success rate has always been above 99%. This project is the most complicated and difficult construction project in the information technology field. The technical route of replacing the mainframe-based core system with minicomputer clusters proved successful. It was the first successful attempt in the domestic industry, and there was no successful precedent anywhere in the world for building a business core system of this scale on an open platform. For this reason, the Ministry of Industry and Information Technology and other ministries also said that the technology was a positive exploration for the national “autonomous and controllable” core technology security strategy, and a key step in safeguarding financial and information security. I myself was very lucky to lead the project and to win the “China Post Group Science and Technology Award 2011”.

Q: When it comes to distributed storage, what do you see as the mainstream technologies for big data storage? Which of them is better suited to the banking industry?

Han Jian: At present, there are three typical technical routes for data storage: new database clusters with an MPP architecture; technology extensions and encapsulations based on Hadoop and the related big data technologies around it; and big data appliances, which combine software and hardware designed for big data analysis. The first two are distributed storage, and the third is centralized. From my point of view, distributed storage is the future direction of big data. Comparing the two distributed technologies, the MPP distributed database has a certain advantage over the Hadoop distributed system in processing structured data with complex logic, and it supports SQL-based development. That makes development and operations easier for bank system developers, who have rich SQL experience. But this does not mean that the MPP distributed database is the best solution for all big data processing. In bank system data, the value density of structured data is usually higher than that of unstructured or semi-structured data, while unstructured data occupies a large share of storage resources. This is because structured data in the banking system consists mainly of accounting data, while unstructured data consists mainly of voucher images and similar data. Of course, structured data also includes data with low value density, such as some log information. For these reasons, data storage and processing technology is moving from “one architecture for all applications” to “multiple architectures supporting multiple classes of applications”.
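The idea of “multiple architectures supporting multiple classes of applications” can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not something described in the interview: the `Dataset` fields, the two backend names, and above all the placement rule, which simply sends structured data with high value density to an MPP database and everything else to a Hadoop cluster. A real bank would weigh many more factors.

```python
from dataclasses import dataclass
from enum import Enum


class Backend(Enum):
    # Hypothetical backend labels, chosen for this sketch only.
    MPP_DATABASE = "mpp_database"      # SQL-friendly, suited to complex structured logic
    HADOOP_CLUSTER = "hadoop_cluster"  # bulk storage for low-value-density or unstructured data


@dataclass(frozen=True)
class Dataset:
    name: str
    structured: bool
    high_value_density: bool


def choose_backend(dataset: Dataset) -> Backend:
    """Illustrative rule only: structured, high-value data goes to MPP; the rest to Hadoop."""
    if dataset.structured and dataset.high_value_density:
        return Backend.MPP_DATABASE
    return Backend.HADOOP_CLUSTER


# Example datasets mirroring the categories mentioned in the answer above.
datasets = [
    Dataset("accounting_entries", structured=True, high_value_density=True),
    Dataset("system_logs", structured=True, high_value_density=False),
    Dataset("voucher_images", structured=False, high_value_density=False),
]

placement = {d.name: choose_backend(d).value for d in datasets}
for name, backend in placement.items():
    print(f"{name} -> {backend}")
```

Under this assumed rule, accounting data lands in the MPP database, while logs and voucher images land in the Hadoop cluster, so each class of data is served by the architecture that fits it.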

Q: In the new generation of system architecture, big data is the core element. Building a big data platform also means starting on big data governance. What is your view of big data governance in the banking industry?

Han Jian: Let me first talk about what big data governance is. Data governance can be understood in a narrow sense and in a broad sense. In the narrow sense, governance is mainly about organization, policies, and processes; in the broad sense, it also includes data quality and data standards. Data governance emphasizes two points: the first is high-level support, and the second is broad participation by all departments. The CBRC has good data quality standards and conducts both off-site and on-site inspections, which is the strongest supervision of bank data management. In 2017, the China Banking Regulatory Commission (CBRC) started work on standardizing the data of commercial banks. As a member of the CBRC regulatory data standardization group, I am taking part in this research. It will standardize the data of the various business areas in the front and middle offices of commercial banks. From this perspective, both regulators and banks themselves clearly understand the importance of big data governance.
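To make the link between data standards and data quality concrete, here is a minimal Python sketch of checking records against a standard. The field names, formats, and allowed values are hypothetical and are not taken from any CBRC standard; the sketch only shows the general shape of such a check.

```python
import re
from typing import Callable, Dict, List

# Hypothetical data standard: field name -> rule a valid value must satisfy.
STANDARD: Dict[str, Callable[[str], bool]] = {
    "account_id": lambda v: re.fullmatch(r"\d{10}", v) is not None,
    "currency": lambda v: v in {"CNY", "USD", "EUR"},
    "open_date": lambda v: re.fullmatch(r"\d{4}-\d{2}-\d{2}", v) is not None,
}


def check_record(record: Dict[str, str]) -> List[str]:
    """Return a list of data quality issues for one record (empty if it conforms)."""
    issues = []
    for field, rule in STANDARD.items():
        value = record.get(field)
        if value is None or value == "":
            issues.append(f"{field}: missing")
        elif not rule(value):
            issues.append(f"{field}: invalid value {value!r}")
    return issues


good = {"account_id": "0123456789", "currency": "CNY", "open_date": "2017-05-01"}
bad = {"account_id": "12345", "currency": "RMB"}

good_issues = check_record(good)
bad_issues = check_record(bad)
print(good_issues)
print(bad_issues)
```

A check like this covers only the technical side of governance. The organizational side that the answer stresses, meaning high-level support and participation by every department, is what decides whether the reported issues actually get fixed.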

Q: You have worked deeply in bank IT and data for years. As a pioneer in this field and an expert with outstanding contributions, what do you see as the direction of big data work in banks, and what would you suggest for team building?

Han Jian: Big data technology is changing with each passing day, and the accumulation of data talent and technology cannot be accomplished overnight. For the talent reserve, we should follow the principle of “bring in a batch, cultivate a batch, and reserve a batch”: bring in a small number of high-level technical staff, train a large number of existing technical staff through specific projects, and build a broad and effective talent reserve through big data technology competitions aimed at universities and the public, with the help of the open source community.
