How should we evaluate ARKit, released by Apple at WWDC 2017?

Lei Feng Net editor's note: Last night's WWDC must have excited many AR/VR practitioners all over again. Apple has finally made its move. It not only announced a partnership with Valve and the first VR-ready machine, the new iMac Pro, but, more significantly, launched ARKit for iOS 11. With the iPhone supporting AR, AR should soon become part of everyday life. What does ARKit mean for developers? What does it mean for the industry as a whole? The author, Jiang Jiayi, an interactive technology expert at Alibaba Tmall, published a detailed analysis of ARKit on Zhihu, and Lei Feng Net has been authorized to reprint it.

(When I saw this scene, I knew tonight was a sure thing!)

First, a disclosure: I am a strongly interested party. I used to be an algorithm engineer at an AR company, and I now develop AR algorithms at an internet company in China. The comments below are aimed mainly at AR peers, so I won't explain much of the technical vocabulary. Readers who don't follow can look the terms up on Baidu.

Frankly, it is no surprise that Apple released ARKit. After all, apart from Microsoft, Apple is one of the companies with the most complete AR portfolio, and it has bought many excellent AR technology companies. As early as last year, around the time of a Project Tango hackathon, I predicted that Apple would make a move. But there are two things I did not expect: one is how quickly Apple released it, and the other is that it does not depend on any hardware upgrade.

From the screenshot, you can see that the keynote demo was running on an iPad.

And it is definitely using a monocular camera. Given Apple's engineering capabilities, ARKit should be able to support most Apple devices (in my personal experience, at least back to the iPhone 5S and iPads of the same generation). This also fits the positioning Apple claimed at the keynote: the largest AR platform in the world.

Now back to ARKit itself. Let's look at what it provides:

First, fast and stable motion tracking. This is the most basic AR function. In the demo, tracking is very stable and accurate throughout. The tabletop has few visual features, which suggests the robustness is very good. The rendered demo model is complex yet runs smoothly, which suggests that both real-time performance and the algorithm's power consumption have been deeply optimized. Judging by the keynote, ARKit should be at the top of the industry.

Second, estimation of planes and their boundaries. Plane estimation is not unusual in monocular SLAM, but from the demo it is hard to tell whether it is based on fitting a 3D point cloud or on IMU data. The initialization looks more like an IMU-based approach. Boundary estimation, on the other hand, was not common before. The only evidence in the demo is the little virtual character falling off the edge of the table (I was too stunned to take a screenshot), which suggests that ARKit may not be a simple VIO algorithm like REST (although the released documentation makes it look like VIO...) and that the point cloud construction part produces some output.
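To make the two possibilities above concrete, here is a minimal sketch of each. It is only an illustration, not ARKit's actual method: it assumes the feature points are already expressed in a frame whose z axis is aligned with gravity (the kind of direction an IMU can supply), and the function names `fit_plane_least_squares` and `horizontal_plane_height` are hypothetical.

```python
from statistics import median


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane_least_squares(points):
    """Point-cloud route: fit z = a*x + b*y + c through (x, y, z) points."""
    n = len(points)
    sx = sum(x for x, _, _ in points)
    sy = sum(y for _, y, _ in points)
    sz = sum(z for _, _, z in points)
    sxx = sum(x * x for x, _, _ in points)
    syy = sum(y * y for _, y, _ in points)
    sxy = sum(x * y for x, y, _ in points)
    sxz = sum(x * z for x, _, z in points)
    syz = sum(y * z for _, y, z in points)
    # Normal equations of the least-squares problem.
    A = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]
    det = _det3(A)
    if abs(det) < 1e-12:
        raise ValueError("points are degenerate (e.g. collinear in x, y)")
    coeffs = []
    for col in range(3):
        # Cramer's rule: replace one column of A with the right-hand side.
        m = [row[:] for row in A]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(_det3(m) / det)
    return tuple(coeffs)


def horizontal_plane_height(points):
    """IMU route: with z aligned to gravity, a horizontal plane is one height."""
    return median(z for _, _, z in points)
```

The second function only needs enough points to pin down a single height, which is consistent with the guess that a gravity-aided approach would handle horizontal planes first. The first handles tilted planes but needs a well-spread point cloud.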

Third, illumination estimation. This one is hard to comment on. That a monocular system does light estimation at all is probably related to the architecture of the algorithm itself (a direct method?). Without seeing the interface, it is hard to judge what form the output takes. Most AR applications do not need this kind of data unless they aim for realistic rendering that matches the real scene.

Fourth, scale estimation. This is very impressive. Anyone who works on monocular SLAM knows that a monocular camera alone cannot resolve scale. It was not shown in the video, but if ARKit really solves the scale problem, it means Apple has done very advanced work on fusing IMU and visual data and has engineered it very well. I look forward to trying this feature in practice.
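Why the IMU matters for scale can be shown with a small sketch. Monocular vision recovers camera displacements only up to an unknown factor, while integrating IMU measurements gives noisy displacements in meters, so one simple way to relate them is a least-squares scale factor. This is an illustrative assumption on my part, not a description of what ARKit does, and `estimate_metric_scale` is a hypothetical name.

```python
def estimate_metric_scale(visual_steps, imu_steps):
    """Least-squares scale s minimizing sum ||s * v - m||^2.

    visual_steps: camera displacements from monocular vision (arbitrary units).
    imu_steps: displacements over the same intervals from IMU integration (meters).
    """
    if len(visual_steps) != len(imu_steps):
        raise ValueError("need one IMU displacement per visual displacement")
    # Closed-form solution: s = sum(v . m) / sum(v . v).
    num = sum(sum(vi * mi for vi, mi in zip(v, m))
              for v, m in zip(visual_steps, imu_steps))
    den = sum(sum(vi * vi for vi in v) for v in visual_steps)
    if den == 0:
        raise ValueError("camera did not move; scale is unobservable")
    return num / den
```

Note the failure case: if the camera does not translate, the scale cannot be observed at all, which is one reason visual-inertial systems typically need some motion before they are fully initialized.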

Fifth, support for various development platforms and engines. This shows that Apple's AR push was absolutely "premeditated" and that its ambitions are large. It leaves competitors no room and sets out to build a complete, broad AR content development ecosystem from day one.

Overall, ARKit implements most of what a monocular + IMU SLAM algorithm can provide, and does so at high quality. I believe Apple has set strict standards for device coverage, real-time performance, and power consumption. With such broad hardware coverage, there is no doubt that iOS will become the most dynamic AR content publishing platform, and experiences like Pokémon GO (PMGO) will certainly take a qualitative leap. A big wave of true AR games is coming.

Next, let's analyze what ARKit currently lacks:

The first deficiency is 3D reconstruction. Judging from both the documentation and the demo, ARKit currently supports only planar "reconstruction". I put "reconstruction" in quotation marks because it is still uncertain whether it supports vertical planes, planes at arbitrary angles, or even multiple planes (the current documentation indicates that only horizontal planes are supported, which suggests the plane fit may rely on the IMU, with a few point cloud points determining only the depth). Even if ARKit did include full plane reconstruction, that would still not be enough for AR applications.

The most basic requirement of augmented reality is an understanding of the real world, such as reconstruction of its geometry. Without it, virtual content cannot collide plausibly with real objects. In the demo, for instance, a virtual character falls off the table, but we never see it land on the floor. The other problem is occlusion between virtual and real objects. If there is a cup on the desk and we cannot reconstruct the cup's mesh, we never see the cup blocking the virtual model. Instead, the virtual model is simply "superimposed" on the cup, which noticeably hurts the whole AR experience. Of course, given Apple's strength, it is hard to believe they have not thought about reconstruction.
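The cup example comes down to a per-pixel depth comparison, sketched below for a single row of pixels. This is a toy illustration under the assumption that a depth value for the real scene is available per pixel, which is exactly what a planes-only system does not have. The function name `composite_row` and the string "pixels" are hypothetical.

```python
def composite_row(camera, real_depth, virtual, virtual_depth):
    """Blend one row of virtual pixels over camera pixels with a depth test.

    A depth of None means "no virtual content here" (virtual_depth) or
    "real geometry unknown" (real_depth).
    """
    out = []
    for cam_px, r_d, vir_px, v_d in zip(camera, real_depth, virtual, virtual_depth):
        if v_d is None:
            out.append(cam_px)   # nothing virtual at this pixel
        elif r_d is None or v_d < r_d:
            out.append(vir_px)   # virtual object is in front, or we cannot tell
        else:
            out.append(cam_px)   # real object is closer and occludes the model
    return out


camera = ["table", "cup", "table"]
virtual = ["hero", "hero", None]
virtual_depth = [1.5, 1.5, None]

# With reconstructed geometry, the cup at 1.0 m hides the model at 1.5 m.
with_depth = composite_row(camera, [2.0, 1.0, 2.0], virtual, virtual_depth)

# Without it, the model is drawn over the cup.
without_depth = composite_row(camera, [None, None, None], virtual, virtual_depth)
```

In the second call the model overwrites the cup pixel, which is the "superimposed" effect described above.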

I have two guesses here. One is that Apple is still optimizing the 3D reconstruction pipeline, interaction, and interfaces. The basic AR interfaces are good enough to release now, and Apple intends to take its time with the rest. The other is that Apple wants to solve 3D reconstruction with hardware. Real-time 3D reconstruction with a monocular camera is very hard, but with a stereo or depth camera the problem becomes relatively easy. That would also let Apple show what is "special" about the iPhone 8 expected in September, killing two birds with one stone.

The second deficiency is recognition. Currently the most popular application of AR is not gaming but offline marketing, and offline AR marketing relies on both a recognition algorithm and a tracking algorithm. If Apple can build such a mature SLAM algorithm for monocular mobile devices, simple recognition and tracking should not be a problem. The difficulty may lie in how the tracking algorithm interfaces with the recognition algorithm. If recognition runs on the device, then under Apple's current app update review process, updating AR content will be very cumbersome, and Apple may need to provide a dedicated editor. If it relies on cloud recognition, then Apple's user base is so large that the QPS hitting the recognition service would be staggering. Is Apple perhaps not yet ready for that test?

In addition, monocular SLAM for AR is a problem the industry has not solved well, which is why Apple's progress here is significant. If we restrict ourselves to recognition and tracking of 2D images, there are already many mature SDKs available. Even HoloLens is compatible with Vuforia, so Apple may simply be unwilling to take on the integration with recognition algorithms itself.

Finally, let's talk about ARKit's impact on the AR industry as a whole, broken down by the role each player has.

First are the hardware players furthest downstream, led by AR glasses makers such as Microsoft, Meta, and ODG.

The impact on these players should be small, because AR glasses do not currently serve consumers. Most are customized for business customers, and there will be no large shipments in the short term. If anything, this is good news for AR glasses makers. AR on a phone has all kinds of shortcomings, such as not freeing the user's hands, but it can educate users quickly. Once users are used to AR and want a higher-quality experience, AR glasses makers can consider pivoting to serve consumers. In that sense Apple is accelerating the development of the whole AR industry. That said, since Apple clearly has an AR strategy, I do not believe it will pass up glasses. I hope that in the future Apple will redefine "AR glasses."

The hardware player in a somewhat awkward position is Google. Its Project Tango has been out for a year, yet only Lenovo's Phab 2 Pro and the upcoming Asus ZenFone carry Tango technology, and because the Android AR content ecosystem is so thin, shipments of Tango handsets are very low. Now ARKit arrives and covers almost the whole of iOS in an instant. Should Google respond by launching a monocular AR SDK that covers all of Android, or keep using Tango to push Android hardware upgrades? That is a question well worth considering. Here is a bold guess: will Google open-source Tango's entire current MSCKF algorithm? After all, a laser-based SLAM algorithm has already been open-sourced. I look forward to seeing what Google does.

There are also the AR hardware module players, such as Intel's RealSense or Occipital's Bridge, which depend on another hardware device to be usable. Their overall AR capabilities are higher than the iPhone's, but how to demonstrate their unique value has become a problem they must face. They used to be the only choice. Now they have suddenly become a "value-added service," and the road ahead has become rocky again.

Also worth mentioning are the so-called AR or MR glasses cases: something like a VR box with simple optics, or one that merely leaves the phone's rear camera exposed, and which needs a phone inserted to work. The hardware cost is low, yet users with an AR-capable phone can quickly get an experience similar to AR glasses. Sales of such boxes may get a boost, but for their long-term prospects, look at the current state of VR boxes.

Next, the SDK players, that is, the AR algorithm players. As the industry knows, with AR demand growing rapidly, almost every SDK company has spent the past year developing monocular SLAM algorithms. At home and abroad, everyone has been busy reinventing the same wheel, yet apart from Vuforia no one has shipped a presentable monocular SLAM SDK. Each has its own problems to a greater or lesser degree.

At this point Apple brings out ARKit. Judging by the keynote demo, it outperforms every other player, and it is a native iOS algorithm, which means any app can have the best AR capability on the market without integrating an SDK. For most SDKs this is a fatal blow. A moment of silence for them... Of course, SDK players are not completely out of options. As I said above, ARKit is not perfect. It cannot do full-featured SLAM, but it is still possible to extend its functionality on top of it. So now is the time for SDK companies to think carefully about how to position themselves and find their own value in coexisting with ARKit.

Then there are the upstream AR software players. Social apps such as Snapchat and Facebook bear the brunt. Both have just released AR products and editors, both rely on their own AR algorithms, and both are very advanced algorithmically. Unfortunately, the release of ARKit sent them back to square one overnight. Not only have they failed to open a gap with their rivals, but a large number of smaller players behind them, such as FaceU and B612, which already do well at face-related content and operations, have suddenly been handed a powerful AR weapon and may well catch up. The fight among AR apps is therefore about to get very exciting, and I am very much looking forward to it!

The game industry is also an upstream player. I have dealt with many game companies through work, and it is fair to say they are interested in AR but have been unable to commit. On the one hand, the technology was not mature enough, and integrating an SDK carried relatively high cost and risk with no guarantee of a good result. On the other hand, the game industry is doing well as it is, and Pokémon GO's later decline was rather severe, so game companies lacked motivation. Now that ARKit is out and AR is a native capability, game companies may not be able to sit on the fence any longer. After all, a certain company XX has already invested heavily in AR and will surely not pass up this new capability. Even if only to follow the trend, other companies will try adding AR elements to their games. I am fairly optimistic about AR in games, especially card games.

That is about it for the analysis. In general, because of iOS's huge user base, the release of ARKit gives the entire AR industry a strong push. Every AR player may need to re-examine its position. Knowing that iOS has native AR capabilities, where do you fit? If you do pure algorithms or hardware, do you persist, and how do you specialize? If you do software or AR services, how do you make the most of ARKit?

As an AR practitioner, I sincerely appreciate Apple's enthusiasm for and commitment to AR. I hope AR continues to mature and develop better and better.