Cocos Creator 3.6 New Features (3/3): Performance
After reading the previous two articles, many developers messaged me privately with the same question: "With all these new features, how is the performance?"
The answer: very good, with major performance gains in some modules!
That is no accident: performance improvement was one of the goals set for v3.6, a major milestone release of Cocos Creator.
In this article we'll take a look at the performance improvements in Cocos Creator 3.6.
Performance on native platforms
To simulate complex real-world workloads as closely as possible, the engine team chose a mixed test case combining a static scene with animated models.
As the graph above shows, v3.6 performs more than 50% better than v3.5 on the same test case.
Powered by C++
When I asked the Cocos Engine team how they achieved this, the reply was: "The method is relatively simple: the scene management and resource management modules that used to live in the TS layer were re-implemented in C++."
A simple method does not mean a small workload: the Cocos Engine team spent more than a year on it.
The results justified the effort. Starting with Cocos Creator 3.6, the Cocos engine can be called a "dual-core engine": a C++ kernel on native platforms and a JS/TS kernel on non-native platforms.
This approach is uncommon in the industry because it is extremely costly: the two kernels must keep equivalent, peer architectures and produce identical rendering results.
But it is also the only way to serve each platform well and to apply the finest-grained, platform-specific optimizations when performance is pushed to the limit.
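To make the "same API, two kernels" idea concrete, here is a minimal TypeScript sketch under stated assumptions: `TransformKernel`, `TsKernel`, `FlatKernel`, and `createKernel` are hypothetical names, not Cocos Creator APIs, and the "native" kernel is only simulated in TypeScript with flat typed arrays, whereas the real engine uses C++. It shows why the two cores must stay peers: game code talks to one interface, so both implementations have to return identical results.

```typescript
// Hypothetical sketch: these names are illustrative, not Cocos Creator APIs.
// Transforms are translation-only for brevity.
interface TransformKernel {
  createNode(parent: number | null, x: number, y: number): number;
  worldPosition(id: number): [number, number];
}

type TsNode = { parent: number | null; x: number; y: number };

// Kernel A: one object per node, the way a plain JS/TS implementation might store it.
class TsKernel implements TransformKernel {
  private nodes: TsNode[] = [];

  createNode(parent: number | null, x: number, y: number): number {
    this.nodes.push({ parent, x, y });
    return this.nodes.length - 1;
  }

  worldPosition(id: number): [number, number] {
    let wx = 0;
    let wy = 0;
    let cur: number | null = id;
    while (cur !== null) {
      const node: TsNode = this.nodes[cur];
      wx += node.x;
      wy += node.y;
      cur = node.parent; // walk up to the root, accumulating offsets
    }
    return [wx, wy];
  }
}

// Kernel B: stands in for a native (C++) kernel. Simulated here with flat typed
// arrays to show that storage can differ completely behind the same interface.
class FlatKernel implements TransformKernel {
  private static readonly CAPACITY = 1024;
  private parents = new Int32Array(FlatKernel.CAPACITY).fill(-1); // -1 means "no parent"
  private positions = new Float64Array(FlatKernel.CAPACITY * 2);  // x, y pairs
  private count = 0;

  createNode(parent: number | null, x: number, y: number): number {
    if (this.count >= FlatKernel.CAPACITY) throw new Error("FlatKernel is full");
    const id = this.count++;
    this.parents[id] = parent === null ? -1 : parent;
    this.positions[2 * id] = x;
    this.positions[2 * id + 1] = y;
    return id;
  }

  worldPosition(id: number): [number, number] {
    let wx = 0;
    let wy = 0;
    for (let cur = id; cur !== -1; cur = this.parents[cur]) {
      wx += this.positions[2 * cur];
      wy += this.positions[2 * cur + 1];
    }
    return [wx, wy];
  }
}

// A platform flag picks the kernel once; game code only ever sees TransformKernel.
function createKernel(isNative: boolean): TransformKernel {
  return isNative ? new FlatKernel() : new TsKernel();
}

// Both kernels must agree on the same scene, mirroring the "identical rendering" requirement.
for (const isNative of [false, true]) {
  const k = createKernel(isNative);
  const root = k.createNode(null, 10, 5);
  const child = k.createNode(root, 1, 2);
  const leaf = k.createNode(child, 0.5, 0.25);
  console.log(isNative ? "native" : "web", k.worldPosition(leaf)); // 11.5, 7.25 for both
}
```

Keeping the interface as the only contract is what lets each kernel be optimized for its own platform without changing game code.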
v3.3 vs. v3.6
The picture on the left shows the architecture of Cocos Creator 3.3, with the C++ part in blue and the TS part in red.
TS modules: SceneGraph, Data Driven, AssetManager, Assets
The picture on the right shows the architecture of Cocos Creator 3.6, with the C++ part in green and the TS part in blue.
C++ & TS modules: Assets, Data Driven
The comparison suggests that moving the RenderScene and SceneGraph modules to C++ likely played a major role in this performance improvement.
Judging by current progress, the work on the Assets and Data Driven modules will be finished soon, and the AssetManager module is expected to follow in a later version. By then, apps built with Cocos Creator should see another performance boost on native platforms.
Particle performance
The screenshot above is from a Cocos Creator project made by a colleague on my team, in which every candle and every flame on the ground consists of several particle emitters. In v3.5 this scene struggled to run; in v3.6 it runs very smoothly.
Developers have repeatedly said that particle system performance limits what artists can create and hurts the visual quality of their games, and they asked for particle batching to improve performance.
There are two common approaches to batching in 3D rendering:
Merging Vertex Buffers
GPU Instancing
Because GPU Instancing requires hardware and OS support, which many devices lacked a few years ago, the engine used GPU Instancing on devices that supported it and fell back to merging vertex buffers on the rest, to ensure maximum compatibility.
However, the bottleneck caused by draw calls lies mainly on the CPU side, and merging vertex buffers itself adds CPU load, since vertex data has to be gathered (and, for moving objects, transformed) into a shared buffer on the CPU. Reducing draw calls this way therefore does not always improve performance.
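The CPU cost of merging vertex buffers is easy to see in a minimal sketch. It assumes hypothetical `MeshInstance` and `MergedBatch` types and translation-only transforms; it is not the engine's batching code. The work inside `mergeVertexBuffers` grows with the total vertex count and has to be redone whenever the merged objects move, which is why fewer draw calls do not automatically mean less CPU time.

```typescript
// Hypothetical types for illustration; these are not engine APIs.
// Transforms are translation-only to keep the sketch short.
interface MeshInstance {
  positions: Float32Array;          // local-space x, y, z triples
  indices: Uint16Array;             // triangle indices into this mesh's own vertices
  offset: [number, number, number]; // world-space translation
}

interface MergedBatch {
  positions: Float32Array; // world-space vertices of every mesh, back to back
  indices: Uint32Array;    // indices rebased onto the merged vertex array
}

function mergeVertexBuffers(meshes: MeshInstance[]): MergedBatch {
  let vertexCount = 0;
  let indexCount = 0;
  for (const m of meshes) {
    vertexCount += m.positions.length / 3;
    indexCount += m.indices.length;
  }
  const positions = new Float32Array(vertexCount * 3);
  const indices = new Uint32Array(indexCount);

  let vertexBase = 0;
  let indexBase = 0;
  for (const m of meshes) {
    const n = m.positions.length / 3;
    // CPU work that grows with vertex count: every vertex is moved into world
    // space and copied into the shared buffer each time the batch is rebuilt.
    for (let v = 0; v < n; v++) {
      for (let axis = 0; axis < 3; axis++) {
        positions[(vertexBase + v) * 3 + axis] = m.positions[v * 3 + axis] + m.offset[axis];
      }
    }
    // Indices are shifted so they point at this mesh's slice of the merged array.
    for (let i = 0; i < m.indices.length; i++) {
      indices[indexBase + i] = m.indices[i] + vertexBase;
    }
    vertexBase += n;
    indexBase += m.indices.length;
  }
  return { positions, indices };
}

// Usage: two copies of one triangle become a single buffer (one draw call instead of two).
const triangle = new Float32Array([0, 0, 0, 1, 0, 0, 0, 1, 0]);
const triIndices = new Uint16Array([0, 1, 2]);
const merged = mergeVertexBuffers([
  { positions: triangle, indices: triIndices, offset: [0, 0, 0] },
  { positions: triangle, indices: triIndices, offset: [5, 0, 0] },
]);
console.log(merged.indices.join(",")); // 0,1,2,3,4,5
```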
With improved hardware and the wide adoption of newer graphics APIs, GPU Instancing is now supported on close to 100% of devices. GPU Instancing was therefore chosen as the batching solution for the particle system.
Mesh particles support batching too: for the particle system, Billboard and Mesh particles are batched in exactly the same way.
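By contrast, GPU Instancing leaves the mesh untouched and only uploads a small block of per-instance attributes. The sketch below is an engine-agnostic illustration, not the Cocos Creator particle API: the `ParticleEmitter` and `InstancedDraw` shapes, the five-float instance layout, and the assumption that emitters sharing a mesh and material can be drawn together are all hypothetical. The actual GPU submission (for example WebGL2's `drawElementsInstanced`) is not shown.

```typescript
// Hypothetical sketch of GPU-instancing data preparation; names are illustrative,
// not the Cocos Creator particle API.
interface Particle {
  x: number;
  y: number;
  z: number;
  size: number;
  rotation: number;
}

interface ParticleEmitter {
  mesh: string;     // shared geometry: a quad for billboards, any model for mesh particles
  material: string;
  particles: Particle[];
}

interface InstancedDraw {
  mesh: string;
  material: string;
  instanceCount: number;
  instanceData: Float32Array; // per-instance attributes, uploaded once per draw
}

const FLOATS_PER_INSTANCE = 5; // x, y, z, size, rotation

function buildInstancedDraws(emitters: ParticleEmitter[]): InstancedDraw[] {
  // Group emitters by (mesh, material); each group becomes one instanced draw call.
  const groups: { [key: string]: Particle[] } = {};
  const firstOfGroup: ParticleEmitter[] = []; // first emitter seen per key, in order
  for (const e of emitters) {
    const key = e.mesh + "|" + e.material;
    if (!groups[key]) {
      groups[key] = [];
      firstOfGroup.push(e);
    }
    for (const p of e.particles) groups[key].push(p);
  }

  return firstOfGroup.map((first) => {
    const particles = groups[first.mesh + "|" + first.material];
    // Unlike merging, the mesh vertices are never touched on the CPU:
    // only a few floats per particle are written.
    const data = new Float32Array(particles.length * FLOATS_PER_INSTANCE);
    particles.forEach((p, i) => {
      data.set([p.x, p.y, p.z, p.size, p.rotation], i * FLOATS_PER_INSTANCE);
    });
    return { mesh: first.mesh, material: first.material, instanceCount: particles.length, instanceData: data };
  });
}

// Usage: three emitters, two distinct (mesh, material) pairs.
const particleAt = (x: number): Particle => ({ x, y: 0, z: 0, size: 1, rotation: 0 });
const draws = buildInstancedDraws([
  { mesh: "quad", material: "flame", particles: [particleAt(0), particleAt(1)] },
  { mesh: "quad", material: "flame", particles: [particleAt(2)] },
  { mesh: "rock", material: "stone", particles: [particleAt(3)] },
]);
console.log(draws.length); // 2
```

A billboard is simply a particle whose shared mesh is a quad, so billboard and mesh emitters go through the same path; only the `mesh` key differs.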
I tested this with the engine team's test case. With CPU particles, it still stayed above 30 fps at a count of 545, as shown below.
If you want to try it yourself, download the test case from https://github.com/cocos/cocos-benchmark and open lobby.scene to get started.
2D rendering performance
The 2D rendering performance of Cocos Creator 3.x has long been a major concern, and developers have been hoping that v3.x would one day catch up with v2.x.
Pleasantly, the community beta released on July 22 announced that 2D performance had caught up with 2.x across a range of performance tests on native platforms.
As the figure below shows, clear performance improvements were achieved on both low-end devices (HUAWEI Honor V9 on Android, iPhone 6s on iOS) and mid-range devices (Xiaomi 8 on Android, iPhone 11 on iOS).
I tracked down one of the engine engineers responsible for this 2D rendering optimization and got some details about the upgrade.
Core modules rewritten in C++
Much of the rendering logic and flow control was moved from TS to C++, which improved performance through the language itself.
Once these modules were in C++, the language's features gave the engine team more ways to optimize performance, which brought many further improvements.
C++ language features also made it possible to reduce memory usage considerably.
By applying byte alignment and unions, the team cut the memory used by 2D objects to half of what it used to be.
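The byte-alignment and union savings come from how C/C++ lays out data in memory. The TypeScript sketch below models the usual layout rules (natural alignment, 8-byte pointers on a 64-bit target) to show the arithmetic; the field lists are hypothetical examples, not the engine's actual 2D structures, and real sizes depend on the compiler and ABI.

```typescript
// Models typical C/C++ layout rules (natural alignment, 8-byte pointers on a 64-bit target).
// The fields below are hypothetical, not the engine's actual 2D data structures.
interface Field {
  size: number;  // bytes
  align: number; // required alignment in bytes
}

function alignUp(offset: number, align: number): number {
  return Math.ceil(offset / align) * align;
}

// Struct: members are placed in declaration order, each padded up to its alignment;
// the total is padded to the largest member alignment so arrays of it stay aligned.
function structSize(fields: Field[]): number {
  let offset = 0;
  let maxAlign = 1;
  for (const f of fields) {
    offset = alignUp(offset, f.align) + f.size;
    maxAlign = Math.max(maxAlign, f.align);
  }
  return alignUp(offset, maxAlign);
}

// Union: all members share the same storage, so it is as large as its largest member.
function unionSize(fields: Field[]): number {
  let size = 0;
  let maxAlign = 1;
  for (const f of fields) {
    size = Math.max(size, f.size);
    maxAlign = Math.max(maxAlign, f.align);
  }
  return alignUp(size, maxAlign);
}

const ptr: Field = { size: 8, align: 8 };
const u32: Field = { size: 4, align: 4 };
const f32: Field = { size: 4, align: 4 };
const flag: Field = { size: 1, align: 1 };

// The same six members in two orders: interleaved vs. largest-alignment-first.
const badOrder = [flag, ptr, flag, u32, ptr, flag];
const goodOrder = [ptr, ptr, u32, flag, flag, flag];

// Type-specific payloads: a hypothetical object is either a sprite or a label, never both.
const spritePayload: Field = { size: structSize([ptr, f32, f32, f32, f32]), align: 8 }; // texture + UV rect
const labelPayload: Field = { size: structSize([ptr, f32]), align: 8 };                  // font + font size

console.log(structSize(badOrder), structSize(goodOrder)); // 40 24
console.log(structSize([spritePayload, labelPayload]), unionSize([spritePayload, labelPayload])); // 40 24
```

Reordering the same members removes 16 bytes of padding, and storing mutually exclusive payloads in a union instead of side by side saves another 16 bytes; multiplied across many 2D objects, layout choices like these add up quickly.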
TS Module Optimization
Moving so much code to C++ made the TS framework layer much cleaner and created an opportunity to restructure it. In the process, redundant operations were removed and a lazy update mechanism was added: when a component's properties have not changed, no update work is done.
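A lazy update mechanism of this kind is commonly built on a dirty flag: setters record that something changed, and the per-frame update returns immediately when nothing did. The class below is a minimal sketch under that assumption; `LazySprite` and its members are hypothetical names, not Cocos Creator's component API.

```typescript
// Hypothetical component illustrating a dirty-flag ("lazy") update;
// LazySprite and its members are illustrative, not Cocos Creator's API.
class LazySprite {
  private _x = 0;
  private _color = 0xffffffff; // RGBA packed into one integer
  private _dirty = true;       // start dirty so the first update builds the render data
  rebuildCount = 0;            // how many times the expensive path actually ran

  get x(): number {
    return this._x;
  }
  set x(value: number) {
    if (value === this._x) return; // same value: nothing to do
    this._x = value;
    this._dirty = true;
  }

  get color(): number {
    return this._color;
  }
  set color(value: number) {
    if (value === this._color) return;
    this._color = value;
    this._dirty = true;
  }

  // Called once per frame by the renderer.
  update(): void {
    if (!this._dirty) return; // nothing changed since the last rebuild: skip all work
    this.rebuildRenderData();
    this._dirty = false;
  }

  private rebuildRenderData(): void {
    // Stand-in for the expensive part (recomputing and uploading vertex data).
    this.rebuildCount++;
  }
}

// Usage: 60 idle frames cost one rebuild; only a real change triggers another.
const sprite = new LazySprite();
for (let frame = 0; frame < 60; frame++) sprite.update();
sprite.color = 0xffffffff; // same value: stays clean
sprite.update();
sprite.x = 100;            // real change: the next update rebuilds
sprite.update();
console.log(sprite.rebuildCount); // 2
```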
To keep a single, uniform flow and reduce the risk of bugs, earlier 3.x versions rendered 2D objects through the same rendering flow as 3D objects. However, because the 3D rendering flow is more complex, both its CPU overhead and its memory overhead are considerably higher than those of a dedicated 2D flow.
We wrote a dedicated 2D pipeline in which 2D objects are rendered completely independently of 3D objects, so that 2D rendering runs with minimal overhead. This brought significant improvements in CPU overhead, draw calls, memory usage, and the I/O load that drives power consumption and device heating.
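To illustrate why a dedicated 2D flow can be so much cheaper, here is a generic sketch of a simple 2D batcher, assuming sprites are submitted in draw order and batched by texture. `Sprite2D`, `Batch2D`, and `batchSprites` are hypothetical names; this is not the actual Cocos Creator 2D pipeline. Each sprite costs only four 2D vertices, and consecutive sprites that share a texture become one draw call.

```typescript
// Generic sketch of a lightweight 2D batcher; Sprite2D, Batch2D and batchSprites
// are hypothetical names, not the Cocos Creator 2D pipeline.
interface Sprite2D {
  texture: string;
  x: number;
  y: number;
  w: number;
  h: number;
}

interface Batch2D {
  texture: string;
  firstQuad: number; // index of the first quad in the shared vertex buffer
  quadCount: number;
}

const FLOATS_PER_VERTEX = 4; // x, y, u, v
const FLOATS_PER_QUAD = 4 * FLOATS_PER_VERTEX;

function batchSprites(sprites: Sprite2D[]): { vertices: Float32Array; batches: Batch2D[] } {
  const vertices = new Float32Array(sprites.length * FLOATS_PER_QUAD);
  const batches: Batch2D[] = [];
  sprites.forEach((s, i) => {
    // Each sprite is four 2D corners with UVs: no normals, tangents or 3D matrices.
    vertices.set([
      s.x, s.y, 0, 0,
      s.x + s.w, s.y, 1, 0,
      s.x + s.w, s.y + s.h, 1, 1,
      s.x, s.y + s.h, 0, 1,
    ], i * FLOATS_PER_QUAD);
    // Draw order must be kept, so only consecutive sprites with the same texture share a batch.
    const last = batches[batches.length - 1];
    if (last && last.texture === s.texture) {
      last.quadCount++;
    } else {
      batches.push({ texture: s.texture, firstQuad: i, quadCount: 1 });
    }
  });
  return { vertices, batches };
}

// Usage: four sprites, one texture switch in the middle.
const result2d = batchSprites([
  { texture: "atlas", x: 0, y: 0, w: 32, h: 32 },
  { texture: "atlas", x: 10, y: 0, w: 32, h: 32 },
  { texture: "font", x: 0, y: 40, w: 64, h: 16 },
  { texture: "atlas", x: 0, y: 60, w: 32, h: 32 },
]);
console.log(result2d.batches.length); // 3
```

The batch breaks whenever the texture changes, so keeping sprites that share a texture (for example, one sprite atlas) adjacent in draw order keeps the draw-call count low.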
Performance work is never finished; it is part of the iterative improvements that come with each release.
Cocos Creator 3.6 is not perfect: there is still plenty of room for improvement in package size, memory usage, loading speed, and more. The official performance tests cannot cover every kind of project, so developers will need to verify performance and gather real-world data in their own projects.
Still, I believe that with continuous optimization, 3.x will surpass 2.x in every respect and become an excellent integrated 2D and 3D dual-core engine.
Native performance, particle performance, and 2D rendering performance are all hurdles that developers regularly run into and cannot work around in their projects; only when the engine clears them can developers make their products better.
We are glad to see the Cocos Engine team working hard to improve Cocos Creator's editor features and development efficiency. At the same time, the engine keeps its strengths as a dual-core engine (Native and Web) with a scalable architecture (high performance, low power consumption, easy customization), while gaining new rendering features and better rendering quality.
That concludes our tour of the new features in Cocos Creator 3.6.
I hope these articles give a detailed explanation to those who don't have time to analyze the new features themselves, highlighting the value of the most important features and improvements in 3.6, and that they serve as an informed reference when you make technology selection and upgrade decisions.
Thanks for reading, and let's look forward to the next engine release!