[OpenXR] Multiple performance issues with Wave runtime


I am developing an OpenXR streaming application for Linux (https://github.com/Meumeu/WiVRn/) and noticed performance and conformance issues on the Wave runtime compared to the Oculus runtime.

Composition layer blending
If we don't initialize the alpha channel bit in projection layers, the layers do not show on the headset, although they are visible in video.
https://registry.khronos.org/OpenXR/specs/1.0/html/xrspec.html#composition-layer-blending states:

Quote

If a submitted swapchain’s texture format does not include an alpha channel or if the `XR_COMPOSITION_LAYER_BLEND_TEXTURE_SOURCE_ALPHA_BIT` is unset, then the layer alpha is initialized to one.

Vulkan validation layers
When running with Vulkan validation layers, the runtime triggers the following error:

Quote

Validation Error: [VUID-VkImageCreateInfo-initialLayout-00993] Object 0: handle = 0x71844ba7a0, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x4c3c1da0 | vkCreateImage(): initialLayout is VK_IMAGE_LAYOUT_FRAGMENT_DENSITY_MAP_OPTIMAL_EXT, must be VK_IMAGE_LAYOUT_UNDEFINED or VK_IMAGE_LAYOUT_PREINITIALIZED. The Vulkan spec states: initialLayout must be VK_IMAGE_LAYOUT_UNDEFINED or VK_IMAGE_LAYOUT_PREINITIALIZED (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-VkImageCreateInfo-initialLayout-00993)

xrLocateViews viewCapacityInput
Calling xrLocateViews with viewCapacityInput set to 0 takes about as long as calling it with a value of 2. Generic code that first calls with 0 to get the view count, then calls again with the right capacity, therefore takes twice as long.
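
As a possible application-side workaround, the view count can be queried once and cached, so that only one call per frame remains. The sketch below is not WiVRn's code: `View` is a hypothetical stand-in for XrView, `locate_views` is a made-up helper name, and `locate` is any callable following the two-call idiom (capacity, count output, buffer), which in a real application would wrap xrLocateViews.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for XrView, so the sketch builds without OpenXR headers.
struct View {
    float position[3];
    float orientation[4];
};

// `locate` follows the OpenXR two-call idiom:
//   bool locate(uint32_t capacity, uint32_t* count_out, View* buffer)
// In a real application it would wrap xrLocateViews.
template <typename LocateFn>
std::vector<View> locate_views(LocateFn&& locate, uint32_t& cached_count)
{
    if (cached_count == 0) {
        // Size query (capacity 0, no buffer), done only while the count is unknown.
        if (!locate(0u, &cached_count, static_cast<View*>(nullptr)))
            return {};
    }

    std::vector<View> views(cached_count);
    uint32_t written = 0;
    if (!locate(cached_count, &written, views.data())) {
        cached_count = 0; // force a new size query on the next call
        return {};
    }
    assert(written <= cached_count);
    views.resize(written);
    return views;
}
```

With a stereo view configuration, the first frame costs two calls and every later frame only one.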

JNI calls
Many OpenXR functions in the Wave runtime perform JNI calls. If the application has not called AttachCurrentThread for the calling thread, this has a noticeable performance penalty. The OpenXR specification does not require applications to make such a call, and no documentation mentions it.

xrLocateSpace performance
The xrLocateSpace function performs significantly worse on the Wave runtime than on Meta Quest: a single call takes about 30µs on Quest compared to 400µs on Wave. The Android profiler shows that a significant portion of that time (about 1/6th of xrLocateSpace) is spent in string operations. The same input polling loop is idle 86% of the time on Quest when polling every 1ms, while on Wave the application has to slow polling down to every 5ms to reach 50% idle time.

Network consistency
The application requires a very consistent network, so we acquire the low-latency Wi-Fi lock through the Android API. UDP packets from the headset to the PC appear to have irregular pacing and arrive in bursts. Sometimes no packet arrives for up to 50ms.
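
To put numbers on this kind of burstiness, one option is to record packet arrival times on the PC side and look at the gaps between consecutive packets. This is only a sketch under assumptions: `GapStats` and `analyze_arrivals` are hypothetical names, the timestamps are assumed to be sorted, and the threshold is whatever the caller considers a stall.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

using std::chrono::microseconds;

struct GapStats {
    microseconds max_gap{0};   // longest silence between two packets
    std::size_t long_gaps = 0; // number of gaps at or above the threshold
};

// `arrivals` holds receive timestamps in arrival order.
GapStats analyze_arrivals(const std::vector<microseconds>& arrivals,
                          microseconds threshold)
{
    GapStats stats;
    for (std::size_t i = 1; i < arrivals.size(); ++i) {
        const microseconds gap = arrivals[i] - arrivals[i - 1];
        assert(gap.count() >= 0); // timestamps must be sorted
        if (gap > stats.max_gap)
            stats.max_gap = gap;
        if (gap >= threshold)
            ++stats.long_gaps;
    }
    return stats;
}
```

A burst followed by a stall shows up as many tiny gaps and a few very long ones, so comparing `max_gap` with the nominal send interval gives a quick indication of how uneven the pacing is.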

Compositor performance
Compared to Oculus Quest 1, at the same display resolution, the predicted display time is about 15ms higher on Vive XR Elite.

Overall result

Controller-to-photons latency (ms) over time for Quest 1 (left) and Vive XR Elite (right)

Distribution of controller-to-photons latency for Quest 1 (left) and Vive XR Elite (right). The x axis is the latency in ms, the y axis the number of samples.

From those plots, we can see that total latency is about 15ms higher on Vive XR Elite, which according to my measurements comes mostly from the Wave runtime, in the form of a higher predictedDisplayTime value. Outliers are also both larger and more frequent. This is caused by controller data not reaching the PC in time, so older predictions are used.

Bonus benchmark

A totally scientific benchmark I used is Beat Saber on the same level: native on Quest, streamed on Quest and streamed on Vive XR Elite. Scores are 200k (Quest native), 186k (Quest streaming) and 130k (Vive XR Elite streaming).

 

What I am expecting from this post:

Fixes from Vive team on the OpenXR runtime performance and non-conformance issues.

Tweaks, if any, to reduce the predicted display time at the time of xrWaitFrame.

Help to investigate the network pattern.


  • 1 month later...

I'd like to get feedback from HTC about xrLocateSpace.

Here are flamecharts from the thread dedicated to tracking.

Quest 1:

[Image: flamechart-quest1.png]

HTC XR elite:

[Image: flamechart-htc-xr-elite.png]

As a first note, the thread has a throttling mechanism: it aims to be idle 80% of the time and lowers the polling frequency to get there, but never below one sample every 5ms. On Quest, the polling interval is under 2ms and the 80% idle target is met. XR Elite is at the 5ms polling interval and still cannot reach the idle target.
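
To illustrate what that throttling does, here is a minimal sketch of the idea, not the actual WiVRn implementation. `PollThrottle` and its members are hypothetical names, and the 1ms lower bound is an assumption taken from the polling rate mentioned in the first post; the 80% target and the 5ms upper bound are the values described above.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>

using std::chrono::microseconds;

struct PollThrottle {
    double target_idle = 0.8;        // aim to be idle 80% of the time
    microseconds min_interval{1000}; // assumption: never poll faster than 1ms
    microseconds max_interval{5000}; // at least one sample every 5ms

    // Given the time spent working in one iteration, return the polling
    // interval that would reach the idle target, clamped to the bounds.
    microseconds next_interval(microseconds busy) const
    {
        assert(target_idle > 0.0 && target_idle < 1.0);
        const double wanted_us = busy.count() / (1.0 - target_idle);
        return std::clamp(microseconds(std::llround(wanted_us)),
                          min_interval, max_interval);
    }

    // Fraction of the interval that is left idle.
    static double idle_fraction(microseconds busy, microseconds interval)
    {
        return 1.0 - static_cast<double>(busy.count()) /
                         static_cast<double>(interval.count());
    }
};
```

With 300µs of work per iteration this gives a 1.5ms interval and 80% idle time. With 2.5ms of work the interval saturates at 5ms and idle time drops to 50%, which is roughly the situation described for Wave in the first post.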

Raw capture data: wivrn-v0.16-quest1.trace and wivrn-v0.16-htc-xr-elite.trace
 

I do not know the internals of the Wave runtime, but the performance difference is significant, and it also causes the fan to spin up.

Are there any plans to improve this area?


  • 2 weeks later...

I'd like to stress the other very important point for me:

When calling xrWaitFrame, the returned predictedDisplayTime in frameState is about 15ms further in the future on the Wave runtime compared to Quest. This in turn forces the application to query view and controller poses 15ms further in the future, which greatly reduces accuracy.

On Vive XR Elite, it is consistently 40-41ms at a 90Hz refresh rate; on Quest 1, it is in the 23-28ms range at a 72Hz refresh rate.
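
Because the two headsets run at different refresh rates, it can help to express these numbers in display refresh periods. The helper below is just arithmetic on the figures above; `frames_ahead` is a hypothetical name.

```cpp
#include <cassert>

// Prediction horizon expressed in display refresh periods:
// horizon / frame period, with frame period = 1000ms / refresh rate.
inline double frames_ahead(double horizon_ms, double refresh_hz)
{
    assert(refresh_hz > 0.0);
    return horizon_ms * refresh_hz / 1000.0;
}
```

This gives about 3.6 to 3.7 frames on Vive XR Elite (40-41ms at 90Hz) against about 1.7 to 2.0 frames on Quest 1 (23-28ms at 72Hz).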

Data was obtained by adding a log at https://github.com/Meumeu/WiVRn/blob/4c9f24499eb9bbea3c8949e026b6177e19e7c1a3/client/application.cpp#L1056 and comparing the predicted time with the value from https://github.com/Meumeu/WiVRn/blob/4c9f24499eb9bbea3c8949e026b6177e19e7c1a3/client/xr/instance.cpp#L217


  • 3 weeks later...
  • 1 month later...

Today's update improved xrLocateSpace a lot! This is a great improvement.

[Image: flamechart]

xrLocateSpace isn't even visible in the flamechart now.

xrLocateViews could still have some optimizations, as I don't see a reason for it to be more complex than locate spaces or hands.

 

Has there been any investigation into the time taken by the compositor? Even on simple applications such as hello_xr, xrWaitFrame reports a predicted display time 15ms further in the future compared to Quest 1.


11 hours ago, Xytovl said:

xrLocateViews could still have some optimizations, as I don't see a reason for it to be more complex than locate spaces or hands.

 

Has there been any investigation into the time taken by the compositor? Even on simple applications such as hello_xr, xrWaitFrame reports a predicted display time 15ms further in the future compared to Quest 1.

Hi @Xytovl,

Optimizing this is on our roadmap, thanks!
