A Touching Story About JavaScript Gesture Responders in React Native

Recently, I had to dive deep into the native hooks for setting JavaScript touch responders in React Native, and I wanted to share what I’ve learned.

ScrollView Fail

We had a few issues filed on react-native-windows related to PanResponders not working when they are applied to a view that has a ScrollView ancestor. A common example of this is a ListView where the rows respond to a swipe gesture, e.g., SwipeableListView.

An Underappreciated UIManager API

It turns out, when we were building react-native-windows, we had a few specific apps we were testing, and gestures inside a ScrollView were not a scenario in any of them. In fact, there was no SwipeableListView component available in core React Native when we first started active development. So, when we were scratching our heads trying to figure out what the UIManager setJSResponder API did, and not implementing it had no effect on our target apps, we punted on the issue and figured it was something we could implement later if we needed it. After all, we had lots of other features to implement, and we were trying to keep pace with the mach-speed evolution of React Native.

It turns out that this API is quite important for the touch handling implementations in iOS and Android.  On Android, there is an API called onInterceptTouchEvent, which allows ViewGroups to preview and capture touch events before they are dispatched to their children.  When setJSResponder is called, the React Native bridge on Android sets the given responder tag inside the JSResponderHandler, which is reachable from all ViewGroups. During the chain of onInterceptTouchEvent calls down the view hierarchy, each ViewGroup evaluates if it is the designated responder in the JSResponderHandler, and if so, captures the touch event. When the event is captured, it prevents system responders like scrolling and drawer layout operations from responding to the gesture.

On iOS, the use of the setJSResponder API has a much more limited scope. When setJSResponder is called, the responder tag is set to a public property on the UIManager. Inside the native ScrollView implementation, the panGestureRecognizer on the UIScrollView is set to a function that checks if the current JavaScript responder is a descendant of the UIScrollView, and if so, cancels the pan on the UIScrollView so the events are only processed by the specified JS responder.

UWP is a Whole Different Story

Because we were simply ignoring the setJSResponder call, we did nothing to stop the native ScrollView in UWP from intercepting all the gestures before the inner views had a chance to capture the pointer.

UWP does not have an analogue of onInterceptTouchEvent that lets parents capture input before a child view has a chance to, so we were not able to implement an architecture similar to Android’s.

UWP does, however, have an API called CancelDirectManipulations on UIElement that looked like a promising way to effectively replicate the pan-canceling behavior of setJSResponder on iOS. But before we go into why that turned out to be inadequate, let’s go back to the sequence of events that leads to setJSResponder being called on the UIManager in the first place.

When a gesture starts in React Native, the native touch implementation dispatches a “topTouchStart” event to JavaScript through the RCTEventEmitter API, receiveTouches. The JavaScript event implementation uses a bubble/capture algorithm to find the designated event responder for the touch. The algorithm is a two-phase traversal of the view hierarchy, starting from the root and ending at the touch target (as determined by the native touch handling behavior). For each node along the path from the root to the touch target, the “capture” handler is executed (in the case of “topTouchStart”, the capture handler is “onStartShouldSetResponderCapture”). If any of these handlers returns true, the owner of that handler becomes the responder. If none of the capture handlers returns true, the bubble handlers are evaluated from the touch target back to the root. Once one of these bubble or capture handlers returns true, the element is notified that it is now the responder via a call to “onResponderGrant” and, assuming the responder has changed, the native setJSResponder API is called on the UIManager. Other events can trigger this responder change as well, including move and scroll events, and any change in the responder instance results in a native notification via the UIManager setJSResponder API. For example, the JavaScript PanResponder module uses touch move events to check whether an element should take over as the responder.
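
In pseudo-JavaScript, the two-phase search looks roughly like this (a simplified sketch only; the real logic lives in React’s responder event plugin and also handles responder-transfer negotiation):

// Simplified sketch of the responder search for a "topTouchStart" event.
// `path` is the list of nodes from the root down to the touch target.
function findStartResponder(path, event) {
  // Capture phase: root -> target
  for (const node of path) {
    const capture = node.props.onStartShouldSetResponderCapture;
    if (capture && capture(event)) {
      return node;
    }
  }
  // Bubble phase: target -> root
  for (const node of path.slice().reverse()) {
    const bubble = node.props.onStartShouldSetResponder;
    if (bubble && bubble(event)) {
      return node;
    }
  }
  return null;
}

// When the winner differs from the current responder, it receives
// "onResponderGrant" and a UIManager setJSResponder call is batched to native.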

So, now that we know how the setJSResponder API is called, we can discuss why the scroll cancellation approach in iOS won’t work for UWP. We can use the example of SwipeableRow. On iOS, when the user starts a swipe gesture, the native touch event handler emits a touch start event (i.e., “topTouchStart”) and a series of touch move events (i.e., “topTouchMove”). At some point, the swipe gesture exceeds the horizontal threshold for a valid swipe, and the SwipeableRow takes over as the touch event responder. Once the SwipeableRow takes over as the responder, the UIManager setJSResponder API is called, and the UIScrollView cancels any active pans, ensuring all subsequent touches are sent to the RCTEventEmitter and the SwipeableRow can continue to respond.
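
For reference, a PanResponder that claims the responder once a gesture is clearly horizontal looks roughly like the following minimal sketch (this is not the actual SwipeableRow implementation; the 10-pixel threshold and component name are made up for illustration):

import React from 'react';
import { Animated, PanResponder } from 'react-native';

class SwipeableRowSketch extends React.Component {
  _translateX = new Animated.Value(0);

  _panResponder = PanResponder.create({
    // Claim the responder only once the gesture is clearly horizontal.
    // When this returns true, onResponderGrant fires and native is notified
    // through the UIManager setJSResponder API.
    onMoveShouldSetPanResponder: (evt, gestureState) =>
      Math.abs(gestureState.dx) > 10 &&
      Math.abs(gestureState.dx) > Math.abs(gestureState.dy),
    onPanResponderMove: (evt, gestureState) => {
      this._translateX.setValue(gestureState.dx);
    },
    onPanResponderRelease: () => {
      Animated.spring(this._translateX, {toValue: 0}).start();
    },
  });

  render() {
    return (
      <Animated.View
        {...this._panResponder.panHandlers}
        style={{transform: [{translateX: this._translateX}]}}>
        {this.props.children}
      </Animated.View>
    );
  }
}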

On UWP, however, when the ScrollViewer recognizes a pan gesture, it immediately cancels any active pointer event sequences related to that gesture, and a “topTouchCancel” event is dispatched. The “topTouchCancel” event exists for entirely valid scenarios, such as touch points moving off screen or another native component taking control. In this case, though, the cancellation occurs before the PanResponder is able to hit the horizontal threshold and take control as the responder, even if the gesture is not on the primary scrolling axis of the ScrollViewer. As soon as the “topTouchCancel” event is dispatched to JavaScript, the responder is cleared and the touch history is released, so no further processing of the gesture occurs.

Ultimately, it is the asynchronous nature of the communication between the UI thread and the JavaScript thread in the React Native architecture that prevents the pan cancellation approach from working on UWP. The ScrollViewer recognizes the pan and captures the pointer within a few frames, and the JavaScript thread has no chance of processing the initial touch event and batching the setJSResponder call before this capture occurs.

As a workaround to the async behavior, I considered deferring the touch cancel event to check if setJSResponder would be called in the next batch of native method calls. If the setJSResponder API was called, the CancelDirectManipulations() API would be called to release the ScrollViewer’s capture of the pointer. However, in most cases the ScrollViewer should be allowed to take over as the responder, so we don’t want to introduce a delay in that behavior. For example, Touchable descendants of the ScrollViewer should defer to the ScrollViewer if a pan gesture is started, so we don’t want to cancel any pan gestures in that case.

The Final Solution?

The recommended way to prevent a ScrollViewer from capturing the pan gesture on UWP is to set the ManipulationMode property on the child view that should respond to the gesture instead. There is actually a fairly similar example that conditionally disables scrolling when the input device is a pen, achieved using the ManipulationMode property. However, the main difference between the pen example and what we are trying to do here is, again, the asynchronous disconnect between the UI thread and the JavaScript thread.

So, at this point, I’m fairly certain that the setJSResponder API is useless for UWP, at least for the use case of unblocking gestures behind ScrollViewers. The approach I took instead was to declaratively set the ManipulationMode property of the view by requiring the user to set a prop on the React element. A pull request was created and merged, and the manipulationModes prop is available in react-native-windows@0.47.* and higher. We have an example of how to use the prop in the react-native-windows fork of SwipeableRow.

<Animated.View
  onLayout={this._onSwipeableViewLayout}
  manipulationModes={['translateX']}
  style={{transform: [{translateX: this.state.currentLeft}]}}>
  {this.props.children}
</Animated.View>

This is something we will continue to discuss as time goes on, as this declarative requirement is something specific to UWP. We’d love to find an alternative that does not require any additional React props specific to Windows or knowledge of how manipulation modes work.

React Native Windows Moving to Microsoft GitHub Organization

Microsoft has been contributing to react-native-windows for a little over a year. I’m happy to announce that the project has moved to the Microsoft GitHub organization. This change of ownership on GitHub can only benefit the open source project and its users, bringing more visibility and authority to the effort. All other aspects of the project will be business-as-usual.

When we kicked off the effort to bring React Native to UWP, our aim was to push the contribution back to facebook/react-native. We created the ReactWindows GitHub organization as a temporary home for the project until the upstream contribution could be made. When we officially announced the react-native-windows effort last April, a few other platforms for React Native were in development, including implementations for Tizen and macOS. Given the growing interest in bringing new platforms to React Native, we worked together with React Native core contributors to land on a strategy that would scale to many new platforms.

Having all new React Native platform efforts land in one repository would have evolved into a maintainability nightmare. Open source contributors would find it more difficult to craft bug fixes and new features, as they would have to consider their impact across all platforms. Likewise, additional CI mechanisms for each new platform would be even more likely to produce build and test failures due to environmental issues and non-determinism in tests (hey, we can’t always be perfect). This would lead to decreased confidence in pull requests and increased delays in merging.

Furthermore, React Native currently has a monthly release cadence. About a year ago, releases were at least semi-monthly. Combined with intermittent publishing of patches, React Native users were saturated with releases to keep up with. Additional platforms would have only increased the frequency of releases, with even more patches needed. New platforms also would not have the opportunity to control their own release cadence.

We decided that the healthiest strategy for React Native and emerging platform implementations was a plugin approach. This would make adding platforms to existing React Native projects “opt-in”, similar to the way Cordova manages platforms (e.g., cordova platform add android --save). Implementations of native bridges and any overrides of core JavaScript modules for each platform would exist as separate open source projects and NPM packages.

Only a few minor fixes to the packager were required to support new platforms coming in with this plugin approach. The larger issue was making plugin platforms feel like an integrated experience. Luckily, the community had already produced RNPM, a tool for automatically integrating native modules into React Native projects. RNPM has a nice architectural component that supports command line plugins.

We built an RNPM plugin to generate and run react-native-windows projects, similar to the existing react-native init and react-native run-[ios|android] commands. This strategy paid off when RNPM was integrated into React Native, and now instead of commands like rnpm windows and rnpm run-windows, we can use react-native windows and react-native run-windows from the existing React Native CLI.
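
For a new app, the end-to-end workflow looks something like the following (the project name is illustrative, and this assumes the rnpm-plugin-windows package is installed as a dev dependency):

  react-native init MyApp
  cd MyApp
  npm install --save-dev rnpm-plugin-windows
  react-native windows       # generates the Windows (UWP) project
  react-native run-windows   # builds and deploys to the local machine or emulator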

Many other positive things have happened with the project over the past year. We continue to see increasing activity in the open source community, with more users filing issues and submitting pull requests each week. We’ve been fortunate to have great partners, like BlueJeans and Axsy, using and contributing to the framework. BlueJeans’ efforts have expanded React Native on Windows to WPF, enabling apps on Windows 7, Windows 8, and potentially Windows Phone 8. We’ve also recently integrated Yoga, a cross-platform layout library, finally bringing full parity to the layout style props supported across iOS, Android, and Windows.

It’s a great time to transition the project from its temporary home in ReactWindows to the Microsoft GitHub organization. We look forward to continued contributions from the React Native community and hope this news will bring even more energy to the React Native for UWP effort. Get started today at github.com/Microsoft/react-native-windows!

Unbundling React Native Unbundle

I started looking into what was needed to support unbundle on react-native-windows. A react-native “unbundle” is a lazy-loading mechanism for your JavaScript modules. Each module is loaded on demand when it’s first needed, the goal being to reduce the time it takes to start a React Native app and to reduce the memory consumed by potentially unused code. For details on how we added unbundle to react-native-windows, check out Hacking Unbundle into React Native for UWP (coming soon).

React Native Bundle Varieties

There are four kinds of bundles you might encounter in React Native.

Plain JavaScript Bundle

This is your run-of-the-mill bundle, the one that gets generated with the react-native bundle command. It comes in either a dev or release flavor, the primary difference being that the release flavor is optimized and minified. After the packager builds the dependency graph of all modules referenced in your app, the bundle is produced as, effectively, a concatenation of all these modules.

File-based Unbundle

This is the version of unbundle used by Android by default. A file-based unbundle is made up of an entry point file and a folder called js-modules containing one file per module in your dependency graph. Each module is assigned an index, and all require calls in your app are rewritten from a module name to the index assigned to that module. There’s also a magic file called UNBUNDLE in the js-modules folder that signals to React Native that your app bundle is an unbundle, and that the app should configure itself to load modules on demand.
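
Concretely, a require call in your source is rewritten from a module name to a module index, something like the following (the index here is made up; real indices are assigned by the packager):

// In your source:
const SwipeableRow = require('SwipeableRow');

// In the generated unbundle (hypothetical index):
const SwipeableRow = require(42); // resolved from js-modules/42.js on first use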

Indexed Unbundle

This is the version of unbundle used by iOS by default, but it can also be used on Android. On iOS, the cost of file IO outweighs the potential savings of loading JavaScript modules on demand from assets, so the indexed unbundle solves this problem by putting all the contents of the unbundle in a single file. The file has a binary header that includes the module table, with the offset and length of each module in the file, as well as the number of bytes of startup code. The single-file approach can be paged into RAM, so fewer costly disk IO operations are required.
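
Conceptually, loading a module from an indexed unbundle becomes a table lookup plus a single read at an offset, roughly like the sketch below (the field names and layout are illustrative only, not the actual binary format written by the packager):

// Illustrative only; not the real binary layout. The header provides,
// for each module id, where that module's source lives in the file.
function loadModuleSource(bundleBuffer, moduleTable, moduleId) {
  const {offset, length} = moduleTable[moduleId];
  // One read at a known offset; the file can be paged into memory by the OS,
  // so there is no per-module file open/close cost.
  return bundleBuffer.toString('utf8', offset, offset + length);
}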

Bytecode Bundle

To be quite honest, I haven’t done a lot of research on the bytecode bundle for React Native, but I understand the concept in general. Many JavaScript engines, such as those used in React Native, parse JavaScript into a bytecode, which can then either be interpreted by a virtual machine or JIT-compiled into machine code. The idea behind the bytecode bundle is that the JavaScript bundle is pre-parsed into the bytecode used by the JavaScript engine on the device. This saves valuable cycles at app startup by eliminating the parsing step. In react-native-windows, we implemented a concept similar to the bytecode bundle that leverages ChakraCore’s JsSerializeScript and JsRunSerializedScriptWithCallback APIs.

Generating React Native Bundles

Generating bundles for React Native can be done with the react-native-cli.

To generate a plain JavaScript bundle:

  react-native bundle [options]
  builds the javascript bundle for offline use

  Options:

    -h, --help                         output usage information
    --entry-file <path>                Path to the root JS file, either absolute or relative to JS root
    --platform [string]                One of "ios", "android", or "windows"
    --transformer [string]             Specify a custom transformer to be used
    --dev [boolean]                    If false, warnings are disabled and the bundle is minified
    --bundle-output <string>           File name where to store the resulting bundle, ex. /tmp/groups.bundle
    --bundle-encoding [string]         Encoding the bundle should be written in (https://nodejs.org/api/buffer.html#buffer_buffer).
    --sourcemap-output [string]        File name where to store the sourcemap file for resulting bundle, ex. /tmp/groups.map
    --sourcemap-sources-root [string]  Path to make sourcemap's sources entries relative to, ex. /root/dir
    --assets-dest [string]             Directory name where to store assets referenced in the bundle
    --verbose                          Enables logging
    --reset-cache                      Removes cached files
    --read-global-cache                Try to fetch transformed JS code from the global cache, if configured.
    --config [string]                  Path to the CLI configuration file

The only options that are required are --entry-file, --platform, and --bundle-output. Most apps will also need to use --assets-dest if the app has any images, HTML files, or other assets that are referenced using require().
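
For example, a minified Windows bundle might be generated with something like the following (paths and directory names are illustrative):

  react-native bundle --platform windows --entry-file index.windows.js --bundle-output ReactAssets/index.windows.bundle --assets-dest ReactAssets --dev false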

Generating an unbundle is no different than generating a bundle. The command line options are identical except for one additional option --indexed-unbundle.  Since an indexed unbundle is the default and only supported behavior for iOS, you would only use this flag if you wish to have an indexed unbundle on Android (or Windows!).
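
For example, generating an indexed unbundle for Android would look something like this (paths are again illustrative, and the unbundle command accepts the same options as bundle):

  react-native unbundle --platform android --entry-file index.android.js --bundle-output android/app/src/main/assets/index.android.bundle --assets-dest android/app/src/main/res --indexed-unbundle --dev false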

I would definitely recommend evaluating the performance of your app with an unbundle versus a standard bundle before committing to unbundle. For smaller apps with relatively few modules (say, a few hundred small modules), the bundle size may only be a few MBs, and the overhead of many more IO operations may outweigh any savings over simply loading the entire bundle at app startup. You may also find that the IO penalty occurs at an unacceptable point in your app, but there are many workarounds to ensure critical modules are loaded in advance from an unbundle.
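
The simplest of those workarounds is to require critical modules eagerly in your entry file, so they are loaded during startup rather than on first interaction (module paths here are hypothetical):

// index.windows.js (hypothetical paths)
// Eagerly require anything that must be ready before the first interaction,
// so the unbundle loads it during startup instead of on demand.
require('./src/HomeScreen');
require('./src/analytics/setup');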

Happy (un)bundling!

Bringing the F8 App to Windows with React Native

As some of you may already know, Microsoft is bringing React Native to the Universal Windows Platform. This is an exciting opportunity for React Native developers to reach over 270 million Windows 10 users across phone, desktop, Xbox, and even HoloLens. As part of the effort to bring React Native to Windows, and in partnership with Facebook, we published the F8 Developer Conference app to the Windows Store, using the recently open-sourced F8 codebase. Here’s a video demonstrating some of the features of React Native on UWP used in the F8 app:

To be completely transparent about engineering effort, which is an important factor when choosing a framework like React Native, bringing the F8 app to Windows took approximately three weeks for a team of three engineers spending roughly 80% of their time on this app. When we kicked off the effort, however, some of the core view managers and native modules for React Native on Windows were not available, and none of the third-party dependencies had Windows support either. Specifically, there was no SplitView view manager for the menus and filters, no FlipView view manager for paging through the tabs and sessions, and we did not have properly functioning events for drag and content view updates in the ScrollViewer view manager. We also did not have a clipboard module for copy/paste of WiFi details, an asynchronous storage module for navigation state storage, a dialog module for logout and other alert behaviors, or a launcher module for the linking behavior in the Info tab of the app. In terms of third-party modules, we were missing the linear gradient view manager, the Facebook SDK module, and the React Native share module. Some of these, like the launcher module, were half-day efforts or less; others, like the more complex Facebook SDK module, took more than a day, between discovering the proper native API dependencies to consume and writing and testing the code.

When it came to shipping the app on the store, there were a number of minor things we had not yet considered, like the fact that managed store apps must be compiled with .NET Native. We ended up being quite lucky, in that there were only a small number of .NET APIs (primarily related to reflection) that were not supported in the app when compiled with .NET Native, and we simply had to work around those particular reflection-based operations.

There was a bit of design and style tweaking to make the F8 app look great on a Windows Phone device. I won’t go into too many details here, as Facebook has outlined in great detail how platform customization works for React Native between Android and iOS, and the same principles apply to customization for Windows. Excluding all the work on core and third-party module parity and store preparation, there was certainly less than one week of one developer’s time dedicated to platform customization and style tweaks in JavaScript. This is the time estimate that everyone should pay attention to, because in the fullness of time, React Native on UWP will reach feature parity with iOS and Android, and this will be the only effort that developers of cross-platform apps need to worry about. I’ve added a few examples below of how the Windows app diverges from the iOS and Android apps.

Platform specific styles from the F8 ListContainer module.
var styles = StyleSheet.create({
  container: {
    flex: 1,
    backgroundColor: 'white',
  },
  listView: {
    ios: {
      backgroundColor: 'transparent',
    },
    android: {
      backgroundColor: 'white',
    },
    windows: {
      backgroundColor: 'white',
    }
  },
  headerTitle: {
    color: 'white',
    fontWeight: 'bold',
    fontSize: 20,
  },
});
From F8TabsView.windows.js …
class F8TabsView extends React.Component {
  ...

  render() {
    return (
      <F8SplitView
        ref="splitView"
        paneWidth={290}
        panePosition="left"
        renderPaneView={this.renderPaneView}>
        <View style={styles.content} key={this.props.tab}>
          {this.renderContent()}
        </View>
      </F8SplitView>
    );
  }

  ...
}
compared to F8TabsView.android.js …
class F8TabsView extends React.Component {
  ...

  render() {
    return (
      <F8DrawerLayout
        ref="drawer"
        drawerWidth={290}
        drawerPosition="left"
        renderNavigationView={this.renderNavigationView}>
        <View style={styles.content} key={this.props.tab}>
          {this.renderContent()}
        </View>
      </F8DrawerLayout>
    );
  }

  ...
}
and F8TabsView.ios.js
class F8TabsView extends React.Component {
  ...

  render() {
    return (
      <TabBarIOS tintColor={F8Colors.darkText}>
        <TabBarItemIOS>
          ...
        </TabBarItemIOS>
        ...
      </TabBarIOS>
    );
  }

  ...
}

React Native aims to be a “horizontal platform” that is less about “write once, run everywhere” and more “learn once, write anywhere.” While we primarily designed the Windows version of the app around the Android user experience, given more time, we likely would have modified the views and menus to feel more like a Windows app. For example, in XAML, SplitView supports a compact display mode that shows only the icons from a pull out menu when closed. This would have been great for a desktop variant of the app and Continuum. Also in XAML, Pivot is commonly used for paging content, and having Pivot style headers for pages and sessions could have provided a more familiar experience for Windows users.

Overall, we had a very positive experience bringing the F8 Developer Conference app to Windows using React Native, and the experience for bringing your existing React Native apps to Windows is only going to get easier.  We hope that this effort shows that React Native on Windows is more than just an experiment, and with strong support from the community, it poses a great opportunity to reach a broader audience with your apps.

We’ll be talking about this experience and other stories related to bringing React Native to Windows at the DECODED Conference in Dublin, Ireland on May 13th.  Take a look at how another team at Microsoft was able to get CodePush working for React Native on UWP. Special thanks to Matt Podwysocki and Eero Bragge for all their hard work on getting the F8 Windows app ready in time for F8.

Reactive Extensions and Project Oxford for Cortana-like Speech Recognition Feedback

Project Oxford is a collection of APIs and SDKs from Microsoft that includes tools for transforming speech to text and text to speech.  Modern applications that leverage speech to text often display partial recognition results, to give the user immediate feedback, and reduce the overall perceived latency, as shown below in Cortana.

Cortana screenshot: partial response example for “Cortana, is it going to rain today?”.

Speech Recognition SDK Overview

The Windows SDK for speech recognition in Project Oxford (which can be downloaded at https://www.projectoxford.ai/SDK) includes the ability to capture and display partial results. The API uses C# events to notify the client of everything from partial results to recognition errors to the final recognized result. The SDK can be used either to capture microphone input directly or to accept an audio stream in chunks, as shown below.

                // Capture microphone input directly
                client.AudioStart();
                // ... user speaks; recognition events fire as audio is processed ...
                client.AudioStop();

                // Or push an existing audio stream in chunks
                // (e.g., PCM audio read from a file; the path is illustrative)
                Stream stream = File.OpenRead("audio.raw");
                var count = default(int);
                var buffer = new byte[1024];
                while ((count = stream.Read(buffer, 0, 1024)) > 0)
                {
                    client.SendAudio(buffer, count);
                }
                client.EndAudio();

In either case, the handler logic is the same. Here is a very simple example that captures the events and prints them to a console window.

                client.OnConversationError += (sender, args) =>
                {
                    Console.WriteLine("Error {0}, {1}", args.SpeechErrorCode, args.SpeechErrorText);
                };

                client.OnPartialResponseReceived += (sender, args) =>
                {
                    Console.WriteLine("Received partial response: {0}", args.PartialResult);
                };

                client.OnResponseReceived += (sender, args) =>
                {
                    switch (args.PhraseResponse.RecognitionStatus)
                    {
                        case RecognitionStatus.Intermediate:
                            Console.WriteLine("Received intermediate response: {0}", args.PhraseResponse.Results.First().DisplayText);
                            break;
                        case RecognitionStatus.RecognitionSuccess:
                            Console.WriteLine("Received success response: {0}", args.PhraseResponse.Results.First().DisplayText);
                            break;
                        case RecognitionStatus.NoMatch:
                        case RecognitionStatus.None:
                        case RecognitionStatus.InitialSilenceTimeout:
                        case RecognitionStatus.BabbleTimeout:
                        case RecognitionStatus.HotWordMaximumTime:
                        case RecognitionStatus.Cancelled:
                        case RecognitionStatus.RecognitionError:
                        case RecognitionStatus.DictationEndSilenceTimeout:
                        case RecognitionStatus.EndOfDictation:
                        default:
                            Console.WriteLine("Received {0} response.", args.PhraseResponse.RecognitionStatus);
                            break;
                    }
                };

There are two modes for speech recognition supported by the SDK, short phrase and long dictation. The former is designed for single-shot utterances such as commands or queries, and the latter is more for capturing longer sessions, such as email or text message dictation. Here is a summary of the kinds of events and status codes I was able to produce “in the wild” (i.e., by babbling at my laptop):

Response Type                         Short Phrase   Long Dictation
OnPartialResponseReceived             Y              Y
OnConversationError                   Y              Y
OnResponseReceived
    None (0)                          N              N
    Intermediate (100)                N              N
    RecognitionSuccess (200)          Y              Y
    Cancelled (201)                   N              N
    NoMatch (301)                     Y              Y
    InitialSilenceTimeout (303)       Y              Y
    BabbleTimeout (304)               N              N
    HotWordMaximumTime (305)          N              N
    RecognitionError (500)            N              N
    DictationEndSilenceTimeout (610)  N              Y
    EndOfDictation (612)              N              Y

The long dictation mode typically consists of a series of partial speech responses terminated by a regular response.  For example, if the user spoke “Four score and seven years ago… our fathers brought forth on this continent, a new nation…”, the event handling logic above would produce something similar to the following output:

Received partial response: four
Received partial response: four score and
Received partial response: four score and seven years ago
Received success response: Four score and seven years ago.
Received partial response: our fathers brought
Received partial response: our fathers brought forth on this continent
Received partial response: our fathers brought forth on this continent a new nation
Received success response: Our fathers brought forth on this continent a new nation.

The short phrase mode is similar, except that it will only return a single response, so any utterances made after the first response are ignored.

Speech Recognition With Reactive Extensions

The trouble with an event-driven approach to speech recognition handling is that by decoupling the events, you also lose some of the semantics of their sequencing. That is to say that event handlers are assigned by event type, rather than event order.  So, if you wanted to introduce logic that had special handling based on the sequence of recognition results, some kind of shared state accessible to each of the event handlers would be required.

Consider, for example, a long dictation mode scenario where the first pause corresponded to the title of a dictated blog post. The user might say, “A Blog Post About My Cat [pause] My cat is the greatest cat because it has orange fur. [pause] She is also afraid of vacuum cleaners and loves laser pointers.” Here is some sample code that implements this with partial feedback on both the title and the sentences:

            var count = 0;

            client.OnConversationError += (sender, args) =>
            {
                Console.Error.WriteLine("Failed with code '{0}' and text '{1}'.", args.SpeechErrorCode, args.SpeechErrorText);
            };

            client.OnPartialResponseReceived += (sender, args) =>
            {
                Console.CursorLeft = 0;
                var prefix = (count == 0) ? "Title" : "Sentence " + count;
                Console.Write("{0}: {1}", prefix, args.PartialResult);
            };

            client.OnResponseReceived += (sender, args) =>
            {
                if (args.PhraseResponse.RecognitionStatus == RecognitionStatus.RecognitionSuccess)
                {
                    var result = args.PhraseResponse.Results.First().DisplayText;
                    Console.CursorLeft = 0;
                    var prefix = (count == 0) ? "Title" : "Sentence " + count;
                    Console.WriteLine("{0}: {1}", prefix, result);
                    count++;
                }
            };

Notice that in order to implement this scenario, we introduced the shared state, `count`, and branched on that shared state.

However, another option for modeling these sequences of partial results is to use the observable abstraction from the Reactive Extensions (Rx) framework.  Specifically, each partial or final response would be modeled as an `OnNext` notification, with the sequence terminated by an `OnCompleted` notification after the final response.  In the case of long dictation mode, the series of partial-followed-by-regular responses would be modeled as an observable of observables, or IObservable<IObservable<RecognizedPhrase>>.

So, for the blog post dictation example above, here’s some logic using Rx:

            var sentenceSubscriptions = client.GetResponseObservable()
                .Select((observable, count) => new { observable, count })
                .Subscribe(
                    x => x.observable.Subscribe(
                        phrases =>
                        {
                            Console.CursorLeft = 0;
                            var firstPhrase = phrases.First();
                            var prefix = x.count == 0 ? "Title" : "Sentence " + x.count;
                            Console.Write("{0}: {1}", prefix, firstPhrase.DisplayText ?? firstPhrase.LexicalForm);
                        },
                        ex => Console.Error.WriteLine(ex),
                        () => Console.WriteLine()));

For those very familiar with Rx, all the logic to dispose subscriptions is left out of this example (sorry!), in the same way that the logic to “subtract” the event handlers from the previous example is left out.

Beyond introducing more explicit semantics for the event sequences that occur in the Project Oxford speech recognition APIs, using Reactive Extensions here allows users to write code with LINQ syntax, and also takes care of cleaning up all the event handlers on the client after you are no longer using them (assuming you dispose your subscriptions!).

Implementing the Speech Recognition Observable

The last example uses an extension method on the Project Oxford client with the following signature:

        public static IObservable<IObservable<RecognizedPhrase>> GetResponseObservable(this DataRecognitionClient client);

However, this was primarily to simplify the example. In reality, Project Oxford returns a set of candidates for what the utterances may be, so the signature would look like:

        public static IObservable<IObservable<IEnumerable<RecognizedPhrase>>> GetResponseObservable(this DataRecognitionClient client);

The implementation of this is rather simple. Using the latest bits from Rx.NET, this implementation is little more than a combination of the FromEventPattern, Merge, and Window operators.  Here’s the specific implementation:

        public static IObservable<IObservable<IEnumerable<RecognizedPhrase>>> GetResponseObservable(this DataRecognitionClient client)
        {
            var errorObservable = Observable.FromEventPattern<SpeechErrorEventArgs>(
                    h => client.OnConversationError += h,
                    h => client.OnConversationError -= h)
                .Select<EventPattern<SpeechErrorEventArgs>, IEnumerable<RecognizedPhrase>>(
                    x => { throw new SpeechRecognitionException(x.EventArgs.SpeechErrorCode, x.EventArgs.SpeechErrorText); });

            var partialObservable = Observable.FromEventPattern<PartialSpeechResponseEventArgs>(
                    h => client.OnPartialResponseReceived += h,
                    h => client.OnPartialResponseReceived -= h)
                .Select(x => Enumerable.Repeat(RecognizedPhrase.CreatePartial(x.EventArgs.PartialResult), 1));

            var responseObservable = Observable.FromEventPattern<SpeechResponseEventArgs>(
                    h => client.OnResponseReceived += h,
                    h => client.OnResponseReceived -= h)
                .Select(x =>
                {
                    var response = x.EventArgs.PhraseResponse;
                    switch (response.RecognitionStatus)
                    {
                        case RecognitionStatus.Intermediate:
                            return response.Results.Select(p => RecognizedPhrase.CreateIntermediate(p));
                        case RecognitionStatus.RecognitionSuccess:
                            return response.Results.Select(p => RecognizedPhrase.CreateSuccess(p));
                        case RecognitionStatus.InitialSilenceTimeout:
                            throw new InitialSilenceTimeoutException();
                        case RecognitionStatus.BabbleTimeout:
                            throw new BabbleTimeoutException();
                        case RecognitionStatus.Cancelled:
                            throw new OperationCanceledException();
                        case RecognitionStatus.DictationEndSilenceTimeout:
                            throw new DictationEndTimeoutException();
                        case RecognitionStatus.EndOfDictation:
                        case RecognitionStatus.HotWordMaximumTime:
                        case RecognitionStatus.NoMatch:
                        case RecognitionStatus.None:
                        case RecognitionStatus.RecognitionError:
                        default:
                            throw new SpeechRecognitionException();
                    }
                });

            return responseObservable.Publish(observable =>
                Observable.Merge(errorObservable, partialObservable, observable)
                    .Window(() => observable));
        }

In addition to the core logic above, a few data models were introduced, including the exception types for errors and timeouts, as well as a replacement class for RecognizedPhrase that is able to represent both success responses and partial responses. For the full implementation, check out my GitHub repository, RxToProjectOxford.
