Bringing the F8 App to Windows with React Native

As some of you may already know, Microsoft is bringing React Native to the Universal Windows Platform. This is an exciting opportunity for React Native developers to reach over 270 million Windows 10 users across phone, desktop, Xbox, and even HoloLens. As part of the effort to bring React Native to Windows, and in partnership with Facebook, we published the F8 Developer Conference app to the Windows Store, using the recently open-sourced F8 codebase.  Here’s a video demonstrating some of the React Native on UWP features used in the F8 app:

To be completely transparent about engineering effort, which is an important factor when choosing a framework like React Native, bringing the F8 app to Windows took approximately three weeks for a team of three engineers, each spending roughly 80% of their time on this app. When we kicked off the effort, however, some of the core view managers and native modules for React Native on Windows were not available, and none of the third-party dependencies had Windows support either. Specifically, there was no SplitView view manager for the menus and filters, no FlipView view manager for paging through the tabs and sessions, and no properly functioning events for drag and content view updates in the ScrollViewer view manager. We also did not have a clipboard module for copy/paste of WiFi details, an asynchronous storage module for navigation state storage, a dialog module for logout and other alert behaviors, or a launcher module for the linking behavior in the Info tab of the app. In terms of third-party modules, we were missing the linear gradient view manager, the Facebook SDK module, and the React Native share module.  Some of these, like the launcher module, were half-day efforts or less; more complex modules, like the Facebook SDK module, took more than a day each, between discovering the proper native API dependencies to consume and writing and testing the module.

When it came to shipping the app on the store, there were a number of minor things we had not yet considered, like the fact that managed Store apps must be compiled with .NET Native. We ended up being quite lucky: only a small number of .NET APIs (primarily related to reflection) were not supported when the app was compiled with .NET Native, and we simply had to work around those particular reflected operations.

There was a bit of design and style tweaking to make the F8 app look great on a Windows Phone device.  I won’t go into too many details here, as Facebook has outlined in great detail how platform customization works for React Native between Android and iOS, and the same principles apply to customization for Windows. Excluding all the work on core and third-party module parity and store preparation, there was certainly less than one week of one developer’s time dedicated to platform customization and style tweaks in JavaScript.  This is the time estimate that everyone should pay attention to, because in the fullness of time, React Native on UWP will reach feature parity with iOS and Android, and this will be the only effort that developers of cross-platform apps need to worry about. I’ve added a few examples below of how the Windows app diverges from the iOS and Android apps.

Platform-specific styles from the F8 ListContainer module:
// F8StyleSheet is the F8 app's wrapper around StyleSheet that picks the
// styles matching Platform.OS from the ios/android/windows keys below
var styles = F8StyleSheet.create({
  container: {
    flex: 1,
    backgroundColor: 'white',
  },
  listView: {
    ios: {
      backgroundColor: 'transparent',
    },
    android: {
      backgroundColor: 'white',
    },
    windows: {
      backgroundColor: 'white',
    }
  },
  headerTitle: {
    color: 'white',
    fontWeight: 'bold',
    fontSize: 20,
  },
});
From F8TabsView.windows.js …
class F8TabsView extends React.Component {
  ...

  render() {
    return (
      <F8SplitView
        ref="splitView"
        paneWidth={290}
        panePosition="left"
        renderPaneView={this.renderPaneView}>
        <View style={styles.content} key={this.props.tab}>
          {this.renderContent()}
        </View>
      </F8SplitView>
    );
  }

  ...
}
compared to F8TabsView.android.js …
class F8TabsView extends React.Component {
  ...

  render() {
    return (
      <F8DrawerLayout
        ref="drawer"
        drawerWidth={290}
        drawerPosition="left"
        renderNavigationView={this.renderNavigationView}>
        <View style={styles.content} key={this.props.tab}>
          {this.renderContent()}
        </View>
      </F8DrawerLayout>
    );
  }

  ...
}
and F8TabsView.ios.js
class F8TabsView extends React.Component {
  ...

  render() {
    return (
      <TabBarIOS tintColor={F8Colors.darkText}>
        <TabBarItemIOS>
          ...
        </TabBarItemIOS>
        ...
      </TabBarIOS>
    );
  }

  ...
}

React Native aims to be a “horizontal platform” that is less about “write once, run everywhere” and more about “learn once, write anywhere.” While we primarily designed the Windows version of the app around the Android user experience, given more time we likely would have modified the views and menus to feel more like a Windows app. For example, in XAML, SplitView supports a compact display mode that shows only the icons from a pull-out menu when closed. This would have been great for a desktop variant of the app and for Continuum. Also in XAML, Pivot is commonly used for paging content, and having Pivot-style headers for pages and sessions could have provided a more familiar experience for Windows users.

Overall, we had a very positive experience bringing the F8 Developer Conference app to Windows using React Native, and the experience for bringing your existing React Native apps to Windows is only going to get easier.  We hope that this effort shows that React Native on Windows is more than just an experiment, and with strong support from the community, it poses a great opportunity to reach a broader audience with your apps.

We’ll be talking about this experience and other stories related to bringing React Native to Windows at the DECODED Conference in Dublin, Ireland on May 13th.  Take a look at how another team at Microsoft was able to get CodePush working for React Native on UWP. Special thanks to Matt Podwysocki and Eero Bragge for all their hard work on getting the F8 Windows app ready in time for F8.


Reactive Extensions and Project Oxford for Cortana-like Speech Recognition Feedback

Project Oxford is a collection of APIs and SDKs from Microsoft that includes tools for transforming speech to text and text to speech.  Modern applications that leverage speech to text often display partial recognition results to give the user immediate feedback and reduce the overall perceived latency, as shown below in Cortana.

[Cortana screenshot] Partial response example for “Cortana, is it going to rain today?”.

Speech Recognition SDK Overview

The Windows SDK for speech recognition in Project Oxford (which can be downloaded at https://www.projectoxford.ai/SDK) includes the ability to capture and display partial results. The API uses C# events to notify the client of everything from partial results to recognition errors to the final recognized result. The SDK can either capture microphone input directly or accept an audio stream pushed in chunks, as shown below.

                // 'client' is a Project Oxford recognition client
                // (e.g., the DataRecognitionClient shown later in this post)

                // Capture microphone input directly
                client.AudioStart();
                client.AudioStop();

                // Or push an audio stream in chunks; the file here is just a
                // placeholder for any audio source opened elsewhere
                Stream stream = File.OpenRead("audio.wav");
                var count = default(int);
                var buffer = new byte[1024];
                while ((count = stream.Read(buffer, 0, 1024)) > 0)
                {
                    client.SendAudio(buffer, count);
                }
                client.EndAudio();

In either case, the handler logic is the same. Here is a very simple example that captures the events and prints them to a console window.

                client.OnConversationError += (sender, args) =>
                {
                    Console.WriteLine("Error {0}, {1}", args.SpeechErrorCode, args.SpeechErrorText);
                };

                client.OnPartialResponseReceived += (sender, args) =>
                {
                    Console.WriteLine("Received partial response: {0}", args.PartialResult);
                };

                client.OnResponseReceived += (sender, args) =>
                {
                    switch (args.PhraseResponse.RecognitionStatus)
                    {
                        case RecognitionStatus.Intermediate:
                            Console.WriteLine("Received intermediate response: {0}", args.PhraseResponse.Results.First().DisplayText);
                            break;
                        case RecognitionStatus.RecognitionSuccess:
                            Console.WriteLine("Received success response: {0}", args.PhraseResponse.Results.First().DisplayText);
                            break;
                        case RecognitionStatus.NoMatch:
                        case RecognitionStatus.None:
                        case RecognitionStatus.InitialSilenceTimeout:
                        case RecognitionStatus.BabbleTimeout:
                        case RecognitionStatus.HotWordMaximumTime:
                        case RecognitionStatus.Cancelled:
                        case RecognitionStatus.RecognitionError:
                        case RecognitionStatus.DictationEndSilenceTimeout:
                        case RecognitionStatus.EndOfDictation:
                        default:
                            Console.WriteLine("Received {0} response.", args.PhraseResponse.RecognitionStatus);
                            break;
                    }
                };

There are two modes for speech recognition supported by the SDK, short phrase and long dictation. The former is designed for single-shot utterances such as commands or queries, and the latter is more for capturing longer sessions, such as email or text message dictation. Here is a summary of the kinds of events and status codes I was able to produce “in the wild” (i.e., by babbling at my laptop):

| Response Type | Short Phrase | Long Dictation |
| --- | --- | --- |
| OnPartialResponseReceived | Y | Y |
| OnConversationError | Y | Y |
| OnResponseReceived: None (0) | N | N |
| OnResponseReceived: Intermediate (100) | N | N |
| OnResponseReceived: RecognitionSuccess (200) | Y | Y |
| OnResponseReceived: Cancelled (201) | N | N |
| OnResponseReceived: NoMatch (301) | Y | Y |
| OnResponseReceived: InitialSilenceTimeout (303) | Y | Y |
| OnResponseReceived: BabbleTimeout (304) | N | N |
| OnResponseReceived: HotWordMaximumTime (305) | N | N |
| OnResponseReceived: RecognitionError (500) | N | N |
| OnResponseReceived: DictationEndSilenceTimeout (610) | N | Y |
| OnResponseReceived: EndOfDictation (612) | N | Y |

The long dictation mode typically consists of a series of partial speech responses terminated by a regular response.  For example, if the user spoke “Four score and seven years ago… our fathers brought forth on this continent, a new nation…”, the event handling logic above would produce something similar to the following output:

Received partial response: four
Received partial response: four score and
Received partial response: four score and seven years ago
Received success response: Four score and seven years ago.
Received partial response: our fathers brought
Received partial response: our fathers brought forth on this continent
Received partial response: our fathers brought forth on this continent a new nation
Received success response: Our fathers brought forth on this continent a new nation.

The short phrase mode is similar, except that it will only return a single response, so any utterances made after the first response are ignored.
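
The recognition mode is chosen when the client is created. Here is a minimal sketch of creating a client for each mode, assuming the SpeechRecognitionServiceFactory and SpeechRecognitionMode types from the Project Oxford SDK; the language and subscription key values are placeholders.

    // Minimal sketch (placeholder key and language): creating a client per mode.

    // Short phrase mode, capturing microphone input directly
    var shortPhraseClient = SpeechRecognitionServiceFactory.CreateMicrophoneClient(
        SpeechRecognitionMode.ShortPhrase,
        "en-US",
        "<subscription key>");

    // Long dictation mode, accepting audio pushed in chunks via SendAudio/EndAudio
    var dictationClient = SpeechRecognitionServiceFactory.CreateDataClient(
        SpeechRecognitionMode.LongDictation,
        "en-US",
        "<subscription key>");

The OnPartialResponseReceived, OnResponseReceived, and OnConversationError handlers shown earlier should work with either client.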

Speech Recognition With Reactive Extensions

The trouble with an event-driven approach to speech recognition handling is that by decoupling the events, you also lose some of the semantics of their sequencing. That is to say, event handlers are assigned by event type rather than by event order. So, if you wanted to introduce logic with special handling based on the sequence of recognition results, you would need some kind of shared state accessible to each of the event handlers.

Consider, for example, a long dictation mode scenario where the first pause corresponded to the title of a dictated blog post. The user might say, “A Blog Post About My Cat [pause] My cat is the greatest cat because it has orange fur. [pause] She is also afraid of vacuum cleaners and loves laser pointers.” Here is some sample code that implements this with partial feedback on both the title and the sentences:

            var count = 0;

            client.OnConversationError += (sender, args) =>
            {
                Console.Error.WriteLine("Failed with code '{0}' and text '{1}'.", args.SpeechErrorCode, args.SpeechErrorText);
            };

            client.OnPartialResponseReceived += (sender, args) =>
            {
                Console.CursorLeft = 0;
                var prefix = (count == 0) ? "Title" : "Sentence " + count;
                Console.Write("{0}: {1}", prefix, args.PartialResult);
            };

            client.OnResponseReceived += (sender, args) =>
            {
                if (args.PhraseResponse.RecognitionStatus == RecognitionStatus.RecognitionSuccess)
                {
                    var result = args.PhraseResponse.Results.First().DisplayText;
                    Console.CursorLeft = 0;
                    var prefix = (count == 0) ? "Title" : "Sentence " + count;
                    Console.WriteLine("{0}: {1}", prefix, result);
                    count++;
                }
            };

Notice that in order to implement this scenario, we introduced the shared state, `count`, and switched on that shared state in both handlers.

However, another option for modeling these sequences of partial results is to use the observable abstraction from the Reactive Extensions (Rx) framework.  Specifically, each partial or final response would be modeled as an `OnNext` event, and the sequence would be terminated with an `OnCompleted` event after the final response.  In the case of long dictation mode, the series of partial responses followed by a regular response would be modeled as an observable of observables, or IObservable<IObservable<RecognizedPhrase>>.

So, for the blog post dictation example above, here’s some logic using Rx:

            var sentenceSubscriptions = client.GetResponseObservable()
                // Number each response sequence; sequence 0 is the title
                .Select((observable, count) => new { observable, count })
                .Subscribe(
                    x => x.observable.Subscribe(
                        phrases =>
                        {
                            // Overwrite the current console line with the latest result
                            Console.CursorLeft = 0;
                            var firstPhrase = phrases.First();
                            var prefix = x.count == 0 ? "Title" : "Sentence " + x.count;
                            Console.Write("{0}: {1}", prefix, firstPhrase.DisplayText ?? firstPhrase.LexicalForm);
                        },
                        ex => Console.Error.WriteLine(ex),
                        // Move to the next console line once the sentence is final
                        () => Console.WriteLine()));

For those very familiar with Rx: all of the logic to dispose of the subscriptions is left out of this example (sorry!), just as the logic to detach (“subtract”) the event handlers was left out of the previous examples.

Beyond introducing more explicit semantics for the event sequences that occur in the Project Oxford speech recognition APIs, using Reactive Extensions here allows users to write code with LINQ syntax, and also takes care of cleaning up all the event handlers on the client after you are no longer using them (assuming you dispose your subscriptions!).
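
As an illustration of the LINQ point, here is a sketch (illustrative only) that uses query syntax to flatten the observable-of-observables into a single stream of display strings, using the GetResponseObservable extension method implemented in the next section:

    // Illustrative sketch: query (LINQ) syntax over the response observable.
    // Assumes the GetResponseObservable extension method implemented below.
    var displayText =
        from responses in client.GetResponseObservable()  // one inner sequence per utterance
        from phrases in responses                         // partial and final results
        let first = phrases.First()
        select first.DisplayText ?? first.LexicalForm;

    var subscription = displayText.Subscribe(
        text => Console.WriteLine(text),
        ex => Console.Error.WriteLine(ex));

Note that flattening this way loses the per-utterance boundaries and lets the first error terminate the whole stream, which is why the earlier example keeps the observable-of-observables shape.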

Implementing the Speech Recognition Observable

The last example uses an extension method on the Project Oxford client. A simplified version of its signature looks like this:

        public static IObservable<IObservable<RecognizedPhrase>> GetResponseObservable(this DataRecognitionClient client);

In reality, Project Oxford returns a set of candidates for what the utterance may have been (which is why the example calls First() on each result), so the actual signature looks like:

        public static IObservable<IObservable<IEnumerable<RecognizedPhrase>>> GetResponseObservable(this DataRecognitionClient client);

The implementation of this is rather simple. Using the latest bits from Rx.NET, this implementation is little more than a combination of the FromEventPattern, Merge, and Window operators.  Here’s the specific implementation:

        public static IObservable<IObservable<IEnumerable<RecognizedPhrase>>> GetResponseObservable(this DataRecognitionClient client)
        {
            // Surface conversation errors as OnError notifications by throwing inside Select
            var errorObservable = Observable.FromEventPattern<SpeechErrorEventArgs>(
                    h => client.OnConversationError += h,
                    h => client.OnConversationError -= h)
                .Select<EventPattern<MicrosoftProjectOxford.SpeechErrorEventArgs>, IEnumerable<RecognizedPhrase>>(
                    x => { throw new SpeechRecognitionException(x.EventArgs.SpeechErrorCode, x.EventArgs.SpeechErrorText); });

            // Wrap each partial result in a single-element sequence so it merges with the full responses
            var partialObservable = Observable.FromEventPattern<PartialSpeechResponseEventArgs>(
                    h => client.OnPartialResponseReceived += h,
                    h => client.OnPartialResponseReceived -= h)
                .Select(x => Enumerable.Repeat(RecognizedPhrase.CreatePartial(x.EventArgs.PartialResult), 1));

            // Map success and intermediate responses to recognized phrases; all other statuses become errors
            var responseObservable = Observable.FromEventPattern<SpeechResponseEventArgs>(
                    h => client.OnResponseReceived += h,
                    h => client.OnResponseReceived -= h)
                .Select(x =>
                {
                    var response = x.EventArgs.PhraseResponse;
                    switch (response.RecognitionStatus)
                    {
                        case RecognitionStatus.Intermediate:
                            return response.Results.Select(p => RecognizedPhrase.CreateIntermediate(p));
                        case RecognitionStatus.RecognitionSuccess:
                            return response.Results.Select(p => RecognizedPhrase.CreateSuccess(p));
                        case RecognitionStatus.InitialSilenceTimeout:
                            throw new InitialSilenceTimeoutException();
                        case RecognitionStatus.BabbleTimeout:
                            throw new BabbleTimeoutException();
                        case RecognitionStatus.Cancelled:
                            throw new OperationCanceledException();
                        case MicrosoftProjectOxford.RecognitionStatus.DictationEndSilenceTimeout:
                            throw new DictationEndTimeoutException();
                        case RecognitionStatus.EndOfDictation:
                        case RecognitionStatus.HotWordMaximumTime:
                        case RecognitionStatus.NoMatch:
                        case RecognitionStatus.None:
                        case RecognitionStatus.RecognitionError:
                        default:
                            throw new SpeechRecognitionException();
                    }
                });

            // Merge the error, partial, and final response streams, then close the current
            // window (and open a new one) each time a final response arrives
            return responseObservable.Publish(observable =>
                Observable.Merge(errorObservable, partialObservable, observable)
                    .Window(() => observable));
        }

In addition to the core logic above, a few data models were introduced, including the exception types for errors and timeouts, as well as a replacement class for RecognizedPhrase that is able to represent both success responses and partial responses. For the full implementation, check out my GitHub repository, RxToProjectOxford.
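
For reference, here is a rough sketch of what that replacement RecognizedPhrase model might look like, reconstructed only from the factory methods and properties used in the snippets above; the property names and the MicrosoftProjectOxford alias (presumably a using alias for the SDK namespace, as in the snippet above) are assumptions, and the actual implementation is in the RxToProjectOxford repository.

    // Rough sketch of the replacement RecognizedPhrase model. Only the members used
    // above (CreatePartial/CreateIntermediate/CreateSuccess, DisplayText, LexicalForm)
    // are grounded in this post; everything else is illustrative.
    public class RecognizedPhrase
    {
        public bool IsFinal { get; private set; }
        public string DisplayText { get; private set; }
        public string LexicalForm { get; private set; }

        // Partial results only carry the raw recognized text
        public static RecognizedPhrase CreatePartial(string partialResult)
        {
            return new RecognizedPhrase { LexicalForm = partialResult };
        }

        // Intermediate responses copy the SDK phrase's text forms
        // (assuming the SDK phrase exposes DisplayText and LexicalForm)
        public static RecognizedPhrase CreateIntermediate(MicrosoftProjectOxford.RecognizedPhrase phrase)
        {
            return new RecognizedPhrase { DisplayText = phrase.DisplayText, LexicalForm = phrase.LexicalForm };
        }

        // Success responses additionally mark the phrase as final
        public static RecognizedPhrase CreateSuccess(MicrosoftProjectOxford.RecognizedPhrase phrase)
        {
            return new RecognizedPhrase { IsFinal = true, DisplayText = phrase.DisplayText, LexicalForm = phrase.LexicalForm };
        }
    }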
