Protocol

High Level Interaction Overview

In this document we use the word "Client" interchangeably with the phrase "Object Host" when referring to a third party that connects to a space server to become part of that space. Traditionally, clients host a single object: the camera, which maintains a perspective query of the surrounding environment. This is left up to the client implementation and the policies of the space, however: you may find clients hosting avatars, bots, or other objects in a given world.

Object hosts simulate individual objects and allow each of these objects to connect to one or more spaces. Once an object connects to a space, it may communicate with that space to set up standing queries for nearby or important objects, advertise its position, or send messages to objects whose identifiers it knows.

Typical client behavior involves the following steps:

Connect camera object to space server
Upon connection, register for a new object Id with desired position
Set up a proximity query for nearby objects that occupy an area of N pixels on its screen

At this point the camera object will be ready to receive standing query results of nearby objects. As these come in, the client will likely wish to act upon them.

As each proximity response is returned, the client is likely to perform the following steps:

Examine the proximity response and understand any bundled data (which may include current location, mesh data, type, etc)
Construct a message asking that object for remaining data including
- Mesh URI
- Position
- Light information
- ...

Once the data is retrieved, the object may be displayed on the client. In the future, the client may set up an individual subscription for location updates and other properties of interest with the relevant subscription service. The subscription service is optional, and the client must not rely on such a service being present. In the absence of such a service, asking the location service for location updates at a rate proportional to the distance^2 of the object is sufficient.

Clients may wish to interact with objects in the scene (asking them to move, changing their properties, etc.). This is accomplished by sending a message to that object on one of a number of ports. The standard message goes to the object on port 0 and invokes the scripting language function of that name on the object, with the provided binary string as its argument.

Low Level Network Protocols

Sirikata plans to support two low-level protocols:

TCP under TCPSST
UDP under ENET

All components of the system should be able to speak both protocols so that the best one may be selected for the given circumstance. (ISPs often shape UDP traffic, in which case users will get better performance from TCP; ideally, though, UDP has less overhead than TCP and allows the application to drop outdated packets.)

TCPSST

TCPSST is an effort to build a structured-streams-like abstraction on top of TCP by multiplexing a number of independent SubStreams over a handful of individual TCP streams. Clients may choose to connect to servers with one or more TCP streams, using a handshake that pairs them up appropriately. The default is 3 TCP streams per TCPSST connection. This setup prevents head-of-line blocking and allows unordered messages to be sent on the least congested stream.

Handshake

Every stream begins with the 6 characters SSTTCP, then 2 characters giving in human-readable form how many streams will be associated with this connection (up to 99), and then a UUID for this connection. The server uses this data to pair up the connections and treat them as a single overarching TCPSST stream. The server then sends back a similar handshake header with its own unique UUID to acknowledge the connection pairing.
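The header layout described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not say whether the stream count is zero padded or how the UUID is encoded, so the sketch assumes a zero-padded ASCII decimal count and the 36-character textual UUID form. The function names build_handshake and parse_handshake are hypothetical.

```python
import uuid
from typing import Tuple

MAGIC = b"SSTTCP"  # the 6-character prefix named in the text


def build_handshake(num_streams: int, connection_id: uuid.UUID) -> bytes:
    """Build the header sent at the start of every TCP stream."""
    if not 1 <= num_streams <= 99:
        raise ValueError("stream count must fit in 2 decimal characters")
    # Assumption: the count is zero-padded ASCII decimal, e.g. b"03".
    count = f"{num_streams:02d}".encode("ascii")
    # Assumption: the UUID travels in its 36-character textual form.
    return MAGIC + count + str(connection_id).encode("ascii")


def parse_handshake(header: bytes) -> Tuple[int, uuid.UUID]:
    """Return (stream count, connection UUID) from a header built as above."""
    if len(header) != 44 or header[:6] != MAGIC:
        raise ValueError("not a TCPSST handshake header")
    count = int(header[6:8].decode("ascii"))
    return count, uuid.UUID(header[8:].decode("ascii"))
```

A server following this sketch would group incoming TCP streams by the parsed UUID until it has collected the advertised number of streams.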

Framing

After the handshake is complete, data may be sent across the bidirectional sockets as long as each packet of data is preceded by a packet size and a substream identifier. The packet size counts all data that follows it, including the substream identifier.

Packet size format

The packet size should be a Base 128 VarInt as defined in the protocol buffers wire format. http://code.google.com/apis/protocolbuffers/docs/encoding.html
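For readers unfamiliar with the Base 128 VarInt encoding, a minimal Python sketch follows. Each byte carries 7 bits of the value, least significant group first, and the high bit of a byte signals that more bytes follow. The function names are illustrative and not taken from the Sirikata sources.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Base 128 VarInt."""
    if value < 0:
        raise ValueError("only non-negative values are handled here")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)         # last byte: high bit clear
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode a VarInt starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
```

For example, the value 300 encodes to the two bytes 0xAC 0x02, and any value below 128 encodes to a single byte.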

Substream identifier format

The substream identifier should be a Base 128 VarInt as defined in the protocol buffers wire format. http://code.google.com/apis/protocolbuffers/docs/encoding.html Substream 0 is reserved for protocol messages.
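Putting the two formats together, one framed packet could be built and read back as in the Python sketch below. It assumes the packet size counts the encoded substream identifier plus the payload, as the framing description states. The helper names frame_packet and unframe_packet are hypothetical.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Base 128 VarInt encoding of a non-negative integer."""
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, next position) for the VarInt starting at pos."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def frame_packet(substream_id: int, payload: bytes) -> bytes:
    """Prefix payload with the packet size and the substream identifier."""
    sid = encode_varint(substream_id)
    # The size covers everything after itself: identifier plus payload.
    return encode_varint(len(sid) + len(payload)) + sid + payload


def unframe_packet(buf: bytes, pos: int = 0) -> Tuple[int, bytes, int]:
    """Return (substream id, payload, position of the next packet)."""
    size, pos = decode_varint(buf, pos)
    end = pos + size
    if end > len(buf):
        raise ValueError("incomplete packet")
    substream_id, body_start = decode_varint(buf, pos)
    if body_start > end:
        raise ValueError("packet size too small for its identifier")
    return substream_id, buf[body_start:end], end
```

Because unframe_packet returns the position of the next packet, a receiver can call it repeatedly on a buffer that holds several packets back to back.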

Application messages

Application messages consist of a positive number of bytes encased in the framing information above. By convention, all messages for a given stream identifier are sent along a single chosen TCP socket unless the application code specifically marks them as unordered. New streams may be allocated simply by using an unused stream identifier. New streams allocated by the client that initiated the connection should be odd, and new streams allocated by the server that listened for the connection should be even.
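The odd/even allocation rule can be illustrated with a small Python sketch. The class name StreamIdAllocator is hypothetical, and the sketch simply counts upward; it does not attempt to recycle identifiers of closed streams.

```python
class StreamIdAllocator:
    """Hand out unused stream identifiers for one side of a connection."""

    def __init__(self, initiated_connection: bool) -> None:
        # The initiating side (client) uses odd identifiers, the listening
        # side (server) uses even ones. Identifier 0 is never handed out
        # because substream 0 is reserved for protocol messages.
        self._next = 1 if initiated_connection else 2

    def allocate(self) -> int:
        stream_id = self._next
        self._next += 2
        return stream_id
```

Since the two sides draw from disjoint sets, either may open a new stream without first coordinating with the other.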

Protocol messages

Protocol messages consist of the standard PacketSize, followed by the byte 0 (which happens to be the VarInt representation of streamid 0), followed by one of the two types of protocol-level traffic currently supported:

Closing the stream: byte with the value 0x1 followed by the VarInt streamid to close
- Close should be sent on all TCPStreams that make up the overarching connection to avoid any unordered packets being delivered late.
Acking the closing of a stream: byte with the value 0x2 followed by the VarInt streamid
- An ack should be sent for each close command. Once the ack is received, the streamid may be reused.
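The two protocol messages above might be assembled as in the following Python sketch. The constant and function names are hypothetical; the byte layout follows the description: PacketSize, the byte 0 for streamid 0, the command byte, then the VarInt streamid.

```python
CLOSE_STREAM = 0x01  # command byte for closing a stream
ACK_CLOSE = 0x02     # command byte for acknowledging a close


def encode_varint(value: int) -> bytes:
    """Base 128 VarInt encoding of a non-negative integer."""
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def protocol_message(command: int, stream_id: int) -> bytes:
    """Frame a close or ack command for the given stream identifier."""
    if command not in (CLOSE_STREAM, ACK_CLOSE):
        raise ValueError("unknown protocol command")
    # 0x00 is the VarInt for the reserved substream 0.
    body = bytes([0x00, command]) + encode_varint(stream_id)
    # PacketSize counts everything that follows it.
    return encode_varint(len(body)) + body
```

A sender would write the same close message to every TCP stream of the connection, and treat the streamid as reusable only after the matching ack arrives.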

Sample Implementation

Sirikata provides a sample C++ implementation of the TCPSST protocol in libcore/plugins/tcpsst. This implementation is built atop boost::asio and, for the most part, uses lock-free data structures or data structures that may be replaced with lock-free implementations, so it should be scalable and thread safe.

ENET

The ENET module is not yet written, but ENET provides a multichannel abstraction over a socket, with the additional possibility of a lossy option and portions of a drop-oldest facility. A similar protocol will likely be used with ENET, but this is still in the planning stages.

Application Message Serialization

Routing header

An application message should start with a header for message routing. The header consists of a set of message fields with IDs. These fields have IDs in the ranges 1 to 6 and 1536 to 1791 (inclusive) and are stored in numerical order.

The message fields are individually serialized as protocol buffers fields http://code.google.com/apis/protocolbuffers/docs/encoding.html

The currently defined fields are:

1) source object, stored as a variable length array of up to 16 bytes, zero padded
2) destination object, stored as a variable length array of up to 16 bytes, zero padded
3) source port, stored as a base 128 variable length integer of up to 32 bits
4) destination port, stored as a base 128 variable length integer of up to 32 bits
1536) [extension] source space, stored as a variable length array of up to 16 bytes
1537) [extension] destination space, stored as a variable length array of up to 16 bytes
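As an illustration, the routing header could be serialized as in the Python sketch below. It assumes the object and space identifiers use the protocol buffers length-delimited wire type (2) and the ports use the varint wire type (0); the text does not name the wire types explicitly. The sketch writes identifiers exactly as given and does not apply any zero padding. The function routing_header and its parameter names are hypothetical.

```python
from typing import Optional


def encode_varint(value: int) -> bytes:
    """Base 128 VarInt encoding of a non-negative integer."""
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def _bytes_field(number: int, data: bytes) -> bytes:
    if len(data) > 16:
        raise ValueError("identifiers are at most 16 bytes")
    # Key = (field number << 3) | wire type 2 (assumed: length-delimited).
    return encode_varint((number << 3) | 2) + encode_varint(len(data)) + data


def _port_field(number: int, port: int) -> bytes:
    if not 0 <= port < 2 ** 32:
        raise ValueError("ports are at most 32 bits")
    # Key = (field number << 3) | wire type 0 (varint).
    return encode_varint(number << 3) + encode_varint(port)


def routing_header(source_object: Optional[bytes] = None,
                   destination_object: Optional[bytes] = None,
                   source_port: Optional[int] = None,
                   destination_port: Optional[int] = None,
                   source_space: Optional[bytes] = None,
                   destination_space: Optional[bytes] = None) -> bytes:
    """Serialize the fields that are present, in numerical field order."""
    out = b""
    for number, data in ((1, source_object), (2, destination_object)):
        if data is not None:
            out += _bytes_field(number, data)
    for number, port in ((3, source_port), (4, destination_port)):
        if port is not None:
            out += _port_field(number, port)
    for number, data in ((1536, source_space), (1537, destination_space)):
        if data is not None:
            out += _bytes_field(number, data)
    return out
```

Note that the extension fields need a two-byte key, because field numbers above 15 no longer fit in a single key byte.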

RPC header [optional]

The RPC header follows the routing header and defines fields in the ranges 7 to 8 and 1792 to 2560. The currently defined fields are:

7) message id (uint64)
8) message response id (uint64), indicating one reply to a given message
1792) return status (enum)

Once a field outside the routing and RPC headers is encountered, header parsing halts and the rest of the message is considered a user-level message.
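The halting rule can be sketched in Python as a function that splits a message into its header fields and the remaining user-level bytes. This is a simplified reading, not the Sirikata parser: the name split_headers is hypothetical, field values are kept as raw integers or byte strings, and a repeated field overwrites the earlier value.

```python
from typing import Dict, Tuple, Union

# Field-number ranges the text assigns to the routing and RPC headers.
HEADER_RANGES = ((1, 6), (7, 8), (1536, 1791), (1792, 2560))


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, next position) for the VarInt starting at pos."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def split_headers(message: bytes) -> Tuple[Dict[int, Union[int, bytes]], bytes]:
    """Parse header fields until a non-header field; return (fields, rest)."""
    fields: Dict[int, Union[int, bytes]] = {}
    pos = 0
    while pos < len(message):
        key, value_pos = decode_varint(message, pos)
        number, wire_type = key >> 3, key & 0x07
        if not any(low <= number <= high for low, high in HEADER_RANGES):
            break  # first non-header field: the user message starts here
        if wire_type == 0:  # varint
            value, pos = decode_varint(message, value_pos)
        elif wire_type == 2:  # length-delimited
            length, start = decode_varint(message, value_pos)
            pos = start + length
            if pos > len(message):
                raise ValueError("truncated field")
            value = message[start:pos]
        elif wire_type in (1, 5):  # fixed 64-bit or 32-bit, little-endian
            pos = value_pos + (8 if wire_type == 1 else 4)
            if pos > len(message):
                raise ValueError("truncated field")
            value = int.from_bytes(message[value_pos:pos], "little")
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields[number] = value
    return fields, message[pos:]
```

The returned remainder begins at the key of the first field outside the header ranges, so it can be handed to the user-level decoder unchanged.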

Sending a message id gives the other side the chance to reply in turn with a matching message response id. More than one reply may be matched with a particular message id (as in the case of a standing query, or of a persistence request for which some values were in a cache and others needed fetching).

User Message

The user portion of the message is considered a PBJ message. By default the message is a protocol buffers encoded message. However, if the type of the first byte encountered is an illegal protocol buffers type, then that byte switches PBJ into one of its advanced modes (for Thrift or MXP) [currently unimplemented].
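One possible reading of the mode switch is sketched below in Python. The text does not say which types count as illegal; the sketch assumes it means the wire type held in the low three bits of the first byte, where the protocol buffers encoding defines only the values 0 through 5. The function name uses_advanced_mode is hypothetical, and since the feature is unimplemented this is purely illustrative.

```python
# Wire types 0..5 are defined by the protocol buffers encoding;
# 6 and 7 are unused. Treating those as "illegal" is an assumption.
LEGAL_WIRE_TYPES = frozenset(range(6))


def uses_advanced_mode(user_message: bytes) -> bool:
    """Guess whether the first byte switches PBJ out of protobuf mode."""
    if not user_message:
        return False  # an empty message is a valid empty protobuf message
    return (user_message[0] & 0x07) not in LEGAL_WIRE_TYPES
```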