Building a Low Latency Highly Scalable Data Serialization Protocol

Barak Bar-Orion June 15, 2016
9 minute read

At the heart of every big data solution, such as XAP, is data synchronization. While there are a number of solutions and approaches available today, they are either overly complex or incredibly inefficient. Fortunately, we have been able to combine a number of these protocols to achieve what is a simple, yet incredibly efficient data synchronization framework.
The Popular Data Sync Technologies of Today
Programmers have a handful of data synchronization technologies available. Below we will explore the many pros and cons of these different solutions, starting with what is arguably the most popular: Oracle RMI.
Oracle RMI
Java RMI (Remote Method Invocation) allows programmers to create distributed, Java-technology-based applications, in which the methods of remote Java objects can be invoked from other Java virtual machines (and often from different hosts).
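To make that invocation model concrete, here is a minimal sketch of an RMI remote interface and a blocking client call; the Greeter interface, the "greeter" registry name, and the host name are hypothetical placeholders, not part of any real service.

// Greeter.java - a hypothetical remote interface; every remote method must declare RemoteException.
import java.rmi.Remote;
import java.rmi.RemoteException;

public interface Greeter extends Remote {
    String greet(String name) throws RemoteException;
}

// GreeterClient.java - look up the exported object in the RMI registry and invoke it like a local method.
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;

public class GreeterClient {
    public static void main(String[] args) throws Exception {
        Registry registry = LocateRegistry.getRegistry("server-host", 1099);
        Greeter greeter = (Greeter) registry.lookup("greeter");
        System.out.println(greeter.greet("world")); // blocking remote call
    }
}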
Why It Works

Java RMI offers a number of benefits:

  • There is no pre-compilation stage, unlike CORBA or Google Protocol Buffers
  • It integrates seamlessly into Java (users can send a combination of serializable and RMI-exported objects)
  • It supports dynamic code loading; class definitions are transferred between the client and the server as needed
  • The framework is well known and easy for Java programmers to use
  • The framework performs distributed garbage collection (if a user allocates an object on the server and passes a reference to it to the client, the JVM keeps the object alive on the server until no client references it; the object is released once the last client reference disappears)

 
Why It’s Not Ideal
Java RMI does come with a few notable drawbacks:

  • It is not built on NIO, and thus supports only a thread-per-request model, which does not scale; furthermore, every exported object opens its own server socket, which wastes machine resources
  • It is tied to Java serialization, meaning it is not payload agnostic
  • The API is blocking (i.e. if you do a remote call, the calling client thread will be blocked until it gets a result)
  • Java serialization does not support data evolution or extension. This differs from XML or JSON, which simply ignore unknown fields
  • It is difficult to configure if a firewall is in place
  • It uses dynamic port binding on the server side, which makes it non-ideal to deploy on the cloud (where explicit security group rules are needed) or put into a Docker container.

 
REST
Unlike RPC, which is more action focused, REST is about resources. With REST, the user references a resource in the URL and defines what to do with that resource through the HTTP verb and the body of the request. It is a great tool for public-facing APIs, as it does not require clients to have much pre-existing knowledge about the service to use it effectively.
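For example, a resource-oriented call from Java might look like the following sketch; the orders URL is a hypothetical placeholder, and the point is simply that the URL names the resource while the HTTP verb (GET, DELETE, etc.) names the action.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.stream.Collectors;

public class RestClientSketch {
    public static void main(String[] args) throws Exception {
        // The URL identifies the resource; the verb says what to do with it.
        URL url = new URL("https://api.example.com/orders/42");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET"); // "DELETE" would remove the same resource
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String body = reader.lines().collect(Collectors.joining("\n"));
            System.out.println(conn.getResponseCode() + ": " + body);
        }
    }
}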
Why It Works
Programmers use REST for a variety of reasons:

  • REST is built on top of HTTP (i.e. users can use all of the HTTP headers for caching, redirects, and standard error codes)
  • It is firewall friendly
  • There are numerous tools available for viewing and debugging REST communication
  • It is very easy to debug
  • It can be triggered easily from any programming language and environment (even directly from the browser)
  • It is payload agnostic (it is not opinionated about how to encode the payload other than HTTP limitations)

 
Why It’s Not Ideal
There are a few reasons why REST may not work for a user:

  • It is built on top of HTTP (which is relatively slow and verbose) and does not support pipelining
  • The non-blocking API is usually only a thin wrapper on top of a blocking API; this is an inherent issue rather than a limitation of specific frameworks, stemming from the fact that REST is based on HTTP, and HTTP does not natively support non-blocking, multiplexed request handling

 
Apache Thrift
Apache Thrift is another protocol and framework used as a foundation for RPC. It was originally developed at Facebook for “scalable cross-language services development.” It is now developed under the Apache Software Foundation and is used by Cloudera and last.fm, among others.
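To show what the generated client side looks like from Java, here is a hedged sketch that assumes a hypothetical Calculator service compiled by the Thrift IDL compiler; Calculator.Client stands in for the stub class Thrift would generate.

import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class ThriftClientSketch {
    public static void main(String[] args) throws Exception {
        // Open a plain socket transport and layer the binary protocol on top of it.
        TTransport transport = new TSocket("localhost", 9090);
        transport.open();
        TProtocol protocol = new TBinaryProtocol(transport);

        // Calculator.Client is the stub the Thrift compiler generates from the IDL.
        Calculator.Client client = new Calculator.Client(protocol);
        System.out.println(client.add(2, 3)); // typed RPC call over the binary protocol

        transport.close();
    }
}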
Why It Works
Apache Thrift delivers a handful of benefits, including:

  • It consistently generates both the client interfaces and the server for a given service (this means that client calls will generally be less prone to error)
  • The framework supports a variety of protocols and not just HTTP
  • It is a well-tested and widely used piece of software
  • It is portable across programming languages and has bindings to all major languages
  • It’s binary and therefore more efficient than text-based protocols

 
Why It’s Not Ideal
For all of its benefits, there are some drawbacks:

  • The framework is poorly documented
  • It takes more work to get it running on the client side (though less work for the service owner if they are building libraries for clients)
  • It requires generating client-side proxies
  • It’s binary and therefore often hard to debug

Google Protocol Buffers
This serialization format was developed by Google for internal use, and protocol compilers for Java, C++, and Python have been made available to the public under an open source license. It is the binary serialization format used by gRPC, although it does require an extra step to generate programming language bindings. It is similar to Apache Thrift, except that it does not include a concrete RPC protocol stack for defined services.
Otherwise, the pros and cons are extremely similar, with Protobuf and Thrift operating at almost the same level of performance.
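As a sketch of what using Protocol Buffers from Java looks like, assume a hypothetical Person message compiled by protoc; newBuilder, toByteArray, and parseFrom are the standard accessors the compiler generates.

// Assumes a .proto definition such as:
//   message Person { required string name = 1; required int32 id = 2; }
// compiled with protoc into a Person Java class.
public class ProtobufSketch {
    public static void main(String[] args) throws Exception {
        Person person = Person.newBuilder()
                              .setName("Ada")
                              .setId(1)
                              .build();

        byte[] bytes = person.toByteArray();      // compact binary encoding
        Person parsed = Person.parseFrom(bytes);  // decode on the receiving side
        System.out.println(parsed.getName());
    }
}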
Exploring Your Other Options
There are other data serialization options apart from the above, such as BSON (not nearly as compact as other binary formats) and XML message-based RPC (which is not very readable). These four options also require users to add a transport.
The Challenges of Serialization in a Large-Scale Data Cluster
Numerous challenges arise when attempting serialization and synchronization in a large scale data cluster environment.
The first challenge is achieving low latency. There will be delays when data is being recalled and synchronized, but the delays should not be noticeable to users.
Another problem is scalability. The server needs to be able to handle a large number of concurrent requests and connections, but it has to do so without adding too much overhead. But high level abstraction adds overhead.
This means a very fine balancing act must take place in order to keep systems running smoothly and efficiently.
There are also a number of properties required from the protocol (ten, to be exact):

  • Authentication
  • Encryption
  • Backward Compatibility (i.e. add new values to objects)
  • Schema
  • Easy Language Interoperability
  • Simplicity
  • Payload Agnostic (transfer the data in the format you choose. In RMI for example, you’re bound to Java)
  • Blocking and Non-Blocking I/O support
  • Cancellation and Timeout
  • Standardized Status Codes

 
Unfortunately, there is no generic solution that solves all of the above challenges, which is why most developers eventually end up building their own proprietary RPC protocols (as MongoDB and Cassandra have done).
Introducing AsyncRMI
In light of the above, we created our own open source framework to address this all-too-common problem. We call it AsyncRMI.
What Is AsyncRMI and How It Addresses the Above Challenges
AsyncRMI uses Java RMI interfaces and dynamic code loading, and is a drop-in replacement for Java RMI, making it incredibly user-friendly. It publishes all objects through a single server socket, which keeps resource usage low and is firewall friendly, and it uses NIO to serve requests with a fixed number of threads. Users can limit both the number of open connections and the round-trip latency of calls by using pipelining, allowing for high throughput, low latency, and reduced resource usage. It also supports asynchronous invocation by returning a standard Java CompletableFuture from async calls.
This framework delivers other important benefits, such as:

  • Cancellation and Timeout
  • Blocking and Non-Blocking
  • Authentication
  • Encryption
  • Easy Language Interoperability
  • Standardized Status Codes (using Java exceptions)

Most importantly, with AsyncRMI the server can send many notifications regardless of client delays (i.e. even if some of the clients receiving notifications are very slow, the server can still send out multiple notifications without any issue).
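The snippet below assumes a remote listener interface along the following lines; the exact Listener and Event types are hypothetical, but the key point is that a method declared to return CompletableFuture completes asynchronously instead of blocking the calling thread.

import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.concurrent.CompletableFuture;

// Hypothetical remote listener: the server invokes notify() on many clients
// and gets back a future per call instead of blocking on each slow client.
public interface Listener extends Remote {
    CompletableFuture<Void> notify(Event event) throws RemoteException;
}

// Hypothetical event payload sent to listeners.
class Event implements Serializable {
    final String message;
    Event(String message) { this.message = message; }
}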

List<CompletableFuture<Void>> pendings = new ArrayList<>(listeners.size());
for (Listener listener : listeners) {
    // Notify asynchronously; the call returns immediately with a future.
    CompletableFuture<Void> pendingResult = listener.notify(event);

    // Register a callback to cancel the client listener
    // in case of notification failure.
    pendingResult.exceptionally(throwable -> {
        cancelListener(listener);
        return null;
    });

    // Store the future so a timer thread can process it
    // after the notify timeout expires.
    pendings.add(pendingResult);
}

Here is a complete walkthrough of this example.
Later, from a timer thread, once enough time has passed for the notify call to reach the client and return to the server, call cancelPending(pendings):
private void cancelPending(List<CompletableFuture<Void>> pendings) {
    for (CompletableFuture<Void> pending : pendings) {
        if (!pending.isDone()) {
            // Cancel futures that have not completed within the timeout.
            pending.cancel(true);
        }
    }
}

Because we apply the principles of RMI in a performance-optimized manner, we keep the process simple without paying a performance cost.
In short, AsyncRMI delivers the best of both worlds: the simplicity of Java RMI and the performance of a Java NIO-based framework. It looks just like Java RMI (unlike other proprietary protocols), so it can easily replace an existing RMI-based solution. It doesn’t introduce lock-in, and it provides a variety of other features needed in production-grade software.
