Sunday, November 26, 2017

Spark for Azure HDInsight

As indicated in early July 2017, Microsoft has now officially adopted Apache Spark in Azure HDInsight.  Early Ref:

Last week, Microsoft announced the Azure Databricks service, new Cosmos DB features, enterprise AI capabilities and more at its annual Connect(); event in New York.

Microsoft is getting the Apache Spark religion, introducing a new cloud service in preview, called Azure Databricks. This is noteworthy for a number of reasons.

First, the service was developed jointly by Microsoft and Databricks (the company whose founders are Spark's very creators), to deliver this Spark-based Big Data analytics service as a first-party Azure offering, and not a mere partner service on the Azure Marketplace.

Second, the service works independently of Databricks' own cloud service for Spark and of Azure HDInsight, Microsoft's own Big Data as a Service platform, on which Spark also runs.

Azure Databricks has nonetheless been designed from the ground up to take advantage of, and be fully optimized for, various Azure services, including blob storage, Data Lake Store, virtual networking, Azure Active Directory and Azure Container Service.

While Azure Databricks, like HDInsight, is still based on the creation of a dedicated cluster, with the number and type of nodes (servers) being determined by the customer, it nonetheless has built-in auto-scaling and auto-termination, to grow the cluster as necessary and shut it down once it's no longer needed.
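
As a rough illustration of the auto-scaling and auto-termination settings, here is a minimal Python sketch that builds a cluster definition. The field names (`autoscale`, `min_workers`, `max_workers`, `autotermination_minutes`) follow the Databricks Clusters REST API as I understand it; the helper `build_cluster_spec`, the cluster name and the two placeholder values are my own assumptions, so check the official documentation before relying on any of it.

```python
import json


def build_cluster_spec(name, min_workers, max_workers, idle_minutes):
    """Build a cluster definition with auto-scaling and auto-termination.

    Hypothetical helper: only the dictionary keys are meant to mirror the
    Databricks Clusters API; everything else is illustrative.
    """
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")
    return {
        "cluster_name": name,
        "spark_version": "<runtime-version>",  # placeholder, pick a real runtime
        "node_type_id": "<node-type>",         # placeholder, pick a real VM size
        # The cluster grows and shrinks between these two bounds.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        # The cluster shuts down after this many idle minutes.
        "autotermination_minutes": idle_minutes,
    }


spec = build_cluster_spec("nightly-etl", 2, 8, 30)
print(json.dumps(spec, indent=2))
```

The customer still chooses the node type and the worker bounds; the service decides where within those bounds the cluster sits at any moment.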

Sunday, November 19, 2017

ElasticSearch 6

In the middle of last week, ElasticSearch 6 reached GA (General Availability), with upgrades in areas like
  • migration assistant
  • resiliency
  • efficiency
  • scalability
  • security
  • index sorting.


Saturday, November 11, 2017

Kinesis Analytics

Kinesis Analytics now gives you the option to preprocess your data with AWS Lambda.  This gives you a great deal of flexibility in defining what data gets analyzed by your Kinesis Analytics application. You can also define how that data is structured before it is queried by your SQL.

Kinesis Analytics continuously reads data from your Kinesis stream or Kinesis Firehose delivery stream.  For each batch of records that it retrieves, the Lambda processor subsystem manages how the batch gets passed to your Lambda function.  Your function receives a list of records as input.  Within your function, you iterate through the list and apply your business logic to accomplish your preprocessing requirements (such as data transformation).

The input model to your preprocessing function varies slightly, depending on whether the data was received from a Kinesis stream or a Kinesis Firehose delivery stream.
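
The preprocessing flow described above can be sketched as a small Python Lambda handler. This is a minimal sketch, assuming each input record carries a `recordId` and base64-encoded `data`, and that the function answers with one entry per record whose `result` is `Ok`, `Dropped` or `ProcessingFailed`. The business logic (upper-casing a `ticker` field and dropping records without one) is hypothetical and only there to show where your own transformation would go.

```python
import base64
import json


def lambda_handler(event, context):
    """Preprocess a batch of records before the SQL application sees them."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
        except ValueError:
            # Bad base64 or bad JSON: report the record as failed.
            output.append({"recordId": record["recordId"],
                           "result": "ProcessingFailed",
                           "data": record["data"]})
            continue

        # Hypothetical rule: records without a ticker are not analyzed.
        if not isinstance(payload, dict) or "ticker" not in payload:
            output.append({"recordId": record["recordId"],
                           "result": "Dropped",
                           "data": record["data"]})
            continue

        # Hypothetical transformation: normalize the ticker symbol.
        payload["ticker"] = str(payload["ticker"]).upper()
        encoded = base64.b64encode(json.dumps(payload).encode("utf-8"))
        output.append({"recordId": record["recordId"],
                       "result": "Ok",
                       "data": encoded.decode("utf-8")})
    return {"records": output}
```

Every record that comes in gets exactly one entry in the response, matched by `recordId`, so the service can tell which records were transformed, dropped or failed.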

Saturday, October 21, 2017

Artificial Intelligence Gluon

Last week brought an interesting piece of industry news: Microsoft and Amazon announced a surprise partnership to build an AI platform (named Gluon) for the enterprise. Gluon makes it easier for developers to build AI/machine learning systems and related apps, on an open source basis.

I see an interesting dimension to this technology partnership: it challenges Google's dominance in AI, which is built on TensorFlow.

Google TensorFlow
Google already has a head start with a tool called TensorFlow, which is free and open source and aimed at helping developers build machine learning apps. TensorFlow is immensely popular with developers.

In fact, it's the fifth most popular project (by stars) on GitHub, out of the more than 2 million open source projects hosted on that site. A quick introduction video is shown at

Amazon MXNet
Naturally, Amazon has a competitor to TensorFlow called MXNet.  Deep learning on AWS with MXNet is shown at

Microsoft CNTK
Microsoft has a competitor tool to TensorFlow, called CNTK (Cognitive Toolkit). Microsoft's open source deep-learning toolkit is shown at

Strategic Partnership
Machine learning and AI are the next big things in cloud computing, with the potential to cause significant changes to the cloud business that Amazon and Microsoft have long dominated.
Microsoft and Amazon have been known to cuddle up on other types of AI tech.

In August, the two announced they were partnering to make their two voice assistants, Amazon Alexa and Microsoft Cortana, work better together.

Joint Venture Gluon
Microsoft and Amazon have joined forces to help spread artificial intelligence across apps. They released a new tool for developers called Gluon as a free and open source project, meaning anyone can use it or work on it and contribute to it for free.

Gluon's role is to add a layer that makes MXNet and CNTK easier to use, work with and program. Only the MXNet version has been released so far, but the CNTK version of Gluon is promised to come soon. A short introduction is shown at

Ease for AI Development
In any case, the competition is on to create more AI tools for developers and make them easier to use.  Demand for Artificial Intelligence in various industries is reflected in a 3-year scorecard

Saturday, September 30, 2017

Anywhere in an Hour

During yesterday's presentation at the International Astronautical Congress in Adelaide, Australia, Elon Musk gave an inspirational speech on "Anywhere on Earth in an hour".

Details are available at

Personally, I admire Elon's vision, his commitment, and his drive to build disruptive technology and put it to use for society.

Sunday, September 24, 2017


Market share in graph databases is led by Neo4j and Titan, the latter recently acquired by DataStax (DataStax Enterprise Graph).

Social graphs are a prime example of the graph model in use. Dr. Xu (PhD from UCSD) worked at Twitter until 2011, and the graph databases that were around at the time could not cope.

He has 26 patents in distributed systems & databases, led Teradata's big data initiatives, and worked on Twitter's distributed data infrastructure. So when faced with that problem, Xu saw an opportunity and went off to create a solution.

Xu founded GraphSQL in 2012 and has been working with a team of 30 engineers since.  Today GraphSQL is officially entering a new stage in its development, including a new name: TigerGraph.

The product is now generally available, a series A funding round of US$33 million has been announced, and a hosted version of TigerGraph based on Amazon EC2 is launching.

TigerGraph also supports different graph partitioning algorithms enabling it to split very large graphs over a distributed architecture. This can be done either automatically, or as specified by users using application-specific partitioning strategies.
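
To make the two partitioning modes concrete, here is a minimal Python sketch. It is not TigerGraph's algorithm or API: the automatic mode is stood in for by a plain hash of the vertex id, and the application-specific mode by a hypothetical rule that places vertices by a region prefix in their id. All function names and the `region:name` id format are assumptions made for the example.

```python
import zlib
from collections import defaultdict


def hash_partitioner(num_parts):
    """Automatic strategy: spread vertices evenly by hashing their id."""
    def assign(vertex):
        return zlib.crc32(str(vertex).encode("utf-8")) % num_parts
    return assign


def region_partitioner(region_to_part):
    """Application-specific strategy: keep each region on one partition."""
    def assign(vertex):
        region = vertex.split(":", 1)[0]
        return region_to_part[region]
    return assign


def partition_graph(edges, assign):
    """Store each edge with its source vertex and count cross-partition edges."""
    parts = defaultdict(list)
    cut_edges = 0
    for src, dst in edges:
        parts[assign(src)].append((src, dst))
        if assign(src) != assign(dst):
            cut_edges += 1  # traversing this edge means a network hop
    return dict(parts), cut_edges


edges = [("us:alice", "us:bob"), ("us:bob", "eu:carol"), ("eu:carol", "eu:dave")]

by_region, region_cut = partition_graph(edges, region_partitioner({"us": 0, "eu": 1}))
by_hash, hash_cut = partition_graph(edges, hash_partitioner(2))
print("region strategy cut edges:", region_cut)
print("hash strategy cut edges:", hash_cut)
```

The point of a user-specified strategy is visible in the cut-edge count: knowledge of the data (here, that most edges stay within a region) can keep related vertices together, which a generic hash cannot know.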

There probably is a hefty price tag that goes with TigerGraph, but for the ones that can afford it, it looks like it can deliver some substantial benefits.

Thursday, August 24, 2017

C# 8.0

C# 8.0 has been previewed on Channel 9 by Mads Torgersen.  Ref:

Top 5 tech highlights are:

1. Nullable Reference Types
Consider a scenario where you know that the nullable variable x isn’t actually null, but you can’t prove that to the compiler. In this case you can use x!.Method() to suppress the compiler warning about potential null reference exceptions.

2. Extension Everything
As with interfaces, you cannot define instance fields in extensions but you can simulate them using ConditionalWeakTable. You can also define static fields.

3. Default Interface Implementations
The primary benefit of default interface implementations is that you may be able to add new methods to an existing interface without breaking backwards compatibility.

4. Async Streams (a.k.a. foreach async)
Async streams are referred to as a “pull model”: the consumer asks for each item when it is ready for it. By contrast, IObservable is a “push model”, which means the producer can flood the consumer with a higher flow rate than it can handle.

5. Extension Interfaces
Extension interfaces, the ability to add new interfaces to existing classes, are also being considered.
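
The pull model mentioned under Async Streams can be sketched with Python's async generators, which behave in a similar way. This is an analogy rather than C# 8.0 syntax, and the names `produce` and `consume` are hypothetical. The shared log shows that the producer only advances when the consumer asks for the next item, so it cannot run ahead and flood the consumer.

```python
import asyncio


async def produce(count, log):
    """Async generator: runs only until the next item is requested."""
    for i in range(count):
        log.append(f"produced {i}")
        yield i


async def consume(count):
    """Pull items one at a time; each pull resumes the producer once."""
    log = []
    async for item in produce(count, log):
        await asyncio.sleep(0)  # stand-in for slow consumer work
        log.append(f"consumed {item}")
    return log


log = asyncio.run(consume(3))
for line in log:
    print(line)
```

The produced and consumed entries strictly alternate, which is the back-pressure a push-based observable does not give you by default.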

My closing note: C# is ahead of Java in both technical capabilities and roadmap.