How we do it

We rely on open source software for data wrangling, data analysis and visualization.
  • StreamSets Data Collector

    It is a lightweight, powerful engine for ingesting data in real time. To define the data flow for Data Collector, you configure a pipeline. A pipeline consists of stages that represent the origin and the destination of the pipeline, plus any additional processing that needs to be performed.
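
    To make the idea of origin, processing and destination stages concrete, here is a minimal sketch in plain Python. It does not use the Data Collector API; the stage functions (`origin`, `drop_incomplete`, `add_fahrenheit`, `run_pipeline`) and the sample sensor records are hypothetical names chosen for illustration only.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]
Stage = Callable[[Iterable[Record]], Iterator[Record]]


def origin(rows: Iterable[Record]) -> Iterator[Record]:
    """Origin stage: emit records one at a time, as a streaming source would."""
    for row in rows:
        yield row


def drop_incomplete(records: Iterable[Record]) -> Iterator[Record]:
    """Processor stage: discard records that have no temperature reading."""
    for record in records:
        if record.get("temp_c") is not None:
            yield record


def add_fahrenheit(records: Iterable[Record]) -> Iterator[Record]:
    """Processor stage: enrich each record with a derived field."""
    for record in records:
        yield {**record, "temp_f": record["temp_c"] * 9 / 5 + 32}


def run_pipeline(source: Iterable[Record],
                 processors: List[Stage],
                 destination: List[Record]) -> None:
    """Chain the stages lazily, then write every surviving record to the destination."""
    stream: Iterable[Record] = source
    for processor in processors:
        stream = processor(stream)
    for record in stream:
        destination.append(record)


# Hypothetical input: three sensor readings, one of them incomplete.
readings = [
    {"sensor": "a", "temp_c": 20.0},
    {"sensor": "b", "temp_c": None},
    {"sensor": "c", "temp_c": 100.0},
]
sink: List[Record] = []  # stands in for a real destination such as Elasticsearch
run_pipeline(origin(readings), [drop_incomplete, add_fahrenheit], sink)
```

    After the run, `sink` holds the two complete readings, each with an added `temp_f` field. In Data Collector itself the stages are configured in the pipeline rather than written as functions; the sketch only shows how records flow from one stage to the next.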

    This is our primary tool for data engineering, and we have written custom code for it.

    Here are some examples:

     

  • Elasticsearch

    Elasticsearch is a search engine based on Lucene. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. Elasticsearch is developed in Java and is released as open source under the terms of the Apache License.
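
    As an illustration of the HTTP interface and schema-free JSON documents, the sketch below builds (but does not send) the two requests you would typically issue: one to store a document and one to run a full-text `match` query. The index name `articles`, the field names and the helper functions are assumptions made for this example; the `_doc` and `_search` paths follow the current Elasticsearch REST API and may differ in older releases.

```python
import json
from typing import Dict, Tuple


def index_request(index: str, doc_id: str, document: Dict) -> Tuple[str, str, str]:
    """Return (method, path, body) for storing one JSON document.

    No schema is declared up front: the document is just a JSON object.
    """
    return "PUT", f"/{index}/_doc/{doc_id}", json.dumps(document)


def search_request(index: str, field: str, text: str) -> Tuple[str, str, str]:
    """Return (method, path, body) for a full-text match query on one field."""
    body = {"query": {"match": {field: text}}}
    return "POST", f"/{index}/_search", json.dumps(body)


# Hypothetical document and query, for illustration only.
method, path, body = index_request(
    "articles", "1", {"title": "Real-time ingestion", "tags": ["kafka", "etl"]}
)
s_method, s_path, s_body = search_request("articles", "title", "ingestion")

print(method, path, body)
print(s_method, s_path, s_body)
```

    Sending these requests to a running cluster needs only an HTTP client; nothing beyond the JSON bodies is specific to Python.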

  • Apache Kafka

    Kafka is a distributed publish-subscribe messaging system that is designed to be fast, scalable, and durable.

    Like many publish-subscribe messaging systems, Kafka maintains feeds of messages in topics. Producers write data to topics and consumers read from topics. Since Kafka is a distributed system, topics are partitioned and replicated across multiple nodes.
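
    The topic and partition model can be sketched with a small in-memory stand-in. This is not the Kafka client API: the `Topic` class is hypothetical, replication across nodes is not modeled, and CRC32 is used as a simple, deterministic substitute for the hash that a real Kafka partitioner applies to the message key.

```python
import zlib
from typing import List, Tuple

Message = Tuple[str, str]  # (key, value)


class Topic:
    """Toy model of a topic: a fixed number of append-only partitions."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: List[List[Message]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        """Map a key to a partition; the same key always maps to the same one."""
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def produce(self, key: str, value: str) -> Tuple[int, int]:
        """Append a message and return its (partition, offset)."""
        partition = self.partition_for(key)
        self.partitions[partition].append((key, value))
        return partition, len(self.partitions[partition]) - 1

    def consume(self, partition: int, offset: int = 0) -> List[Message]:
        """Read every message in a partition starting at the given offset."""
        return self.partitions[partition][offset:]


# Hypothetical usage: two events for the same user land in the same partition,
# so a consumer of that partition sees them in the order they were produced.
clicks = Topic("clicks", num_partitions=3)
first_partition, first_offset = clicks.produce("user-42", "open page")
second_partition, second_offset = clicks.produce("user-42", "click button")
events = clicks.consume(first_partition)
```

    Keying messages this way is what gives per-key ordering: order is preserved within a partition, not across the whole topic.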

  • Kibana

    It is a tool for the visual exploration of information stored in Elasticsearch. The information is arranged in dashboards and/or individual visualizations. Visualizations are essentially built from Elasticsearch queries.

    We believe that using Elasticsearch and Kibana to present data is fast and reliable.

    Feel free to take a look at an integration example of Elasticsearch, StreamSets Data Collector and Kibana

  • Superset

    Superset is a data exploration platform designed to be visual, intuitive and interactive. Superset's main goal is to make it easy to slice, dice and visualize data. It empowers users to perform analytics at the speed of thought.

    Superset provides:

    • A quick way to intuitively visualize datasets by allowing users to create and share interactive dashboards
    • A rich set of visualizations to analyze your data, as well as a flexible way to extend the capabilities
    • An extensible, high-granularity security model allowing intricate rules on who can access which features, and integration with major authentication providers (database, OpenID, LDAP, OAuth & REMOTE_USER through Flask AppBuilder)
    • A simple semantic layer that lets users control how data sources are displayed in the UI, by defining which fields should show up in which dropdown and which aggregations and functions (metrics) are made available to the user
    • Deep integration with Druid allows for Superset to stay blazing fast while slicing and dicing large, realtime datasets
    • Fast loading dashboards with configurable caching
  • PHP

    PHP is a very popular open source language especially suitable for web development.

    As of 2017, well-known companies such as Facebook (which runs a modified version) and Slack use it.

    We have extensive experience with PHP, using CMSs and frameworks such as Drupal and Symfony. Check out some of our public contributions: