How we do it
StreamSets Data Collector
It is a lightweight, powerful engine for ingesting data in real time. To define the data flow in Data Collector, you configure a pipeline. A pipeline consists of stages that represent the origin and the destination of the pipeline, plus any additional processing you need to perform.
It is our primary tool for data engineering, and we have written custom code for it.
Here are some examples:
- Visualize Apache Logs in Minecraft using SDC and Kafka
- Dockerizing SDC tutorials
- Custom Origin to pull data from Google Analytics
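To make the stage model above concrete (one origin, optional processing stages, one destination), here is a minimal Python sketch of how records flow through such a pipeline. It is not the Data Collector API: `Pipeline`, `log_origin` and `parse_status` are hypothetical names, and records are assumed to be plain dicts rather than SDC's own record type.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

# Assumption for this sketch: a record is a plain dict of field name -> value.
Record = Dict[str, object]


@dataclass
class Pipeline:
    """Toy model of a pipeline: one origin, zero or more processors, one destination."""

    origin: Callable[[], Iterable[Record]]
    destination: Callable[[Record], None]
    processors: List[Callable[[Record], Record]] = field(default_factory=list)

    def run(self) -> None:
        # Pull each record from the origin, pass it through every processor
        # in order, then hand the result to the destination.
        for record in self.origin():
            for process in self.processors:
                record = process(record)
            self.destination(record)


# Hypothetical stages: an origin emitting raw access-log lines, a processor
# that extracts the trailing HTTP status code, and a list as the destination.
def log_origin() -> Iterable[Record]:
    yield {"line": '127.0.0.1 - - "GET / HTTP/1.1" 200'}
    yield {"line": '127.0.0.1 - - "GET /missing HTTP/1.1" 404'}


def parse_status(record: Record) -> Record:
    status = int(str(record["line"]).rsplit(" ", 1)[-1])
    return {**record, "status": status}


sink: List[Record] = []
Pipeline(origin=log_origin, destination=sink.append, processors=[parse_status]).run()
print([r["status"] for r in sink])  # [200, 404]
```

Keeping each stage a plain function makes it easy to swap origins or destinations without touching the processing logic, which mirrors how stages are composed in a pipeline.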
Elasticsearch
Elasticsearch is a search engine based on Lucene. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. Elasticsearch is developed in Java and is released as open source under the terms of the Apache License.
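Because the interface is plain HTTP carrying JSON, indexing and searching only require building requests with a JSON body. The sketch below uses just the Python standard library and builds the requests without sending them. The node address `http://localhost:9200` and the index name `apache-logs` are assumptions, and the `_doc` path follows recent Elasticsearch versions (older releases also put a document type in the path).

```python
import json
import urllib.request

# Assumed local node and hypothetical index name; adjust both for a real cluster.
ES_URL = "http://localhost:9200"
INDEX = "apache-logs"

# Schema-free document: no mapping has to be declared before indexing it.
doc = {"client": "127.0.0.1", "path": "/missing", "status": 404}

# Full-text "match" query on the "path" field.
query = {"query": {"match": {"path": "missing"}}}


def build_request(method: str, path: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) an HTTP request carrying a JSON body."""
    return urllib.request.Request(
        url=f"{ES_URL}/{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


index_req = build_request("POST", f"{INDEX}/_doc", doc)
search_req = build_request("POST", f"{INDEX}/_search", query)
print(search_req.get_method(), search_req.full_url)

# Against a running node, the search could be sent like this:
# with urllib.request.urlopen(search_req) as resp:
#     print(json.load(resp)["hits"])
```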
Kafka
Kafka is a distributed publish-subscribe messaging system that is designed to be fast, scalable, and durable.
Like many publish-subscribe messaging systems, Kafka maintains feeds of messages in topics. Producers write data to topics and consumers read from topics. Since Kafka is a distributed system, topics are partitioned and replicated across multiple nodes.
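The topic, partition and consumer relationship can be illustrated with a small in-memory model. This is a simplified sketch, not the Kafka client API: `ToyTopic` and `ToyConsumer` are hypothetical, replication and brokers are not modelled, and CRC32 stands in for the murmur2 hash that Kafka's default partitioner applies to message keys. It shows the key property: messages with the same key land in the same partition, so their order is preserved within that partition.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Message = Tuple[str, str]  # (key, value)


class ToyTopic:
    """In-memory stand-in for a partitioned topic (no brokers, no replication)."""

    def __init__(self, name: str, partitions: int) -> None:
        self.name = name
        # Each partition is an append-only log; a message's offset is its index.
        self.logs: List[List[Message]] = [[] for _ in range(partitions)]

    def partition_for(self, key: str) -> int:
        # Same key -> same partition. CRC32 is used only because it is
        # deterministic and in the standard library (Kafka uses murmur2).
        return zlib.crc32(key.encode("utf-8")) % len(self.logs)

    def produce(self, key: str, value: str) -> Tuple[int, int]:
        """Append a message and return its (partition, offset)."""
        p = self.partition_for(key)
        self.logs[p].append((key, value))
        return p, len(self.logs[p]) - 1


class ToyConsumer:
    """Reads all partitions, tracking its own position (offset) in each one."""

    def __init__(self, topic: ToyTopic) -> None:
        self.topic = topic
        self.offsets: Dict[int, int] = defaultdict(int)

    def poll(self) -> List[Message]:
        batch: List[Message] = []
        for p, log in enumerate(self.topic.logs):
            batch.extend(log[self.offsets[p]:])  # only messages not yet read
            self.offsets[p] = len(log)
        return batch


topic = ToyTopic("apache-logs", partitions=3)
for i in range(3):
    topic.produce("host-a", f"GET /page/{i}")
topic.produce("host-b", "GET /")

consumer = ToyConsumer(topic)
first = consumer.poll()
second = consumer.poll()
print(len(first), second)  # 4 []
```

Ordering is only guaranteed per partition, which is why related events are usually given the same key.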
Kibana
Kibana is a tool for visually exploring information stored in Elasticsearch. The information is arranged in dashboards and/or individual visualizations, and each visualization is built on Elasticsearch queries.
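As an illustration of a visualization being "built on a query", the sketch below constructs the kind of terms aggregation a bar chart of top HTTP status codes could rely on, and turns the response buckets into chart points. It is an approximation, not Kibana's internal request: the field name `status` and the aggregation name `by_status` are assumptions, and `sample` is a hand-written response shaped like Elasticsearch's terms-aggregation output, not real data.

```python
import json


def status_breakdown_query(field: str = "status", top_n: int = 5) -> dict:
    """Terms aggregation body: count documents per distinct value of `field`."""
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "aggs": {
            "by_status": {"terms": {"field": field, "size": top_n}},
        },
    }


def buckets_to_series(response: dict) -> list:
    """Turn a terms-aggregation response into (label, count) pairs for a chart."""
    return [
        (b["key"], b["doc_count"])
        for b in response["aggregations"]["by_status"]["buckets"]
    ]


print(json.dumps(status_breakdown_query(), indent=2))

# Hand-written sample in the shape of an aggregation response (not real data).
sample = {"aggregations": {"by_status": {"buckets": [
    {"key": 200, "doc_count": 120},
    {"key": 404, "doc_count": 7},
]}}}
print(buckets_to_series(sample))  # [(200, 120), (404, 7)]
```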
We believe that using Elasticsearch and Kibana to present data is fast and reliable.
Feel free to take a look at an integration example of Elasticsearch, StreamSets Data Collector and Kibana.
Superset
Superset is a data exploration platform designed to be visual, intuitive and interactive. Its main goal is to make it easy to slice, dice and visualize data. It empowers users to perform .
- A quick way to intuitively visualize datasets by allowing users to create and share interactive dashboards
- A rich set of visualizations to analyze your data, as well as a flexible way to extend the capabilities
- An extensible, high-granularity security model allowing intricate rules on who can access which features, and integration with major authentication providers (database, OpenID, LDAP, OAuth & REMOTE_USER through Flask AppBuilder)
- A simple semantic layer that lets you control how data sources are displayed in the UI, by defining which fields show up in which dropdowns and which aggregations and functions (metrics) are made available to users
- Deep integration with Druid, which allows Superset to stay blazing fast while slicing and dicing large, real-time datasets
- Fast loading dashboards with configurable caching
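Caching is configured in Python: Superset reads overrides from a `superset_config.py` module placed on the `PYTHONPATH`. The fragment below is a sketch assuming a local Redis instance at `redis://localhost:6379/0`. The keys are passed to Flask-Caching, and accepted `CACHE_TYPE` values depend on the installed Flask-Caching version (older releases use `"redis"` instead of `"RedisCache"`), so check them against your Superset release.

```python
# superset_config.py
# Cache settings handed to Flask-Caching; the timeout controls how long
# cached query results are reused before dashboards hit the database again.
CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_DEFAULT_TIMEOUT": 60 * 60,  # seconds (one hour)
    "CACHE_KEY_PREFIX": "superset_",
    "CACHE_REDIS_URL": "redis://localhost:6379/0",  # assumed local Redis
}
```

A longer timeout makes dashboards load faster at the cost of showing slightly older data.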
PHP
PHP is a very popular open-source language that is especially well suited to web development.
As of 2017, well-known companies such as Facebook (which runs a modified version) and Slack use it.
We have extensive experience with PHP, including CMSs and frameworks such as Drupal and Symfony. Check out some of our public contributions: