deck.gl – Data Visualization


Hi everyone. Today is my last day at work, so I decided to spend the whole weekend on an experiment I haven’t tried before.

I remembered deck.gl, a popular open-source library published by Uber. Why not give it a try? 😉

I installed deck.gl and Mapbox, wrote some code, and dug into GLSL. I tried visualizing H-1B visa statistics, flight records, and taxi routes. All the datasets are available on Kaggle under a CC license.

And now it’s time to write a blog post to share my experiment.

Hope you enjoy it :]

1. Data Visualization

Data visualization is quite fun. When you see the term “data visualization”, you might think of ugly Microsoft Excel spreadsheet charts. In the real world, there are many situations where we need to visualize data.

  • If you’re a student about to give a big end-of-semester presentation, you want to stand out. What would you do?
  • If you’re a founder, I believe you have at some point shown your worldwide user growth to potential investors, and you wanted to make an impression at first sight. What would you do?

I believe most of us would use the same kind of chart as the one below. It’s common and trivial.

Millions of people have done that. If you would like to stand out, please don’t do it the same way hundreds of people did before. 😎

Brave world

As you can see, data visualization is really eye-catching.


Pros:

  • It can be accessed quickly by a wider audience.
  • It conveys a lot of information in a small space.
  • It makes your report more visually appealing.

Cons:

  • You must put in more effort.

It’s boring to keep doing the same thing every time. It would be better to step out of our comfort zone, try something new, and push the boundaries.

2. Roadmap

This experiment will show you how to visualize hundreds of thousands of records on a world map.

You will learn:

  • How to use the vector-based Mapbox SDK and customize the map with your own style.
  • How to use deck.gl, a WebGL-powered framework for visual exploratory data analysis of large datasets.
  • How to visualize the locations of about 700,000 trees in New York City with ScreenGridLayer.
  • How to experiment with a heatmap.
  • How to visualize flight record data with a custom FlightLayer.
  • How to combine GLSL shaders and deck.gl to achieve a dynamic map.

Here are screenshots.

683,788 trees in NY with ScreenGridLayer, running smoothly at 60 FPS

Squares with brighter green colors mean more trees are clustered there.

Heat-map for a better look

Flight record data with FlightLayer (static)

Dynamic visualization with GLSL shader

3. Technologies

3.1 React-redux

Before writing actual code, it’s better if I go through all the technologies we’re using.

[IMPORTANT] I assume you already have experience with React and Redux. If not, don’t be shy about following some tutorials on Google first.

3.2 MapBox

Mapbox is a large provider of custom online maps for websites such as Foursquare, Pinterest, Evernote, the Financial Times, The Weather Channel and Uber Technologies.

Mapbox offers a million ways to customize your map. It fits many contexts: Drones, Finance, Government, Logistics, Media, Natural resources, Outdoors, Real estate, Security, Transportation, Travel, … Each kind of map has a different look.

Mapbox also provides an amazing studio where you can create your own map as well.

Instead of using the official mapbox.js, I use react-map-gl, a React-friendly API wrapper around Mapbox GL JS. It’s a popular library recently published by Uber, with 2.2k ⭐️.


3.3 deck.gl

Besides react-map-gl, Uber also publishes deck.gl.

There are 3 highlights you might consider.

  1. Layered approach: deck.gl organizes complex data into layers, which makes it easy to package and share new visualizations as reusable layers.
  2. High-precision computations in the GPU: by emulating 64-bit floating point computations in the GPU, deck.gl renders datasets with unparalleled accuracy and performance.
  3. React and Mapbox GL integrations: deck.gl is a great match for React, supporting efficient WebGL rendering under the reactive programming paradigm. deck.gl deserves its more than 1.8k ⭐️ on GitHub 💯

3.4 Kaggle

A visualization isn’t complete without plenty of data. Kaggle is the best place to get huge datasets. They offer open datasets on everything from government, health, and science to popular games and dating trends. It’s a really valuable treasure for data mining or for training deep learning models 🤘

For this blog post, I used the 2015 Tree Census in New York City and the 2015 Flight Delay and Cancellation datasets. Both are released under the CC0: Public Domain License, so you can read, modify, and distribute the data however you like. It’s cool 👍

3.5 GLSL Shader

I also cover a bit of GLSL shaders. If you’ve learned OpenGL, you must be familiar with this term. I will keep it as simple as possible because I don’t have time to go deeper.

Basically, GLSL has a syntax similar to C and is executed directly by the graphics pipeline. There are two types of shaders: vertex shaders transform shape positions into real 3D drawing coordinates, and fragment shaders compute colors and other attributes.

By using GLSL, we can achieve high performance with millions of data points on the map 😎

4. Time for coding

4.1 Starter kit

To save time, I won’t show you how to install Node, deck.gl, or react-map-gl. If you’re patient, feel free to start a new project by yourself. I recommend cloning the react-redux-starter-kit project. It includes React, Redux, and React-router as a basic starter kit.

To begin, please download the starter pack I prepared. It contains the base React-Redux setup, LayerInfo, MapSelection, react-map-gl, tween, and deck.gl, as well as the data from Kaggle. Everything you need is ready 🤘

After cloning the starter pack, please make sure you run the project.

It opens localhost automatically. Notice the control at the bottom left. That’s all we have for now 👍

Before going through the tutorial, we should take a look at the project structure in detail.

Here is the project structure.

  • ./info: the LayerInfo layer.
  • ./modules: includes the Redux action.js and reducer.js.
  • overlays: flight_overlay.js, tree_heatmap_overlay, tree_screengrid_layer, taxi_overlay.js. These are the layers we will implement.
  • main.js: the logic of this example. We will fill in the render() function to show the map and the overlay layers. 👍
  • map-selection: the MapSelection control.


Defines constants such as MapBoxAccessToken and the data source names.


INITIAL_STATE defines the main variables. It’s self-explanatory. When dealing with maps, we usually encounter a bunch of map context values, such as latitude, longitude, zoom, pitch, and bearing. If you don’t know one of them, please do a short search on Google. 😉

In addition, flightArcs, airports, trees, text, and mapMode are the main variables in which we will store data from CSV or JSON later.
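
To make the shape concrete, here is a minimal sketch of what such an initial state could look like. The values (the New York coordinates, the 'NONE' map mode, the nulls) are my assumptions for illustration and may differ from what the starter pack actually uses.

```javascript
// Hypothetical initial Redux state; all values are illustrative only.
const INITIAL_STATE = {
  // Map context: where the camera looks and how.
  mapViewState: {
    latitude: 40.7128,   // assumed starting point: New York City
    longitude: -74.006,
    zoom: 10,            // higher value = closer to the ground
    pitch: 0,            // camera tilt in degrees
    bearing: 0           // map rotation in degrees
  },
  // Data slots, filled later from CSV or JSON.
  flightArcs: null,
  airports: null,
  trees: null,
  text: '',
  mapMode: 'NONE'        // assumed name for "no overlay selected"
};
```

Keeping the map context in one nested object makes it easy to replace it in a single step whenever the viewport changes.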


Navigate to the bottom of main.js and you will see the render function.

render() simply renders the map as well as MapSelection and LayerInfo.

To be implemented

Here are the 4 files where we need to write real code. Please open each file and look at the placeholder functions I prepared for you.

Don’t worry about this too much right now. I cover it all in the next chapters 👍

4.2 Data Source’s structure

Trees in New York

You can download the tree_new_york.csv source from Kaggle.

The file is around 80 MB and contains ~700,000 records. The structure of tree_new_york.csv is really simple, and today we only need to focus on latitude and longitude.

Flight Delay and Cancellation

Download flight_record.csv and airport.csv from Kaggle too. This is the largest file, around 192 MB zipped and 600 MB after extraction.

flight_record.csv doesn’t include lat/long, so we need to write helper code to map the departure/arrival airports to their coordinates.

The valuable fields are departure_delay, distance, and air_time. They are useful for calculating the color of a flight route (depending on how long the departure delay is), the velocity, and the airplane’s real-time position in the sky (we do that in GLSL later).
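
As a rough idea of how a delay can drive the route color, here is a small sketch. The function name delayToColor, the 60-minute cap, and the green-to-red scale are all my assumptions, so pick whatever scheme fits your data.

```javascript
// Map a departure delay (in minutes) to an [r, g, b, a] color.
// Assumption: 0 minutes or earlier = green, maxDelay minutes or more = red.
function delayToColor(departureDelay, maxDelay = 60) {
  // Normalize the delay into the 0..1 range and clamp outliers.
  const t = Math.min(Math.max(departureDelay / maxDelay, 0), 1);
  const red = Math.round(255 * t);
  const green = Math.round(255 * (1 - t));
  return [red, green, 0, 255];
}
```

For example, delayToColor(0) gives pure green and delayToColor(90) is clamped to pure red.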

Take a second to mull that over. By understanding the data structure, we can process and manipulate the data more easily.

4.3 Mapbox

The first goal is to figure out how to present the map. As I mentioned before, we use Mapbox throughout this tutorial. Please navigate to _renderMap() in main.js and implement it.

It’s really straightforward: we pass width, height, mapStyle, and mapViewState into <MapboxGLMap>. You can find further information in Mapbox’s documentation. The important point is isActiveOverlay: it renders the VisualizationOverlay if we choose any mode from MapSelection.

I wrote it for you. It automatically switches between overlays and renders the one matching the selected mapMode.

_handleViewportChanged is triggered whenever the viewport changes. It updates the view state and renders the overlay again. If we didn’t re-render the overlay with the new map state, the layers would stay in their old state, which is incorrect behavior.
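
The viewport update can be sketched as a plain Redux action plus reducer. The names UPDATE_MAP_VIEW, updateMapView, and viewReducer are hypothetical, and the starter pack may organize this differently.

```javascript
// Hypothetical action type and creator for viewport changes.
const UPDATE_MAP_VIEW = 'UPDATE_MAP_VIEW';

function updateMapView(mapViewState) {
  return { type: UPDATE_MAP_VIEW, payload: mapViewState };
}

// Reducer: merge the new viewport into the state without mutating the old one,
// so React sees a new object and re-renders the map and its overlay.
function viewReducer(state = { mapViewState: {} }, action) {
  switch (action.type) {
    case UPDATE_MAP_VIEW:
      return {
        ...state,
        mapViewState: { ...state.mapViewState, ...action.payload }
      };
    default:
      return state;
  }
}
```

A handler like _handleViewportChanged would then only need to dispatch updateMapView with the values it receives from the map.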

Mapbox Public Token

Don’t forget to get your Mapbox Public Token from Mapbox’s Dashboard.

Feel free to spend 5 minutes experimenting with Mapbox’s map editor. It’s an amazing tool that lets you edit the layout of the map depending on your purpose.

4.4 Visualize trees in New York

Parsing tree_new_york.csv

_loadCsvFile is a helper function I wrote for you. It reads a CSV file, parses it into an array, and then dispatches the loadTrees action to the Redux store.
The purpose of our tutorial is to show the distribution of trees, so we ignore unnecessary fields and only care about longitude/latitude. That’s why I use .map() to transform items into position objects and then return the result into mainState. It’s not hard if you’re familiar with React + Redux.
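
The transformation itself can be sketched like this. I’m assuming each parsed row exposes longitude and latitude fields as strings, and the helper name toTreePositions is hypothetical.

```javascript
// Turn parsed CSV rows into lightweight position objects.
// Assumption: each row has `longitude` and `latitude` columns (as strings).
function toTreePositions(rows) {
  return rows
    .map(row => ({
      longitude: parseFloat(row.longitude),
      latitude: parseFloat(row.latitude)
    }))
    // Drop rows whose coordinates are missing or not numeric.
    .filter(p => Number.isFinite(p.longitude) && Number.isFinite(p.latitude));
}
```

Dropping every other column early keeps the array small, which matters when there are hundreds of thousands of rows.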


ScreenGridLayer is built into deck.gl. It’s similar to a heatmap. The ScreenGridLayer takes in an array of latitude and longitude coordinate points, aggregates them into histogram bins, and renders them as a grid.

Please navigate to tree_screengrid_overlay.js in the overlays folder.

It’s the perfect time to implement the _renderTreesOverlay() function. Initialize DeckGL with a specific width/height. The layers parameter comes from _renderTreeLayer(). Be careful here: we return a ScreenGridLayer associated with the trees model.


Run it from the terminal.

The result works like a charm. Despite 683,788 trees, deck.gl still handles them smoothly at around 40-60 FPS.

The aggregation is done in screen space, so the data prop needs to be reaggregated by the layer whenever the map is zoomed or panned. This means that this layer is best used with small data sets; however, the visuals when used with the right data set can be quite effective.
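
To show what screen-space aggregation means, here is a plain JavaScript sketch of the idea. It is not deck.gl’s implementation, and the function name aggregateToGrid and its parameters are my own. It assumes the points are already projected to pixel coordinates.

```javascript
// Count how many points fall into each square cell of the screen.
// points: array of [x, y] pixel coordinates; cellSize: cell edge in pixels.
function aggregateToGrid(points, width, height, cellSize) {
  const cols = Math.ceil(width / cellSize);
  const rows = Math.ceil(height / cellSize);
  const counts = new Array(cols * rows).fill(0);

  for (const [x, y] of points) {
    // Ignore points that are outside the visible screen.
    if (x < 0 || y < 0 || x >= width || y >= height) continue;
    const col = Math.floor(x / cellSize);
    const row = Math.floor(y / cellSize);
    counts[row * cols + col] += 1;
  }
  return { cols, rows, counts };
}
```

Because the pixel coordinates change on every zoom or pan, this counting has to be redone each time, which is why the frame rate depends on the size of the data set.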

The frame rate drops gradually when zooming in/out, but the result is acceptable 💯

ScreenGridLayer is a perfect choice if you would like to visualize the distribution of users or objects on a map. 🤘

4.5 Heat-map

Heatmap is a graphical representation of data where the individual values contained in a matrix are represented as colors.

It’s extremely useful for presenting large datasets spread across the world. There are many different color schemes that can be used to illustrate the heatmap, with perceptual advantages and disadvantages for each.

It’s time to bring in the heatmap. Go back to _renderTreesHeatMapOverlay in tree_heatmap_overlay.js and implement it.

lngLatAccessor: data accessors can be provided if your data doesn’t fit the expected form {longitude, latitude}.

I admit the react-map-gl-heat overlay performs worse with 700,000 points.

React-map-gl-heat also provides gradientColors, so you can customize the look more easily. ColorBrewer offers a GUI for picking a color scheme.

4.6 Conclusion

In section 4, I covered two of the layers built into deck.gl. It also has Arc Layer, Choropleth Layer, Line Layer, and Scatterplot Layer. Trying all of them is a good idea 👻

5. FlightLayer

5.1 Overall

It’s time for the exciting part. We’re going to extend deck.gl’s Layer and customize it to represent flight routes. Because deck.gl doesn’t offer a FlightLayer, the only way is to implement it by hand.

We’re dealing with GLSL shaders, so please confirm that you have installed glslify correctly. If not, please install it first.

5.2 Extend deck.gl’s Layer

FlightLayer is a curve from sourcePoint to targetPoint. As you can see in the screenshot below, it’s actually the parabola we learned about in secondary school, right? 😉

Look closer. I admit it doesn’t look like a real flight route, because real flight routes are extremely complex. We simply assume it’s a parabola.

So the FlightLayer takes 3 parameters: sourcePosition, targetPosition, and a color.
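
Here is one simple way to get a parabola-shaped height between two points, written in plain JavaScript so you can play with it before touching any shader. The helper names are my own, and the formula is just one possible choice: the height is zero at both ends and largest in the middle.

```javascript
// Euclidean distance between two [x, y] points.
function distance(a, b) {
  return Math.hypot(a[0] - b[0], a[1] - b[1]);
}

// Linear interpolation between two [x, y] points (t in 0..1).
function mix(a, b, t) {
  return [a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t];
}

// Height of the curve at `ratio` (0 = source, 1 = target).
// Equals d^2 - x^2, where d is half the route length and x is the
// distance of the current point from the middle of the route.
function parabolaHeight(source, target, ratio) {
  const point = mix(source, target, ratio);
  const center = mix(source, target, 0.5);
  const dSourceCenter = distance(source, center);
  const dPointCenter = distance(point, center);
  return (dSourceCenter + dPointCenter) * (dSourceCenter - dPointCenter);
}
```

With a source at [0, 0] and a target at [4, 0], the height is 0 at both ends and peaks at the midpoint.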

5.3 Processing flight record data

Before visualizing, we need to process the flight data to fit our requirements.

Load flight-data.csv, airport.csv

in reducer

Please take note of the LOAD_AIRPORT action: I convert the airport array into a hash map, because looking up a key in a hash map is generally O(1). This is really useful for the LOAD_FLIGHT_POINT action, when mapping the coordinates of the origin/destination airports.
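
The lookup idea can be sketched as two small helpers. The field names (code, origin, destination, departureDelay) and the function names are assumptions for illustration; the real CSV columns are named differently.

```javascript
// Build a lookup table keyed by airport code, so each lookup is O(1) on average.
function indexAirports(airports) {
  const byCode = new Map();
  for (const airport of airports) {
    byCode.set(airport.code, [Number(airport.longitude), Number(airport.latitude)]);
  }
  return byCode;
}

// Attach coordinates to each flight; skip flights whose airports are unknown.
function toFlightArcs(flights, airportsByCode) {
  const arcs = [];
  for (const flight of flights) {
    const sourcePosition = airportsByCode.get(flight.origin);
    const targetPosition = airportsByCode.get(flight.destination);
    if (!sourcePosition || !targetPosition) continue;
    arcs.push({
      sourcePosition,
      targetPosition,
      departureDelay: Number(flight.departureDelay)
    });
  }
  return arcs;
}
```

Without the index, every flight would need a scan over the whole airport array, which adds up quickly with a 600 MB flight file.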

5.4 Creating custom layer: Flight layer

Time for hacking your brain 😉

Layer life-cycle

Below is a graph summarizing the layer’s lifecycle.

The full documentation is here. Because we extend Layer, we only override Layer.initializeState(), Layer.draw({uniform}) and, optionally, Layer.updateState().

  • Layer.initializeState(): This method is called only once for each layer (as defined by the id property), to set up the initial state for that layer. deck.gl will already have created the state object at this time, and added the gl context and the attributeManager context.
  • Layer.draw({uniform}): Called when the layer needs to be rendered. By default, it looks for the model on the layer’s state and renders it with the uniforms passed in.
  • Layer.updateState(): Called when a new layer has been matched with a layer from the previous render cycle (resulting in new props being passed to that layer), or when context has changed and layers are about to be drawn.

Implement FlightLayer

Please navigate to ../scr/layers/core/flight-data/flight-data.js. I prepared everything to save time 🤗

in flight-data.js

All we need are here.

Create gl model

NUM_SEGMENTS is the number of small segments we use to draw the parabola. In the OpenGL world we can’t draw an arc directly; we need to draw each segment and join them together. The larger NUM_SEGMENTS is, the smoother the flight route looks.

Finally, we create the Model and Geometry internally, with GL.LINE_STRIP as the draw mode and positions as the input attribute.

What is positions? It’s a placeholder array representing the index of each segment. We pass <i> three times because we store it as a vec3. It’s used in vertex.glsl to determine which segment a vertex belongs to 👍
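
A minimal sketch of how such a placeholder array can be built is below. The value 50 for NUM_SEGMENTS and the helper name createSegmentIndices are my assumptions.

```javascript
// Assumed segment count; more segments give a smoother curve at a higher cost.
const NUM_SEGMENTS = 50;

// Build [0,0,0, 1,1,1, 2,2,2, ...]: one vec3 per segment, holding its index.
function createSegmentIndices(numSegments) {
  const positions = new Float32Array(numSegments * 3);
  for (let i = 0; i < numSegments; i++) {
    positions[i * 3] = i;
    positions[i * 3 + 1] = i;
    positions[i * 3 + 2] = i;
  }
  return positions;
}

const positions = createSegmentIndices(NUM_SEGMENTS);
```

The vertex shader can then read positions.x to learn which segment it is working on.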

Don’t forget to take a look at this.getShader(), which I wrote for you. It simply reads the shader files in the same directory.


initializeState() is the one method you must implement to create the OpenGL resources you need to render the layer. deck.gl looks for the variable model on your state and, if set, expects it to be an instance of a luma.gl Model class.

So we pass the model we created before to this.state. Following deck.gl’s attributeManager recommendation, we should define the attributes for the vertex.glsl file.

We defined

  • instanceSourcePositions, instanceTargetPositions are [vector3] (x, y, z).
  • instanceSourceColors, instanceTargetColors are [vector4] (r, g, b, a), stored as unsigned bytes.

The complete code is below.

Calculate position vec3

It’s time to implement helper functions to calculate the source/target positions and the colors.

This is not easy to understand if you’re not familiar with WebGL or OpenGL in general. The main idea is to create an array for each attribute (declared in the vertex shader file).

We take the source position, read the longitude and latitude from sourcePosition[0] and sourcePosition[1], and write them into the value array. We set z = 0 because the point is on the ground.

It’s a hacky solution because we can’t pass our JavaScript objects directly into vertex.glsl. The common approach is to pass long/lat indirectly through arrays and process the data later. We could actually pass a struct, but I won’t cover that here.
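
The packing step can be sketched as follows. The function name fillInstancePositions and the accessor parameter are hypothetical, and inside a real layer the output array would be provided by the attributeManager instead of being allocated here.

```javascript
// Pack one [longitude, latitude, z] triple per arc into a flat typed array.
// getPosition picks which end of the arc to read (source or target).
function fillInstancePositions(arcs, getPosition) {
  const value = new Float32Array(arcs.length * 3);
  let i = 0;
  for (const arc of arcs) {
    const position = getPosition(arc);
    value[i++] = position[0]; // longitude
    value[i++] = position[1]; // latitude
    value[i++] = 0;           // z = 0: the point sits on the ground
  }
  return value;
}
```

The same helper covers both ends of the route: call it once with arc => arc.sourcePosition and once with arc => arc.targetPosition.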

Calculate colors vec4

We apply the same approach to colors.


Drawing the layer and passing our data as uniforms into the shader file is self-explanatory. We also pass trailLength, currentTime, and timestamp as uniforms. We will use them later for animation 🤗

5.5 GLSL (OpenGL Shading Language)

If you don’t have basic knowledge of GLSL, feel free to skip this section.

GLSL is a high-level shading language with a syntax based on the C programming language. It gives developers more direct control of the graphics pipeline without having to use ARB assembly language or hardware-specific languages.


  1. preproject: All positions must be passed through the preproject function (available both in JavaScript and GLSL) to convert non-linear web-mercator coordinates to linear Mercator “world” or “pixel” coordinates that can be passed to the projection matrix. At this point, we convert the long/lat coordinates to pixels in our web view.
  2. smoothstep is a built-in function in GLSL. It performs Hermite interpolation between two values. The segmentRatio depends on the index of the segment, so this is where we read the value from the positions array we created before.
  3. We implement paraboloid to calculate the height of the parabola.

Illustration of each segment in the parabola curve

  4. Calculate the shift depending on currentTime, then pass it to gl_Position.
  5. vTime is a varying variable passed to the fragment shader. It holds the time that has elapsed.
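
If smoothstep is new to you, here is a JavaScript version of the same Hermite interpolation, so you can try values outside a shader. The helper segmentRatio is my own illustration of how a segment index could be turned into a ratio, not the exact shader code.

```javascript
// JavaScript equivalent of GLSL smoothstep(edge0, edge1, x).
function smoothstep(edge0, edge1, x) {
  // Normalize x into 0..1 and clamp values outside the edges.
  const t = Math.min(Math.max((x - edge0) / (edge1 - edge0), 0), 1);
  // Hermite polynomial: slow start, fast middle, slow end.
  return t * t * (3 - 2 * t);
}

// Hypothetical helper: ratio along the route for a given segment index.
function segmentRatio(index, numSegments) {
  return smoothstep(0, 1, index / (numSegments - 1));
}
```

Compared with a plain linear ratio, the eased ratio puts the segments closer together near the two airports.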

glLineWidth problem

Did you notice that the FlightLayer lines are too thin? I tried putting gl.lineWidth(100) in the draw() function. Unfortunately, it didn’t work, and it took me a few hours to understand why. Here is the problem: many WebGL implementations only support a line width of 1, so larger values are ignored.

The fallback is to implement line width ourselves by drawing a rectangle whose height is the line width. I won’t cover it here.


After finishing vertex.glsl, we move on to the fragment shader. The fragment shader is essential: its main purpose is to handle the color of each fragment (pixel).

Notice that vTime is passed from vertex.glsl. Here we use it by multiplying it with the color’s alpha.

Looks cool, right? 😍

Tween timer

If you run the example at this point, you only see full parabolas over the map. Now we add a timer to achieve animation. Navigate to startTweenTimer() in main.js.


It triggers tweenTimer automatically whenever we select a new mapMode. I also added an if condition to prevent it from firing twice.
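
The duplicate-fire guard can be sketched like this. Note that I’m swapping in a plain setInterval instead of the tween library from the starter pack, and createTweenTimer is a hypothetical name, so treat it as an illustration of the guard only.

```javascript
// Create a timer that calls onTick repeatedly, but can only be started once.
function createTweenTimer(onTick, intervalMs = 16) {
  let handle = null; // null means "not running"

  return {
    start() {
      // Guard: ignore the call if the timer is already running.
      if (handle !== null) return false;
      handle = setInterval(() => onTick(Date.now()), intervalMs);
      return true;
    },
    stop() {
      if (handle === null) return;
      clearInterval(handle);
      handle = null;
    },
    isRunning() {
      return handle !== null;
    }
  };
}
```

Calling start() every time the mapMode changes is then safe, because only the first call actually creates the interval.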

One more thing

Notify the attributeManager in flight-layer.js to invalidate all vertices and re-render with the new currentTime value.

5.6 Finally

Time to go back to the terminal and type the magic spell.

Feel free to switch between the map modes. I believe you will be impressed by what we’ve done 💯

6. Source code

I published all the source code on my GitHub account. Feel free to clone it and contribute if you have any great ideas 😉

Data Visualization – Github

7. What things are we forgetting?

If you have sharp eyes, you might notice that all flights take off and land at their destination airports at the same time, no matter how far apart the airports are. It looks unrealistic, right? 🤔

Yes, I did that on purpose. It’s time for your own experiment, your exercise 🤗

8. Where to go from here?

If you liked this and were impressed by how we visualized huge datasets on the map, don’t hesitate to take a look at deck.gl. Pick your favorite dataset on Kaggle.

Learning and understanding GLSL is really useful if you want to talk directly to the GPU.

Hope you have a great time igniting your lazy brain 👻

6 thoughts on “deck.gl – Data Visualization”

  1. Hello, thanks for the article!

    I have a doubt about this line:
    float _timestamp = (positions.x * 15.0) / 2000.0;
    could you explain where did you get the values 15.0 and 2000.0 ?


  3. This tutorial is broken. I spent a few hours on it, but the tree and flight examples both do not work. They for loading data, but that’s pretty much it. I did learn the basics of these library, so thank you for the tutorial, although it’d be great if you could update the tutorial and make sure it works… 🙁

  4. Hi Nghia,

    Thanks for your helpful article.

    I have a doubt about building your example code. It was be stuck at compile viewport.js file. Could you kindly help me that case?

