One year of E-Scooter data

Posted on Fri 05 July 2024 in escooter


Dashboards

For more than one year, I collected the locations of the E-Scooters in Switzerland. This was possible thanks to an API provided by the Swiss Federal Office of Energy (BFE). After approx. one year, the API was changed so that the ID of a scooter now changes after each trip. With this modification, I can no longer reconstruct the trips taken by a specific E-Scooter. Because of that, the data shown in the dashboards ends on 2023-11-15.

I have now published 10 dashboards for the cities and regions with available scooter data.

Some key figures:

  • time range: 2022-10-16 until 2023-11-15
  • collected every 2 minutes
  • datapoints: 2'800'000'000
  • storage required for the raw datapoints: 327 GB (MongoDB)
  • storage required for the processed data: 38.1 GB (PostGIS)
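
As a rough sanity check on these figures, the following Python sketch derives a few secondary numbers. It assumes uninterrupted collection over the whole time range and decimal gigabytes (1 GB = 10^9 bytes). Neither is stated above, so treat the results as estimates.

```python
from datetime import date

# Figures taken from the list above.
start, end = date(2022, 10, 16), date(2023, 11, 15)
datapoints = 2_800_000_000
raw_gb, processed_gb = 327, 38.1

# Assumption: one snapshot every 2 minutes, without any gaps.
days = (end - start).days
snapshots = days * 24 * 60 // 2

# Derived estimates.
points_per_snapshot = datapoints / snapshots
bytes_per_point = raw_gb * 1e9 / datapoints  # assumes 1 GB = 10^9 bytes
size_ratio = raw_gb / processed_gb

print(f"{days} days, {snapshots:,} snapshots")
print(f"~{points_per_snapshot:,.0f} datapoints per snapshot")
print(f"~{bytes_per_point:.0f} bytes per raw datapoint")
print(f"raw data is ~{size_ratio:.1f}x larger than the processed data")
```

Under those assumptions, this works out to roughly 9,800 scooter positions per snapshot and a bit under 120 bytes per raw datapoint.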

Processing of the data

Most of the work is done inside Apache Airflow. For this, I wrote two DAGs:

This is what the DAG for processing the data looks like:

Airflow Dag Graph

And here is the task duration of the same DAG:

Airflow Dag Graph

While processing a single day of data, every transformation happens inside a dedicated Docker image, triggered by Airflow. This means that for every processed day, 149 Docker containers get created. With this type of design, I can decouple the Python dependencies of Airflow from those of the DAGs themselves. You can find the custom-written DockerOperator here. This custom DockerOperator passes the Connections and Variables to the Docker container as environment variables.
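
To illustrate the idea of handing Connections and Variables to a container, here is a minimal sketch in plain Python. It is not the code of the custom DockerOperator. The function name build_container_env, the connection ID, the variable key and the URI are all hypothetical. The sketch only relies on Airflow's documented naming convention for environment variables, AIRFLOW_CONN_<CONN_ID> and AIRFLOW_VAR_<KEY>.

```python
def build_container_env(connections, variables):
    """Map connection URIs and variables to Airflow-style env var names.

    Hypothetical helper: `connections` maps a connection ID to its URI,
    `variables` maps a variable key to its value.
    """
    env = {}
    for conn_id, uri in connections.items():
        # Airflow resolves connections from AIRFLOW_CONN_<CONN_ID>.
        env[f"AIRFLOW_CONN_{conn_id.upper()}"] = uri
    for key, value in variables.items():
        # Airflow resolves variables from AIRFLOW_VAR_<KEY>.
        env[f"AIRFLOW_VAR_{key.upper()}"] = str(value)
    return env


# Example values are made up for illustration.
env = build_container_env(
    connections={"postgis_default": "postgresql://user:secret@db:5432/scooter"},
    variables={"processing_date": "2023-01-01"},
)
for name, value in env.items():
    print(name, "=", value)
```

A dictionary like this could then be passed to the container, for example through the environment parameter of Airflow's stock DockerOperator. Code inside the container can read the values with os.environ without needing access to the Airflow metadata database.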

Fortunately, many tasks can be run in parallel, which speeds up the execution significantly. This parallelisation can happen between mapped tasks, but also between the tasks which are shown stacked vertically in the image above.

The nodes and edges of the network graph get imported from OpenStreetMap with the help of the Python package osmnx. The network is then needed to calculate possible routes from one GPS location of a single scooter to the next.
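
The routing step can be sketched as follows. This is a simplified stand-in written with the standard library only, not the actual pipeline code: the toy network, the node names and the helper functions are hypothetical, and in practice osmnx and networkx provide the graph, the nearest-node lookup and the shortest-path search. The idea is to snap each GPS location to its nearest network node and then search for the shortest path between the two nodes.

```python
import heapq
import math

# Hypothetical toy network: node id -> (lat, lon). Edges are undirected.
NODES = {
    "a": (47.3769, 8.5417),
    "b": (47.3779, 8.5417),
    "c": (47.3779, 8.5437),
    "d": (47.3769, 8.5437),
}
EDGES = [("a", "b"), ("b", "c"), ("c", "d")]  # no direct a-d connection


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000 * math.asin(math.sqrt(h))


def build_adjacency(nodes, edges):
    """Adjacency list with edge lengths in metres."""
    adj = {n: [] for n in nodes}
    for u, v in edges:
        length = haversine_m(nodes[u], nodes[v])
        adj[u].append((v, length))
        adj[v].append((u, length))
    return adj


def nearest_node(nodes, point):
    """Snap a GPS location to the closest network node."""
    return min(nodes, key=lambda n: haversine_m(nodes[n], point))


def shortest_path(adj, source, target):
    """Dijkstra's algorithm; returns (path, length) or (None, inf)."""
    dist, prev, queue = {source: 0.0}, {}, [(0.0, source)]
    while queue:
        d, u = heapq.heappop(queue)
        if u == target:
            break
        if d > dist.get(u, math.inf):
            continue  # stale queue entry
        for v, w in adj[u]:
            if d + w < dist.get(v, math.inf):
                dist[v], prev[v] = d + w, u
                heapq.heappush(queue, (d + w, v))
    if target not in dist:
        return None, math.inf
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]


ADJ = build_adjacency(NODES, EDGES)
# Two consecutive (made-up) GPS locations of one scooter.
fix_1, fix_2 = (47.37691, 8.54171), (47.37689, 8.54369)
path, length = shortest_path(ADJ, nearest_node(NODES, fix_1), nearest_node(NODES, fix_2))
print(path, round(length), "m")
```

In this toy network the two locations are only about 150 m apart in a straight line, but the route has to follow the available edges, which is why a network is needed at all.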

See also