One year of E-Scooter data
Posted on 2024-07-05 in escooter
Dashboards
- Zurich, Switzerland
- Basel, Switzerland
- Bern, Switzerland
- Winterthur, Switzerland
- St.Gallen, Switzerland
- Biel/Bienne, Switzerland
- Uster, Switzerland
- Frauenfeld, Switzerland
- Effretikon, Switzerland
- Grenchen, Switzerland
For more than one year, I collected the locations of the E-Scooters in Switzerland. This is possible thanks to an API provided by the Swiss Federal Office of Energy (BFE). I could collect usable data for about one year; after that, the API was changed so that the ID of a scooter changes after each trip. With this change, I can no longer reconstruct the trips taken by a specific E-Scooter. Because of that, the data shown in the dashboards ends on 2023-11-15.
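To illustrate why stable IDs matter, here is a minimal sketch of how trips could be inferred from successive snapshots. The record layout (`id`, `ts`, `pos`), the helper names and the movement threshold are assumptions made for this example; this is not the code used for the dashboards.

```python
from collections import defaultdict


def moved(a, b, min_delta=0.0005):
    """True if two (lat, lon) positions differ by more than a rough threshold in degrees."""
    return abs(a[0] - b[0]) > min_delta or abs(a[1] - b[1]) > min_delta


def infer_trips(observations):
    """Group observations by scooter ID and report position changes as trips."""
    by_id = defaultdict(list)
    for obs in observations:
        by_id[obs["id"]].append(obs)
    trips = []
    for scooter_id, rows in by_id.items():
        rows.sort(key=lambda r: r["ts"])
        # Compare each observation with the next one of the same scooter.
        for prev, cur in zip(rows, rows[1:]):
            if moved(prev["pos"], cur["pos"]):
                trips.append((scooter_id, prev["pos"], cur["pos"]))
    return trips


P, Q = (47.3769, 8.5417), (47.3900, 8.5150)

# Stable ID: the same scooter shows up at P and later at Q, so one trip is found.
stable = [{"id": "s1", "ts": 0, "pos": P}, {"id": "s1", "ts": 1, "pos": Q}]
# Rotating ID: after the trip the scooter has a new ID, so nothing can be linked.
rotating = [{"id": "s1", "ts": 0, "pos": P}, {"id": "x9", "ts": 1, "pos": Q}]

stable_trips = infer_trips(stable)
rotating_trips = infer_trips(rotating)
print(len(stable_trips), len(rotating_trips))
```

With rotating IDs, the start and the end of a trip belong to two different IDs, so this kind of grouping no longer finds any trip.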
I have now published 10 dashboards for the cities and regions with available scooter data.
Some key figures:
- time range: 2022-10-16 until 2023-11-15
- collected every 2 minutes
- datapoints: 2'800'000'000
- storage required for the raw datapoints: 327 GB (MongoDB)
- storage required for the processed data: 38.1 GB (PostGIS)
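As a rough plausibility check of these figures, the following sketch derives the number of snapshots and the average size per datapoint. It assumes uninterrupted collection over the whole time range and decimal gigabytes; both are assumptions, so treat the results as orders of magnitude only.

```python
from datetime import date

start = date(2022, 10, 16)
end = date(2023, 11, 15)

days = (end - start).days + 1        # count both the first and the last day
snapshots = days * 24 * 60 // 2      # one API poll every 2 minutes

datapoints = 2_800_000_000
raw_bytes = 327 * 10**9              # assumption: 1 GB = 10^9 bytes

points_per_snapshot = datapoints / snapshots
bytes_per_point = raw_bytes / datapoints

print(f"days: {days}, snapshots: {snapshots}")
print(f"datapoints per snapshot: {points_per_snapshot:.0f}")
print(f"bytes per datapoint: {bytes_per_point:.0f}")
```

Under these assumptions, each snapshot contains roughly 9'800 datapoints, and one datapoint takes about 117 bytes in MongoDB.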
Processing of the data
Most of the work is done inside Apache Airflow. For this, I wrote two DAGs:
- DAG to collect the data from the BFE API
- DAG to process the collected data on a daily basis
This is what the DAG for processing the data looks like:
And here is the task duration of the same DAG:
While processing a single day of data, every transformation happens inside a dedicated Docker container, triggered by Airflow.
This means that 149 Docker containers get created for every processed day. With this type of design, I can decouple the Python dependencies of Airflow from those of the DAGs themselves.
You can find the custom DockerOperator here. It passes the Airflow Connections and Variables to the Docker container as environment variables.
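One way to do this can be sketched as follows. Airflow itself reads Connections from environment variables named `AIRFLOW_CONN_<CONN_ID>` (as a URI) and Variables from `AIRFLOW_VAR_<KEY>`. Whether the custom operator uses exactly this naming is an assumption on my side for this example. The helpers `connection_uri` and `build_container_env`, as well as the host and credentials, are hypothetical.

```python
from urllib.parse import quote


def connection_uri(conn_type, host, login="", password="", port=None, schema=""):
    """Build a URI of the form type://login:password@host:port/schema."""
    auth = ""
    if login:
        # Percent-encode credentials so special characters do not break the URI.
        auth = quote(login, safe="")
        if password:
            auth += ":" + quote(password, safe="")
        auth += "@"
    netloc = host + (f":{port}" if port else "")
    return f"{conn_type}://{auth}{netloc}/{schema}"


def build_container_env(connections, variables):
    """Map connection URIs and variables to AIRFLOW_CONN_* / AIRFLOW_VAR_* names."""
    env = {}
    for conn_id, uri in connections.items():
        env[f"AIRFLOW_CONN_{conn_id.upper()}"] = uri
    for key, value in variables.items():
        env[f"AIRFLOW_VAR_{key.upper()}"] = str(value)
    return env


env = build_container_env(
    connections={
        "postgis": connection_uri(
            "postgres", "db.example", "user", "p@ss", 5432, "scooter"
        )
    },
    variables={"processing_date": "2023-11-15"},
)
for name, value in env.items():
    print(f"{name}={value}")
```

A dictionary like `env` could then be handed to the container, for example through the `environment` argument of Airflow's `DockerOperator`. The code inside the container only needs to read environment variables and does not need access to the Airflow metadata database.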
Fortunately, many tasks can run in parallel, which speeds up the execution significantly. This parallelisation happens between mapped tasks, but also between tasks that are shown stacked vertically in the image above.
Nodes and edges for the network graph are imported from OpenStreetMap with the help of the Python package osmnx. The network is then needed to calculate possible routes from one GPS location of a single scooter to the next.
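The idea behind the routing step can be sketched with the standard library only: snap both GPS locations to the nearest node of the network, then search the shortest path between these two nodes. The tiny hand-made graph below stands in for the OpenStreetMap network, and the plain Dijkstra search is a stand-in as well; the coordinates, node names and helper functions are made up for this example and are not taken from the actual pipeline.

```python
import heapq
from math import asin, cos, radians, sin, sqrt


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(h))


def nearest_node(nodes, point):
    """Return the ID of the node closest to a GPS point."""
    return min(nodes, key=lambda n: haversine_m(nodes[n], point))


def shortest_path(nodes, edges, source, target):
    """Dijkstra on an undirected graph weighted by edge length in metres."""
    adj = {n: [] for n in nodes}
    for u, v in edges:
        d = haversine_m(nodes[u], nodes[v])
        adj[u].append((v, d))
        adj[v].append((u, d))
    dist, prev = {source: 0.0}, {}
    queue = [(0.0, source)]
    while queue:
        d, u = heapq.heappop(queue)
        if u == target:
            break
        if d > dist[u]:
            continue  # outdated queue entry
        for v, w in adj[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(queue, (d + w, v))
    if target not in dist:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]


# Toy network: four intersections, three street segments (no direct D-C link).
nodes = {
    "A": (47.3769, 8.5417),
    "B": (47.3769, 8.5450),
    "C": (47.3790, 8.5450),
    "D": (47.3790, 8.5417),
}
edges = [("A", "B"), ("B", "C"), ("A", "D")]

start_gps, end_gps = (47.3770, 8.5418), (47.3789, 8.5449)
path, length_m = shortest_path(
    nodes, edges, nearest_node(nodes, start_gps), nearest_node(nodes, end_gps)
)
print(path, round(length_m))
```

The result is only one possible route: the scooter data contains the start and end location of a trip, not the path that was actually driven.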