(Big) data in public transport

In the years to come, the demand for mobility will steadily increase. Population growth and easy access to various modes of transportation will lead to a highly mobile society. Dominik Grögler, Head of Engineering at BERNMOBIL, explores which data options are available to public transport operators to meet the future challenges of demand.

In 2010, 109,000 people commuted to the city of Bern in Switzerland every day, an increase of 16% since 2000. Over the same period, the number of people who commuted from Bern to other cities increased by 55%, from 15,500 to 24,000 [1]. This corresponds to a net daily growth of Bern’s population of 65%. Consequently, the number of passengers transported in the city of Bern has increased by 20% in the last 10 years, from 84 million to 101 million [2].

If this trend continues and the increasing demand is to be satisfied, the service availability and performance of transport operators have to be enhanced. However, these efforts are constrained by the finite amount of energy, the limited space for roads and tracks, the competing demands of private motorised transport and non-motorised traffic, the awareness of environmental consequences and limited financial resources.

Hence, transport operators will have to optimise their service by sharing their resources, adjusting their planning processes and making good use of their own and external sources of (real-time) data. Purchasing and operating their own control centres and automatic vehicle location systems (AVL systems) is out of reach for many transport operators. Beyond that, centralisation would be an advantageous step towards an integral system designed to guide and control not only public transport but mobility in its entirety.

As more and more systems are interconnected, it is vital to standardise the planning processes and the timetable design. Only then can the transport services of bus, tramway and train operators be synchronised, and only then can passengers expect to be guided along their journey and reach their destination with minimal delay and maximum comfort.

In either case, data and its correct generation and interpretation are the cornerstones on which all systems depend to work accurately. This is true for the design of timetables, for AVL systems, for the computation of arrival and departure times and, most importantly, for the exchange of data between different operators. To make profitable use of data, to optimise your own processes and to enhance the passenger experience, it is essential to understand your own data.

Definitions of big data

There is no universally accepted definition of big data. Often, a set of data is said to be big if conventional methods of storing and analysing fail, because either computers are too slow or algorithms are not suitable to process large amounts of unstructured data.

For example, 1,000 TB of raw data are produced in a single day at the European Organisation for Nuclear Research (CERN) in Geneva [3]. Facebook generates 130 TB of log files each day [4]. A single flight of a Boeing 787 Dreamliner accounts for 0.5 TB [5]. All the vehicles of BERNMOBIL (the local transport operator in the city of Bern, Switzerland) together produce 20 TB of data, but only over 100 years [6]!
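The BERNMOBIL figure can be broken down to a rate per vehicle with a few lines of arithmetic. The following is a rough sketch only: the 20 TB, the 100 years and the fleet of 48 tramways and 140 buses come from the article and its footnote, while decimal terabytes, 365-day years and every vehicle being in operation every day are simplifying assumptions made here.

```python
# Rough per-vehicle data rate implied by the article's figures.
# Assumptions (not from the source): 1 TB = 10**12 bytes, 365 days per year,
# and every vehicle is in operation every day.

TOTAL_BYTES = 20 * 10**12   # 20 TB over the whole period
YEARS = 100
VEHICLES = 48 + 140         # tramways + buses in daily operation


def bytes_per_vehicle_per_day(total_bytes: int, years: int, vehicles: int) -> float:
    """Spread the total volume evenly over all days and all vehicles."""
    days = years * 365
    return total_bytes / days / vehicles


rate = bytes_per_vehicle_per_day(TOTAL_BYTES, YEARS, VEHICLES)
fleet_rate = rate * VEHICLES

print(f"{rate / 10**6:.1f} MB per vehicle per day")
print(f"{fleet_rate / 10**9:.2f} GB per day for the whole fleet")
```

Under these assumptions each vehicle produces roughly 3 MB per day, and the whole fleet about half a gigabyte per day, which is several orders of magnitude below the daily volumes quoted for CERN or Facebook.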

Based on these figures, we are clearly not dealing with big sets of data in public transport. However, advances in handling large amounts of data have promoted new ways of analysing data, leading analytics to evolve from descriptive through diagnostic and predictive to, finally, prescriptive:

  1. Descriptive: what happened? “Yesterday, connections from vehicles of bus line 1 to bus line 2 were interrupted”
  2. Diagnostic: why did it happen? “Yesterday, because of an accident which caused delays on bus line 1, connections from vehicles of bus line 1 to bus line 2 were interrupted”
  3. Predictive: what will happen? “Tomorrow, due to heavy rainstorms, bus line 1 will face large delays and connections between line 1 and line 2 will be interrupted”
  4. Prescriptive: what should you do? “Tomorrow, instead of using lines 1 and 2, it is recommended to use lines 3 and 4 to reach your destination”
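The descriptive level in the list above can be illustrated with a small sketch that flags interrupted connections in recorded data. The record layout, the field names and the 120-second minimum transfer time are hypothetical choices for illustration; they are not BERNMOBIL's AVL schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Connection:
    """One planned transfer, with recorded (actual) times in seconds after midnight."""
    feeder_line: str
    connecting_line: str
    feeder_arrival_s: int
    connecting_departure_s: int
    min_transfer_s: int = 120  # assumed time a passenger needs to change vehicles


def is_broken(c: Connection) -> bool:
    """A connection is interrupted if the feeder arrived too late to allow the transfer."""
    return c.feeder_arrival_s + c.min_transfer_s > c.connecting_departure_s


def broken_connections(records: List[Connection]) -> List[Connection]:
    """Descriptive analytics: list what happened, without explaining why."""
    return [c for c in records if is_broken(c)]


# Hypothetical records for yesterday's transfers from line 1 to line 2.
records = [
    Connection("1", "2", feeder_arrival_s=8 * 3600 + 300, connecting_departure_s=8 * 3600 + 240),
    Connection("1", "2", feeder_arrival_s=9 * 3600, connecting_departure_s=9 * 3600 + 180),
]

broken = broken_connections(records)
print(f"{len(broken)} of {len(records)} connections from line 1 to line 2 were interrupted")
```

Moving up the levels would mean joining these records with incident logs (diagnostic), with forecasts such as weather (predictive) and with a routing engine that proposes alternatives (prescriptive).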

The degree to which the answers to these questions are generated by machines in an automated process increases steadily from 1 to 4, while the degree of human intervention decreases. An obvious precondition for automation is that the data is machine-readable. Ideally, the data is also highly structured and the access barrier is low. These characteristics suggest another way of classifying sets of data, which might be more constructive than classification by sheer size: distinguishing whether they are generated within your own company or externally, and whether they are structured or unstructured.

Examining your own AVL data is typically the easiest and most important step, since it is readily available and provides a profound insight into some of the core tasks of a control system, e.g. the generation of real-time data for passenger information systems. The next step could be to integrate AVL data from other transport operators into the analysis. Finally, external sources such as weather reports or the positioning data of mobile phone users could be taken into account. This would make it possible not only to make better predictions of future traffic conditions, but also to analyse and optimise timetable design.
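The classification by origin and structure, and the suggested order of integration, can be written down as a small data model. This is a sketch under assumptions: the class names are invented for illustration, and labelling weather reports as unstructured is a choice made here, not a statement from the article.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Origin(Enum):
    INTERNAL = "internal"   # generated within your own company
    EXTERNAL = "external"   # generated elsewhere


class Structure(Enum):
    STRUCTURED = "structured"
    UNSTRUCTURED = "unstructured"


@dataclass(frozen=True)
class DataSource:
    name: str
    origin: Origin
    structure: Structure


def integration_order(sources: List[DataSource]) -> List[DataSource]:
    """Internal before external, structured before unstructured."""
    return sorted(
        sources,
        key=lambda s: (s.origin is Origin.EXTERNAL, s.structure is Structure.UNSTRUCTURED),
    )


# Example sources; the labels assigned to each are illustrative assumptions.
sources = [
    DataSource("weather reports", Origin.EXTERNAL, Structure.UNSTRUCTURED),
    DataSource("AVL data from other operators", Origin.EXTERNAL, Structure.STRUCTURED),
    DataSource("own AVL data", Origin.INTERNAL, Structure.STRUCTURED),
]

for source in integration_order(sources):
    print(f"{source.name}: {source.origin.value}, {source.structure.value}")
```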

Open Data – Open Service

The public transport sector is subsidised to a large extent. Consequently, various private and governmental institutions demand that transport data, including real-time data, be made available to the public. In the context of public transport, examples of sets of data that are referred to as open data are AVL data, GIS data (geographic information system data), references and names of bus, tram and rail stops, and timetables. Open data should be raw [7] and complete, machine-readable, based on open standards, highly available and free of charge. Unfortunately, many transport operators still hold on to their data out of fear that it could be misinterpreted: a passenger who receives wrong routing information from a mobile application typically blames the transport operator and not the application programmer, even if the cause is a misinterpretation of the raw data by the application. Although there is some truth to this argument, it should not be used as a justification for withholding data, all the more so since big players like Google will get hold of the data anyway.

There might, however, be an elegant way out of this dilemma: the concept of open services (instead of open data). Application programming interfaces (APIs) for mobile applications or web services, GIS-based routing information, information about disrupted service and about arrivals and departures of vehicles are all examples of open services. Instead of providing raw data, an API can be designed to answer specific questions: when is the next departure from my current location A to destination B? What is the fastest/cheapest/most direct/most comfortable route from A to B? What is the occupancy rate of the selected vehicle?
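What such a question-oriented service could look like is sketched below for the first question, the next departure from A to B. Everything here is hypothetical: the `Departure` record, the `next_departure` function and the in-memory list stand in for whatever the operator's real-time system would provide. The point is only that the caller receives an answer rather than raw data to interpret.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Departure:
    line: str
    origin: str
    destination: str
    departure_min: int  # estimated departure, minutes after midnight


def next_departure(
    departures: List[Departure], origin: str, destination: str, now_min: int
) -> Optional[Departure]:
    """Return the earliest departure from origin to destination at or after now_min."""
    candidates = [
        d
        for d in departures
        if d.origin == origin and d.destination == destination and d.departure_min >= now_min
    ]
    return min(candidates, key=lambda d: d.departure_min, default=None)


# Hypothetical real-time estimates held by the operator.
board = [
    Departure("1", "A", "B", 8 * 60 + 5),
    Departure("1", "A", "B", 8 * 60 + 15),
    Departure("3", "A", "C", 8 * 60 + 7),
]

answer = next_departure(board, "A", "B", now_min=8 * 60 + 6)
if answer is not None:
    print(f"Next departure A to B: line {answer.line} at minute {answer.departure_min}")
```

Because the operator computes the answer, the interpretation of stop references, delays and cancellations stays in one place, which addresses the fear of misinterpretation described above.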

Transport operators could gain a lot if they concentrated on their core business: designing timetables and operating and controlling traffic. Only if the source data (e.g. GPS coordinates of stops, distances and travel times between stops, stop times) are of high quality can the potential of an AVL system be fully exploited. With these preconditions, the generation of high-quality sets of data (e.g. estimates of arrival and departure times) can be regarded as a by-product of a modern AVL system. This data can be used by third-party developers to do what they do best: designing, building and operating mobile applications, websites or other internet-based information services.

BERNMOBIL’s contribution

BERNMOBIL has put a lot of effort into interconnecting various transport operators in Switzerland. The data hub 3.0 connects the federal railway system of Switzerland with nine local transport operators and two other data hubs, one of which is based in Germany. Six other transport operators will join the network in the near future. This makes it possible to provide passengers with real-time information regardless of where they are along their journey and which operator they are travelling with at a particular moment.

Furthermore, control centres are able to ensure that connections from one vehicle to another take place even if a feeder line is behind schedule by automatically instructing the second vehicle to wait for the arrival of the first one.
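The decision behind such connection protection can be sketched as a simple rule. The function name, the 60-second transfer time and the 180-second cap on holding are assumptions for illustration; the article does not describe the actual rule or limits used in the control centres.

```python
def protected_departure(
    scheduled_departure_s: int,
    feeder_eta_s: int,
    transfer_s: int = 60,    # assumed time passengers need to change vehicles
    max_hold_s: int = 180,   # assumed upper limit for delaying the second vehicle
) -> int:
    """Return the departure time (seconds after midnight) for the connecting vehicle.

    The vehicle waits for a late feeder, but only if the required hold stays
    within max_hold_s; otherwise it leaves on schedule and the connection is given up.
    """
    needed = feeder_eta_s + transfer_s
    if needed <= scheduled_departure_s:
        return scheduled_departure_s   # feeder is early enough, no hold needed
    if needed - scheduled_departure_s <= max_hold_s:
        return needed                  # hold until passengers have transferred
    return scheduled_departure_s       # feeder too late, do not hold


print(protected_departure(scheduled_departure_s=1000, feeder_eta_s=900))
print(protected_departure(scheduled_departure_s=1000, feeder_eta_s=1030))
print(protected_departure(scheduled_departure_s=1000, feeder_eta_s=1300))
```

Capping the hold is a design choice made in this sketch: without it, one delayed feeder line could propagate its delay through every connecting line.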

To guarantee the extensibility, ease of access and operation of the hub, it was developed consistently using the open VDV (Verband Deutscher Verkehrsunternehmen) standards, which were also the basis for the development of the European standard SIRI (Service Interface for Real Time Information).

So far, the data hub 3.0 is accessible only to transport operators and companies within the transport sector. In the not too distant future, as open data and open government data initiatives are put into effect, transport operators might be forced to make their real-time data available to the public. Until then, the time should be used to design APIs and open services that meet the demands of passengers and of third-party developers of modern real-time passenger information systems.

Open services would allow transport operators to lower their costs to some extent while still having full control over their data. Third party developers could promote innovations by using open services and combining them with other sources of data. Ultimately, the national economy would profit and passengers would benefit by having improved service and better real-time information.


To meet the future challenges of growing demand and limited resources, transport operators have to further optimise their operational processes and share common systems and operating procedures with other companies. Understanding their own data and integrating data from external sources is vital to staying competitive in an ever more interconnected world. It is not the sheer size of the data, but rather its structure and its origin that determine whether it is useful for accomplishing these tasks. Open services are suited to meeting the demands of third-party developers and will lead to a better user experience while maintaining transport operators’ control over their operational processes and their data.


  1. Communiqué of the city of Bern: Strukturerhebung 2010 Pendlerströme. Statistics services of the city of Bern.
  2. Company reports, BERNMOBIL.
  6. Author’s own calculation: each day, 48 tramways and 140 buses are in operation at BERNMOBIL.
  7. In this context, raw data refers to the real-time data that is exchanged over the VDV interfaces and to the data that is generated by vehicles.


Dominik Grögler studied Experimental Physics at the University of Zurich and wrote his Master’s thesis on the detection of antimatter at the European Organisation for Nuclear Research (CERN) in Geneva. After working for a few years in IT and web design, Dominik wrote his PhD on voltage-gated ion channels at the University of Zurich. Since 2010, he has been working as Head of Engineering at BERNMOBIL, where he is responsible for the development and implementation of real-time passenger information systems.
