(Big) data in public transport

Posted: 9 August 2015 | Dominik Grögler - BERNMOBIL

In the years to come, the demand for mobility will steadily increase. Population growth and easy access to various modes of transport will lead to a highly mobile society. Dominik Grögler, Head of Engineering at BERNMOBIL, explores which data options are available to public transport operators to meet the future challenges of demand.

In 2010, 109,000 people commuted to the city of Bern in Switzerland every day, an increase of 16% since 2000. Over the same period, the number of people who commuted from Bern to other cities increased by 55%, from 15,500 to 24,000¹. This accounts for a daily net growth of Bern’s population of 65%. Consequently, the number of passengers transported in the city of Bern has increased by 20% in the last 10 years, from 84 million to 101 million².
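The figures in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above. The variable names are illustrative, and the reading of the 65% figure as net inflow relative to the resident population is an assumption, since the article does not state the population it refers to.

```python
# Figures quoted in the paragraph above.
inbound_2010 = 109_000       # daily commuters into Bern, 2010
outbound_2000 = 15_500       # daily commuters out of Bern, 2000
outbound_2010 = 24_000       # daily commuters out of Bern, 2010
passengers_before = 84e6     # passengers, start of the 10-year period
passengers_now = 101e6       # passengers, end of the 10-year period


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


net_inflow = inbound_2010 - outbound_2010
outbound_growth = pct_change(outbound_2000, outbound_2010)
passenger_growth = pct_change(passengers_before, passengers_now)

# Assumption: 65% = net inflow / resident population. Solving for the
# population gives the value that the quoted figures imply.
implied_residents = net_inflow / 0.65

print(f"net daily inflow:            {net_inflow:,}")
print(f"outbound commuter growth:    {outbound_growth:.1f}%")
print(f"passenger growth:            {passenger_growth:.1f}%")
print(f"implied resident population: {implied_residents:,.0f}")
```

The net inflow is 85,000 people per day, and the two growth rates round to the 55% and 20% given in the text. Under the stated assumption, the 65% figure corresponds to a resident population of roughly 131,000.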

If this trend continues and the increasing demand is to be satisfied, the service availability and performance of transport operators have to be enhanced. However, these efforts are opposed by the finite amount of energy and of space for roads and tracks, the demands of private motorised transport and non-motorised traffic, the awareness of environmental consequences and limited financial resources.

Hence, transport operators will have to optimise their service by sharing their resources, adjusting their planning processes and making good use of their own and external sources of (real-time) data. Purchasing and operating their own control centres and automatic vehicle location systems (AVL systems) is out of reach for many transport operators. Centralisation would also be an advantageous step towards an integrated system designed to guide and control not only public transport but mobility in its entirety.

As more and more systems are interconnected, it is vital to standardise planning processes and timetable design. Only then can the transport services of bus, tramway and train operators be synchronised, and only then can passengers expect to be guided along their journey and reach their destination with minimal delay and maximum comfort.

In either case, data and its correct generation and interpretation are the cornerstones on which all systems depend to work accurately. This is true for the design of timetables, for AVL systems, for the computation of arrival and departure times and, most importantly, for the exchange of data between different operators. In order to make use of data in a profitable way, to optimise your own processes and to enhance the passenger experience, it is essential to understand your own data.

Definitions of big data

There is no universally accepted definition of big data. Often, a set of data is said to be big if conventional methods of storing and analysing fail, because either computers are too slow or algorithms are not suitable to process large amounts of unstructured data.

For example, 1,000 TB of raw data are produced in a single day at the European Organisation for Nuclear Research (CERN) in Geneva³. Facebook generates 130 TB of log files each day⁴. A single flight of a Boeing 787 Dreamliner accounts for 0.5 TB⁵. All the vehicles of BERNMOBIL – the local transport operator in the city of Bern, Switzerland – produce 20 TB of data – in 100 years⁶!
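To put the BERNMOBIL figure into perspective, the following sketch breaks it down to a daily volume per vehicle, using the fleet size given in reference 6. It assumes decimal terabytes, an average year of 365.25 days and the full fleet in operation every day. None of these assumptions is stated in the article.

```python
# Figures from the text and from reference 6.
TOTAL_BYTES = 20 * 10**12    # 20 TB, assuming decimal terabytes
YEARS = 100
TRAMS = 48
BUSES = 140

DAYS_PER_YEAR = 365.25       # assumption: average year length

vehicles = TRAMS + BUSES
days = YEARS * DAYS_PER_YEAR

bytes_per_day = TOTAL_BYTES / days
bytes_per_vehicle_day = bytes_per_day / vehicles

# Comparison with the CERN figure quoted in the text (1,000 TB per day):
# how long CERN needs to produce BERNMOBIL's 100-year volume.
cern_bytes_per_day = 1_000 * 10**12
days_for_cern_to_match = TOTAL_BYTES / cern_bytes_per_day
minutes_for_cern_to_match = days_for_cern_to_match * 24 * 60

print(f"fleet:       {bytes_per_day / 1e6:.0f} MB per day")
print(f"per vehicle: {bytes_per_vehicle_day / 1e6:.1f} MB per day")
print(f"CERN needs about {minutes_for_cern_to_match:.0f} minutes")
```

Under these assumptions the whole fleet produces roughly 550 MB per day, or about 2.9 MB per vehicle per day, and CERN's quoted daily output would cover BERNMOBIL's century in about half an hour.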

Based on these figures, we are clearly not dealing with big sets of data in public transport. However, advances in handling large amounts of data have promoted new ways of analysing data, which led to an evolution of analytics from descriptive through diagnostic and predictive to prescriptive:

Descriptive: what happened? “Yesterday, connections from vehicles of bus line 1 to bus line 2 were interrupted”
Diagnostic: why did it happen? “Yesterday, because of an accident which caused delays on bus line 1, connections from vehicles of bus line 1 to bus line 2 were interrupted”
Predictive: what will happen? “Tomorrow, due to heavy rainstorms, bus line 1 will face large delays and connections between line 1 and line 2 will be interrupted”
Prescriptive: what should you do? “Tomorrow, instead of using lines 1 and 2, it is recommended to use lines 3 and 4 to reach your destination”
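The first two levels can be illustrated with a minimal sketch that works on recorded connection events between line 1 and line 2. The record layout, the minimum transfer time and the delay threshold are hypothetical and chosen only for illustration; real AVL data will look different. The predictive and prescriptive levels would need a forecasting model and a routing component and are not shown.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Connection:
    feeder_line: int
    connecting_line: int
    feeder_arrival: int        # actual arrival, minutes after midnight
    connecting_departure: int  # actual departure, minutes after midnight
    feeder_delay: int          # minutes behind schedule


MIN_TRANSFER = 2  # assumed minimum transfer time in minutes


def is_broken(c: Connection) -> bool:
    """A connection is broken if passengers had less than MIN_TRANSFER."""
    return c.connecting_departure - c.feeder_arrival < MIN_TRANSFER


def descriptive(conns: List[Connection]) -> List[Connection]:
    """What happened? Return the connections that were broken."""
    return [c for c in conns if is_broken(c)]


def diagnostic(conns: List[Connection], delay_threshold: int = 3) -> Dict[str, int]:
    """Why did it happen? Count broken connections by likely cause."""
    broken = descriptive(conns)
    late = [c for c in broken if c.feeder_delay >= delay_threshold]
    return {"feeder_delay": len(late), "other": len(broken) - len(late)}


yesterday = [
    Connection(1, 2, feeder_arrival=485, connecting_departure=484, feeder_delay=6),
    Connection(1, 2, feeder_arrival=500, connecting_departure=504, feeder_delay=0),
    Connection(1, 2, feeder_arrival=515, connecting_departure=516, feeder_delay=1),
]

print(len(descriptive(yesterday)), "broken connections")
print(diagnostic(yesterday))
```

Both functions only work because every record has the same machine-readable fields, which is the precondition for automation discussed in the following paragraph.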

The degree to which the answers to these questions are generated by machines in an automated process steadily increases from the descriptive to the prescriptive level, while the degree of human intervention decreases. An obvious precondition for automation is that the data is machine-readable. Ideally, the data is also highly structured and the access barrier is low. These characteristics lead to another way of classifying sets of data, which might be more constructive than classification by sheer size: distinguishing whether they are generated in your own company or externally, and whether they are structured or unstructured.

Examining your own AVL data is typically the easiest and most important step, since it is readily available and provides profound insight into some of the core tasks of a control system, e.g. the generation of real-time data for passenger information systems. The next step could be to integrate AVL data from other transport operators into the analysis. Finally, external sources such as weather reports or the positioning data of mobile phone users could be taken into account. This would make it possible not only to make better predictions of future traffic conditions, but also to analyse and optimise timetable design.

Open Data – Open Service

The public transport sector is subsidised to a large extent. Consequently, various private and governmental institutions demand that transport data, including real-time data, be made available to the public. In the context of public transport, examples of sets of data that are referred to as open data are AVL data, GIS data (geographic information system data), references and names of bus, tram and rail stops, and timetables. Open data should be raw⁷ and complete, machine-readable, based on open standards, highly available and free of charge. Unfortunately, many transport operators still hold on to their data out of fear that it could be misinterpreted: a passenger who receives wrong routing information from a mobile application typically blames the transport operator and not the application programmer, even if the cause is a misinterpretation of the raw data by the application. Although there is some truth to this argument, it should not be used as a justification for withholding data, all the more so because big players like Google will get hold of the data anyway.

But there might be an elegant way out of this dilemma: the concept of open services (instead of open data). Application programming interfaces (APIs) for mobile applications or web services, GIS-based routing information, information about disrupted service and about arrivals and departures of vehicles are all examples of open services. Instead of providing raw data, an API can be designed to answer specific questions: When is the next departure from my current location A to destination B? What is the fastest/cheapest/most direct/most comfortable route from A to B? What is the occupancy rate of the selected vehicle?
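As an illustration of the first question, the sketch below shows what the core of such an open service might look like. The `Departure` record, the `next_departure` function and the sample data are hypothetical and are not taken from any existing BERNMOBIL interface. The point is that the service interprets the real-time data itself and returns an answer instead of raw records.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Departure:
    line: str
    origin: str
    destination: str
    scheduled: int  # scheduled departure, minutes after midnight
    delay: int      # current estimated delay in minutes

    @property
    def expected(self) -> int:
        """Expected departure time including the current delay."""
        return self.scheduled + self.delay


def next_departure(departures: List[Departure], origin: str,
                   destination: str, now: int) -> Optional[Departure]:
    """Return the earliest expected departure from origin to destination
    at or after `now`, or None if there is no such departure."""
    candidates = [
        d for d in departures
        if d.origin == origin
        and d.destination == destination
        and d.expected >= now
    ]
    return min(candidates, key=lambda d: d.expected, default=None)


departures = [
    Departure("1", "A", "B", scheduled=480, delay=0),
    Departure("1", "A", "B", scheduled=490, delay=5),
    Departure("3", "A", "B", scheduled=492, delay=0),
    Departure("2", "A", "C", scheduled=481, delay=0),
]

answer = next_departure(departures, "A", "B", now=483)
print(answer.line, answer.expected)
```

In this example the timetable alone would suggest line 1 at minute 490, but with the delay taken into account the service recommends line 3. A third-party application that read the raw schedule without the delay would give the wrong answer, which is exactly the misinterpretation risk described above.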

Transport operators could gain a lot if they concentrated on their core business: designing timetables and operating and controlling traffic. Only if the source data (e.g. GPS coordinates of stops, distances and travel times between stops, stop times) is of high quality can the potential of an AVL system be fully exploited. Under these preconditions, the generation of high-quality sets of data (e.g. estimates of arrival and departure times) can be regarded as a by-product of a modern AVL system. This data can be used by third-party developers to do what they do best: designing, building and operating mobile applications, websites or other internet-based information services.
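The dependence of the estimates on the source data can be seen in a deliberately simplified sketch. It derives arrival estimates for downstream stops from the planned travel and stop times only. The function name, the data layout and the sample values are assumptions for illustration; a real AVL system uses considerably more input.

```python
from typing import Dict, List, Tuple

# One leg: (name of next stop, travel time to it, stop time there), in seconds.
Leg = Tuple[str, int, int]


def estimate_arrivals(departure_time: int, legs: List[Leg]) -> Dict[str, int]:
    """Estimate the arrival time at each downstream stop.

    departure_time is the actual departure from the current stop in
    seconds; the planned travel and stop times are added leg by leg.
    """
    estimates: Dict[str, int] = {}
    t = departure_time
    for stop, travel, dwell in legs:
        t += travel
        estimates[stop] = t  # arrival at this stop
        t += dwell           # the vehicle leaves after the stop time
    return estimates


legs = [("S2", 90, 20), ("S3", 120, 30), ("S4", 60, 0)]
print(estimate_arrivals(0, legs))
```

Because the times are accumulated along the route, an error in a single travel time shifts the estimate for every stop that follows, which is why the quality of the source data matters so much.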

BERNMOBIL’s contribution

BERNMOBIL has put a lot of effort into interconnecting various transport operators in Switzerland. The data hub 3.0 connects the federal railway system of Switzerland with nine local transport operators and two other data hubs, one of which is based in Germany. Six other transport operators will join the network in the near future. This makes it possible to provide passengers with real-time information regardless of where they are along their journey and which operator they are travelling with at a particular moment.

Furthermore, control centres are able to ensure that connections from one vehicle to another take place even if a feeder line is behind schedule by automatically instructing the second vehicle to wait for the arrival of the first one.
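The decision behind such an instruction can be sketched as follows. The function and its parameters are hypothetical. In particular, the maximum holding time is an assumption added here to keep the connecting vehicle from waiting indefinitely; the article does not describe the rules that the control centres actually apply.

```python
from typing import Optional


def hold_minutes(feeder_eta: int, connecting_departure: int,
                 transfer_time: int, max_hold: int) -> Optional[int]:
    """Decide how long the connecting vehicle should wait.

    All times are in minutes after midnight. Returns the number of
    minutes to hold the connecting vehicle (0 if no hold is needed),
    or None if keeping the connection would exceed max_hold.
    """
    required_departure = feeder_eta + transfer_time
    hold = max(0, required_departure - connecting_departure)
    return hold if hold <= max_hold else None


# Feeder on time: passengers can transfer without any hold.
print(hold_minutes(feeder_eta=600, connecting_departure=603,
                   transfer_time=2, max_hold=3))
# Feeder slightly late: the connecting vehicle waits.
print(hold_minutes(feeder_eta=604, connecting_departure=603,
                   transfer_time=2, max_hold=3))
# Feeder very late: the connection is given up.
print(hold_minutes(feeder_eta=610, connecting_departure=603,
                   transfer_time=2, max_hold=3))
```

The sketch depends on a reliable arrival estimate for the feeder vehicle, which is the kind of real-time data exchanged between operators over the data hub.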

To guarantee the extensibility, ease of access and operation of the hub, it was developed consistently using the open VDV (Verband Deutscher Verkehrsunternehmen) standards, which were also the basis for the development of the European standard SIRI (Service Interface for Real Time Information).

So far, the data hub 3.0 is accessible only to transport operators and companies within the transport sector. In the not too distant future, as open data and open government data initiatives are put into effect, transport operators might be forced to make their real-time data available to the public. Until then, the time should be used to design APIs and open services that meet the demands of passengers and of third-party developers of modern real-time passenger information systems.

Open services would allow transport operators to lower their costs to some extent while still having full control over their data. Third-party developers could promote innovations by using open services and combining them with other sources of data. Ultimately, the national economy would profit and passengers would benefit from improved service and better real-time information.

Conclusion

To meet the future challenges of growing demand and limited resources, transport operators have to further optimise their operational processes and share common systems and operating procedures with other companies. Understanding their own data and integrating data from external sources is vital to staying competitive in an ever more interconnected world. It is not the sheer size of data, but rather its structure and its origin that determine whether it is useful for accomplishing these tasks. Open services are suited to meeting the demands of third-party developers and will lead to a better user experience while maintaining the transport operators’ control over their operational processes and their data.

References

1. Communiqué of the city of Bern: Strukturerhebung 2010 Pendlerströme. Statistics services of the city of Bern, http://www.bern.ch/mediencenter/aktuell_ptk_sta/wachsende-pendlerstroeme/downloads/strukturerhebung-2010-pendlerstrome.pdf/download
2. Company reports BERNMOBIL, http://www.bernmobil.ch/Seiten/Unternehmen/Geschaeftsberichte/?oid=1375&lang=de
3. http://home.web.cern.ch/about/computing
4. https://www.facebook.com/notes/facebook-engineering/scaling-facebook-to-500-million-users-and-beyond/409881258919
5. http://www.computerworlduk.com/news/infrastructure/3433595/boeing-787s-to-create-half-a-terabyte-of-data-per-flight-says-virgin-atlantic/
6. Author’s own calculation: each day 48 tramways and 140 buses are in operation at BERNMOBIL.
7. In this context raw data refers to the real-time data that is exchanged over the VDV interfaces and to the data that is generated by vehicles.

Biography

Dominik Grögler studied Experimental Physics at the University of Zurich and wrote his Master’s thesis on the detection of antimatter at the European Organisation for Nuclear Research (CERN) in Geneva. After working for a few years in IT and web design, Dominik wrote his PhD on voltage-gated ion channels at the University of Zurich. Since 2010, he has been working as Head of Engineering at BERNMOBIL, where he is responsible for the development and implementation of real-time passenger information systems.
