Taxi Data Analysis

DATASET


The data is collected from the trajectories of all 442 taxis running in the city of Porto, Portugal. These taxis operate through a taxi dispatch central, using mobile data terminals installed in the vehicles. Each ride is categorized into one of three types: A) taxi central based, B) stand-based, or C) non-taxi central based. For the first type, an anonymized id is provided when such information is available from the telephone call. The last two types refer to services that were demanded directly from the taxi drivers, either B) at a taxi stand or C) on a random street.

Each data sample corresponds to one completed trip. It contains a total of 9 features, described as follows:

1. TRIP_ID: (String) contains a unique identifier for each trip.
2. CALL_TYPE: (char) identifies the way used to demand this service.
3. ORIGIN_CALL: (integer) contains a unique identifier for each phone number which was used to demand at least one service.
4. ORIGIN_STAND: (integer) contains a unique identifier for the taxi stand.
5. TAXI_ID: (integer) contains a unique identifier for the taxi driver that performed each trip.
6. TIMESTAMP: (integer) Unix timestamp in seconds.
7. DAYTYPE: (char) identifies the day type of the trip's start.
8. MISSING_DATA: (Boolean) is FALSE when the GPS data stream is complete and TRUE whenever one or more locations are missing.
9. POLYLINE: (String) contains a list of GPS coordinates mapped as a string. The beginning and the end of the string are identified with brackets.
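
Because TIMESTAMP and POLYLINE are stored as a raw integer and a string, they usually need converting before analysis. The following is a minimal sketch in Python, not the method used by this project. It assumes that POLYLINE is a JSON-style list of coordinate pairs, that MISSING_DATA is serialized as the text "True" or "False", and that timestamps should be read as UTC. The sample record and the helper name parse_trip are hypothetical and exist only for illustration.

```python
import json
from datetime import datetime, timezone

# Hypothetical sample record; the values are illustrative, not taken from the dataset.
sample = {
    "TRIP_ID": "T1",
    "CALL_TYPE": "B",
    "TIMESTAMP": 1372636858,
    "MISSING_DATA": "False",
    "POLYLINE": "[[-8.618643,41.141412],[-8.618499,41.141376],[-8.620326,41.14251]]",
}


def parse_trip(record):
    """Convert the raw fields of one trip into Python objects."""
    # TIMESTAMP is a Unix timestamp in seconds; it is read as UTC here (assumption).
    start = datetime.fromtimestamp(int(record["TIMESTAMP"]), tz=timezone.utc)
    # Assumption: POLYLINE is a JSON-style list of coordinate pairs wrapped in brackets.
    points = json.loads(record["POLYLINE"])
    # Assumption: the Boolean is stored as text, so compare case-insensitively.
    missing = str(record["MISSING_DATA"]).strip().lower() == "true"
    return {
        "trip_id": record["TRIP_ID"],
        "start": start,
        "points": points,
        "missing": missing,
    }


trip = parse_trip(sample)
print(trip["start"].isoformat(), len(trip["points"]), trip["missing"])
```

Parsing each row once into typed values keeps later steps, such as filtering out trips with missing GPS data or counting the points in a trajectory, free of repeated string handling.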