IDENTIFICATION OF MULTIFUNCTIONAL URBAN ACTIVITY CENTERS IN TOKYO

Identification of urban activity centers is among the most important components of urban structure studies: it is necessary for sound planning, the regulation of traffic flows and other practical measures. The purpose of this paper is to design a comprehensive method for identifying urban activity centers based on different but universal data types. In this study, we used social media data (Twitter), which are regularly updated and do not depend on artificial borders such as the administrative boundaries of the city, together with a database of points of interest, which we considered a 'hard' representation of multifunctional urban activities. A large number of geotagged tweets was processed by means of statistical modelling (spatial autoregression) and combined with an analysis of the distribution of points of interest. This made it possible to identify the local centers of urban activity within the 23 special wards of Tokyo more objectively and precisely than with social media data alone. The delimited centers were then classified in order to define and describe their main functional and spatial characteristics. As a result of the study, railway transport was identified as the main factor attracting urban activity; the modern urban structure of Tokyo was identified and mapped; a new comprehensive method for identifying urban activity centers was developed; and five classes of urban activity centers were defined and described.
Acknowledgments: [...] Researcher, Lomonosov MSU) for advice on the spatial modelling; Prof. Toshio Omata (Former Professor, Toyo University) for assistance during fieldwork; Elena Pozdorovkina (Former UNFPA Senior Advisor) for help with editing; and Ruslan Dokhov (Habidatum, RxD Lead) for inspiring ideas at the initial stages of the research process.
Conflict of interests: The authors reported no potential conflict of interest.


INTRODUCTION
Identification of urban structure is one of the most important tasks of Urban Geography. One of its key components is the identification of urban centers, or urban activity centers (UACs), which is necessary for urban planning, the regulation of traffic flows and other practical matters. Most studies aimed at identifying urban structure analyze census data, economic statistics or employment patterns. Our work develops a methodology based on social media data and on the location and characteristics of points of interest (POIs). In this research we use the term 'urban activity centers' to refer to areas of the city where urban activity, and the points that generate it, are more concentrated than in their surroundings.
This study focuses on Tokyo, one of the largest cities in the world by the population of its metropolitan area (Demographia 2017). Having developed over several centuries, Tokyo has formed a complex sociospatial structure: it is a polycentric urban system consisting of several urban cores interconnected by energy, human, traffic and information flows.
The main objective of this work is to identify and analyze the key elements of this structure: urban activity centers. This paper summarizes the stages of the original research in the following sequence: development of a comprehensive methodology for identifying urban activity centers; application of the methodology to Tokyo; and classification of the identified centers, with an analysis of their qualitative and quantitative characteristics and their role in urban life and urban structure.

Models and theories for identification of urban activity centers
The main aim of this study is to identify the urban structure of Tokyo by delimiting the urban activity centers within the city boundaries. First, we need to define what urban activity centers are and what criteria should be used to identify them. In the second half of the 20th century, these questions were on the agenda of researchers and in the relevant publications. In general, these authors interpreted urban activity centers as areas or neighborhoods characterized by an outstanding concentration of employment (e.g. number of workplaces). This perception of centers was introduced by M. Fujita and H. Ogawa in the early 1980s (Fujita, Ogawa 1982), but it has changed substantially with the development of the concept of the post-industrial city. Scholars following these trends, such as B. De Goei (De Goei et al. 2010), M. Batty (Batty 2013), S.B. Pomorov and R.S. Zhukovsky (Pomorov, Zhukovsky 2015), argue that the true cores of cities are centers of voluntary visits. In such places, facilities related to commerce, culture and recreation act as attraction points. They generate permanent, stable human flows that are even larger than those of employment centers (Kotov et al. 2016).
Selecting criteria or indicators for UAC identification is certainly an important step in determining the spatial structure of a city, but it is not the only one. If the cores of urban activity are identified from the raw distribution of any selected indicator, then in cities with a pronounced center the peripheral subcenters will appear insignificant in the urban landscape. For this reason, we adopted a conceptually different approach that takes into account not only the size of UACs but also how strongly they stand out within their local urban subsystem.
One of the most popular and sophisticated methods based on this idea in Russian Urban Studies is the «nonuniformly-zoned model of the city» (NZM) developed by A. A. Vysokovsky (Vysokovsky 2005). It makes it possible to identify commercial activity centers and their areas of influence at different hierarchical levels. Instead of the raw distribution of point phenomena, the method analyzes areal units: cells of a regular or expert-defined grid. The first step is to determine the general distribution of «centrality» (the overall trend) in urban space; the second is to identify the areas with the largest positive differences between the real concentration (or value) of the chosen indicator and the overall trend. Territorial units with abnormally high values have the greatest potential for attraction and can be called local activity centers. This computational algorithm makes it possible both to calculate the absolute values of the chosen indicator by territorial unit and to take into account the local importance of the sites and zones that generate commercial activity. The approach thus reveals the influence of areal zones at both the local and the citywide level.
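The two-step logic of the NZM (estimate an overall trend of centrality, then keep the cells with the largest positive deviations from it) can be illustrated with a minimal Python sketch. It assumes the indicator is stored on a regular square grid as a NumPy array and approximates the overall trend with a simple moving-average window; the window size, the 5% cut-off and the function names are hypothetical choices for illustration, not Vysokovsky's actual parameters.

```python
import numpy as np


def moving_average_trend(grid: np.ndarray, window: int = 5) -> np.ndarray:
    """Overall trend: mean of a square window centred on each cell.

    Near the grid edge only the part of the window inside the grid is averaged.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    rows, cols = grid.shape
    trend = np.empty(grid.shape, dtype=float)
    for i in range(rows):
        for j in range(cols):
            block = grid[max(0, i - half):i + half + 1, max(0, j - half):j + half + 1]
            trend[i, j] = block.mean()
    return trend


def local_centers(grid: np.ndarray, window: int = 5, top_share: float = 0.05) -> np.ndarray:
    """Mask of cells whose positive deviation from the trend is in the top `top_share`."""
    deviation = grid - moving_average_trend(grid, window)
    positive = deviation[deviation > 0]
    if positive.size == 0:
        return np.zeros(grid.shape, dtype=bool)
    threshold = np.quantile(positive, 1 - top_share)  # a positive value
    return deviation >= threshold


# Synthetic 21 x 21 grid: a broad central peak plus one small peripheral subcenter.
rows, cols = np.mgrid[0:21, 0:21]
grid = 100 * np.exp(-((rows - 10) ** 2 + (cols - 10) ** 2) / 40.0)
grid[3, 17] += 15.0
mask = local_centers(grid)
print(np.argwhere(mask))  # includes the cell (3, 17)
```

Because the window mean absorbs the broad central peak, the small peripheral peak stands out in the deviations even though its raw value is far below the central maximum.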
The methods proposed by McMillen and McDonald also treat cities as polycentric, but they understand centers differently. First, they clearly distinguish the main center, i.e. the overall core of the urban area (usually the central business district), from the subcenters located around it, and treat both as the main components of the urban socio-spatial structure. Subcenters are parts of the territory with a relatively higher density of urban activity in any form (McMillen 2004), and in this sense the spatial distribution of urban activity centers is analogous to what Vysokovsky described.
In his works, McMillen focused on how existing employment data should be processed. He developed a sophisticated method that significantly improves the objectivity of urban core identification based on census data or economic statistics. He made notable progress in this area by applying non-parametric methods, such as geographically weighted regression, and semi-parametric employment density functions to identify subcenters as areas whose activity density is considerably higher than expected given their distance from the central business district. Logistic regression is not applicable to this task due to the asymmetric distribution of subcenters (McMillen 2001). The disadvantage of this method is that the central business district, or the central point of the city, must be defined at the initial stage. The crucial assumption is that the closer a given area is to this central point, the more concentrated its human activity. The method was also widely used in other works by McMillen, including his joint work with McDonald (McMillen, McDonald 1997), as well as by other researchers, for example, F. Riguelle (2007).
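A simplified Python sketch of the subcenter logic described above: fit a density function of distance from a predefined CBD and flag cells whose density is far above the prediction. For brevity it swaps McMillen's non-parametric estimator for the classic parametric negative-exponential function ln D = a + b·distance, fitted by ordinary least squares; the z-score cut-off of 1.96, the synthetic grid and all names are illustrative assumptions.

```python
import numpy as np


def density_gradient_subcenters(x, y, density, cbd, z_crit=1.96):
    """Flag cells whose density is well above a negative-exponential trend.

    Fits ln(D) = a + b * distance_to_cbd by ordinary least squares and flags
    cells whose standardized residual exceeds z_crit. Density must be > 0.
    """
    x, y, density = (np.asarray(v, dtype=float) for v in (x, y, density))
    dist = np.hypot(x - cbd[0], y - cbd[1])
    log_d = np.log(density)
    design = np.column_stack([np.ones_like(dist), dist])
    coef, *_ = np.linalg.lstsq(design, log_d, rcond=None)
    resid = log_d - design @ coef
    z = resid / resid.std(ddof=2)  # two estimated parameters
    return coef, z > z_crit


# Synthetic 15 x 15 grid around a CBD at (7, 7) with one subcenter at x=12, y=2.
gx, gy = np.meshgrid(np.arange(15), np.arange(15))
gx, gy = gx.ravel(), gy.ravel()
dens = 500 * np.exp(-0.3 * np.hypot(gx - 7, gy - 7))
target = (gx == 12) & (gy == 2)
dens[target] *= 8
coef, flagged = density_gradient_subcenters(gx, gy, dens, cbd=(7, 7))
```

The sketch makes the method's key requirement visible: the CBD location has to be supplied before anything else can be computed.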
Our definition of urban activity centers and the proposed methodology for their identification are based on the above-mentioned studies of Vysokovsky and McMillen. We rely on Vysokovsky's concept of the overall (or global) trend of centrality. Since our objective is to identify not only large concentrations of activity in the city center but also UACs that are significant at the neighborhood or ward level, we have to detect this overall trend in the urban system and eliminate it in order to reveal local, smaller-scale patterns of activity distribution. While Vysokovsky and his followers used the moving average method for this purpose, which is not very effective, McMillen initiated the application of more sophisticated non-parametric methods. Inspired by McMillen's findings, we employed spatial autoregression to eliminate the overall centrality trend, as described in the section 'Identification of urban activity centers'.

Application of social media data for identifying urban activity centers
The increased availability of the Internet and, particularly, the growing use of social networks provide new opportunities for studying the spatial organization of people and the interaction between people and space. Social media data with spatial references give access to huge amounts of information describing people's spatial behavior, their mobility and their attachment to certain places. Previously, social media data were actively used mainly in political science research and marketing; urban planners and researchers of urban space became interested in them only recently. Many scholars (Campagna 2014; Campagna et al. 2015; Evans-Cowley, Griffin 2011) have emphasized the value of these data for quick response and better awareness.
The value of social media data for studies of spatial patterns lies in geotagging, i.e. the possibility of linking posts, photographs and web pages to geographic coordinates. Researchers thus have an opportunity to obtain temporally and spatially referenced information on demographic, thematic, behavioral and contextual features. The main methodological problem in using social media data is generalizing this heterogeneous information. Methods for analyzing geotagged social media datasets are developing at an impressively rapid pace. For instance, Ciuccarelli, Lupi and Simeone (2014) proposed the design of a social media data processing tool that would be user-friendly for urban planners without a technical background. Other researchers (Bingham-Hall and Tidey 2016) focused on using visualized social media data to enhance the decision-making process by providing insights into local issues.
The use of social media data has become particularly important in the context of participatory planning, which implies the participation of the entire community in urban management and planning. Proponents of this approach argue that planning measures based only on the opinion of experts or the local administration are unacceptable, whereas a joint decision of the community, experts and administration is more appropriate (Healey 1999). Such an approach not only takes into account the interests of the population but also makes it possible to examine how the population interacts with urban space. This is the main point of intersection between our research and participatory planning, since this study aims to identify informal UACs rather than centers designated by the administration. In our opinion, informal centers play the greatest role in the life of the city's population. They often have no names and are unknown to residents of other parts of the city while being very important to the local community, and social media data can reveal them better than almost any other source of information. As some researchers have already shown, social media activity is more appropriate for detecting urban structure than other sources of big spatial data because it usually takes place when users encounter something new or stay in a certain location for a long time (Kaplan, Haenlein 2010). Importantly, this type of data makes it possible to identify sites with heterogeneous activity (Frias-Martinez et al. 2012), which is a key goal for us in defining UACs. The use of social media data as one of the indicators of urban activity is also justified by the fact that the population of Japan actively uses mobile communications and social media and tends to report almost every purchase it makes.

Social media data
In this study we use geotagged data from the microblogging service Twitter. It is the most popular social media platform in Japan, with approximately 45 million users, or about 35% of the country's population (TechCrunch 2018). We compiled a dataset of approximately 1 million tweets posted in 2017 within the 23 special wards of Tokyo, recording the geographic location, date and time of each message. This dataset makes it possible to perform a spatio-temporal analysis of human activity in Tokyo; however, geotagged tweets, and social media data in general, have several limitations that must be taken into account when analyzing and interpreting the results. Moreover, since this data type is relatively new, its theoretical background is not extensive and some drawbacks may still be undiscovered. Steiger et al. (2015) outlined several limitations of social media data. The first is sampling bias: not everyone uses social media, and Twitter in particular. This bias leads to the underrepresentation of some social groups in the data and the overrepresentation of others (Heckman 1979), which is undoubtedly relevant for our case. Secondly, the coordinates of tweets may be inaccurate because of atmospheric effects, mobile device characteristics and the influence of the surrounding environment. In addition, users may choose either to attach a specific location with precise coordinates or to keep it more general, such as a city or even a country. To avoid the imprecision of such generalized locations, we used only tweets with precise coordinates. Thirdly, the text of tweets may not reflect currently happening activities at all; for instance, tweets may refer to future or past activities. This can significantly affect the interpretation and functional analysis of the data, but it does not affect the findings of this study, which uses only two attributes of the tweets: time and geolocation.
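The filtering rule described above (keep only tweets with precise coordinates, and keep only their time and location) could look like the Python sketch below. It assumes tweets were stored one JSON object per line in the Twitter API v1.1 format, where `coordinates` holds a GeoJSON point (longitude first) only when an exact location was attached, while tweets tagged with just a `place` have `coordinates` set to null. The paper does not state how its data were collected, so this format, the function name and the sample records are assumptions.

```python
import json
from datetime import datetime


def precise_points(lines):
    """Yield (timestamp, lon, lat) for tweets that carry an exact GeoJSON point.

    Assumes Twitter API v1.1 JSON, one tweet per line. Tweets tagged only with a
    `place` (a city- or country-level area) have `coordinates` = null and are skipped.
    """
    for line in lines:
        tweet = json.loads(line)
        point = tweet.get("coordinates")
        if not point or point.get("type") != "Point":
            continue
        lon, lat = point["coordinates"]  # GeoJSON order: longitude, then latitude
        ts = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
        yield ts, lon, lat


# Hypothetical sample records: one with an exact point, one with only a place.
sample = [
    '{"created_at": "Mon Jan 02 09:15:00 +0000 2017",'
    ' "coordinates": {"type": "Point", "coordinates": [139.7005, 35.6905]}}',
    '{"created_at": "Mon Jan 02 09:20:00 +0000 2017",'
    ' "coordinates": null, "place": {"full_name": "Tokyo"}}',
]
rows = list(precise_points(sample))  # only the first tweet is kept
```

Discarding the tweet text at this stage mirrors the point above: the analysis needs only time and location, so errors in what tweets describe do not propagate.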

Points of interest
Social media data are characterized by spatial instability and sensitivity to spontaneous events, which may cause inaccuracies in identifying spatial structure. To overcome this problem, we used the OpenStreetMap (2017) database of POIs (points of interest) as an additional data source. POIs are places where most social interactions occur and where people spend most of their free time. In this way, we consider the phenomenon that R. Oldenburg called «third places» (Ahas et al. 2009; Cai et al. 2016; Schneider et al. 2013): places distinct from home and work. The activities there represent a crucial part of social interactions and are certainly important for understanding the urban environment (Rosenbaum 2006).
It should be noted that the POI database covers a wider range of facilities than third places alone. It also includes some 'second places', such as schools, universities, important administrative buildings, post offices, fire departments and medical institutions. There is obviously urban activity at these sites as well, which is why they should not be excluded from the calculations. Although many modern scholars focus on 'third places' as generators of creative industries, grassroots democracy and special social interactions, we take into account all possible points of activity to ensure the completeness of the study, better qualitative coverage and the delimitation of multifunctional centers. Thus, the data for this study cover both the 'hard' and the 'soft', the permanent and the temporary components of the urban environment. Combined and appropriately processed, they should yield more complete and objective results, with a much higher level of certainty than either source alone.

Identification of urban activity centers
We used two data types in sequence to identify urban activity centers: an array of geotagged tweets and a database of POIs. The literature review above confirms that social media data are applicable to the analysis of urban activity; however, this data type is undeniably subject to temporal instability and sensitivity to occasional events. To address this issue, we introduced a second data source, points of interest, into the identification process. While geotagged tweets can be used to detect areas where people concentrate during a certain period of time, POIs represent a 'hard' component of the city: commercial and public facilities that attract people. Thus, the integrated use of social media data and POIs makes it possible to take into account both the 'soft' and the 'hard' spheres of the urban environment.
The process of identifying UACs included four steps, starting with the calculation of tweet density in the 23 special wards of Tokyo. First, we had to choose the observation units. Administrative boundaries do not necessarily limit human activity: an activity center may occupy only part of one district or spread across neighboring municipalities. Therefore, new observation units were needed to analyze the spatial distribution of tweets. A new grid of observation units was also essential to avoid the modifiable areal unit problem, which occurs when human activities are evaluated within administrative boundaries (Openshaw, Taylor 1981; Openshaw 1984). One widely used way to create new observation units is to build a regular grid of polygons, usually squares or regular hexagons. Hexagons are preferable for spatial analysis because a hexagonal grid minimizes edge effects, fits a curved surface better, and all neighbors of a hexagon are equivalent (each shares an edge with it and lies at the same distance from its center), which is very important in this particular case (Birch 2007). Taking these benefits into account, we divided a territory slightly larger than the 23 special wards of Tokyo into 2,328 hexagons with a radius of about 400 m, which is close to the average size of zip code areas.
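A minimal Python sketch of the hexagonal binning and density step, assuming tweet coordinates have already been projected to a metric coordinate system and reading the 400 m 'radius' as the circumradius (center-to-vertex distance) of pointy-top hexagons; both are assumptions, since the paper does not specify them. Points are assigned to hexagons with standard axial/cube-coordinate rounding, and density is the count divided by the hexagon area, 3√3/2·r² (about 0.42 km² for r = 400 m). Function names are hypothetical.

```python
import math
from collections import Counter


def hex_of(x, y, size):
    """Axial (q, r) index of the pointy-top hexagon containing the point (x, y).

    `size` is the circumradius (center-to-vertex distance) in the same units as x and y;
    hexagon (0, 0) is centered on the origin.
    """
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    # Round in cube coordinates (cx + cy + cz = 0), then repair the component
    # with the largest rounding error so the constraint still holds.
    cx, cz = q, r
    cy = -cx - cz
    rx, ry, rz = round(cx), round(cy), round(cz)
    dx, dy, dz = abs(rx - cx), abs(ry - cy), abs(rz - cz)
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy <= dz:
        rz = -rx - ry
    # otherwise cy had the largest error and (rx, rz) are already consistent
    return rx, rz


def hex_area_km2(size_m):
    """Area of a regular hexagon with circumradius size_m metres, in km²."""
    return 3 * math.sqrt(3) / 2 * size_m ** 2 / 1e6


def density_per_hex(points, size_m=400.0):
    """Tweets per km² for every hexagon that contains at least one point."""
    counts = Counter(hex_of(x, y, size_m) for x, y in points)
    area = hex_area_km2(size_m)
    return {cell: n / area for cell, n in counts.items()}


# Hypothetical projected coordinates in metres: two points near the origin, one 1 km east.
d = density_per_hex([(0.0, 0.0), (10.0, 10.0), (1000.0, 0.0)])
```

Here the first two points fall into the central hexagon (0, 0) and the third into its eastern neighbor (1, 0); cells with no tweets are simply absent from the result and would be treated as zero density.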
After choosing the territorial observation units, we calculated the density of tweets per cell of the regular hexagonal grid (Fig. 1). The map shows that the overall trend of centrality (higher density in central cells and lower density in peripheral ones) makes it difficult to identify concentrations of tweets that are significant at a local level (local positive extrema), whether they are situated in the central or the peripheral part of the city. This is exactly the problem McMillen and Vysokovsky addressed through their methodologies and spatial modelling. Therefore, the goal of the following stages was to remove the overall trend of centrality from the density map and to detect local maxima of tweet concentration that are not related to the distance from the city center.
One statistical model that can achieve these goals is spatial autoregression (SAR). This technique predicts the distribution of a given indicator taking into account spatial autocorrelation (the dependence of a value on the values in neighboring territorial units) and other independent variables. Before fitting the SAR model, we calculated the global Moran's I to check whether tweet density in the study area is in fact spatially autocorrelated. In the form originally proposed by Moran (1950), the index is
$$I = \frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij} z_i z_j}{\sum_{i=1}^{n} z_i^2}$$
where $n$ is the number of territorial cells, $w_{ij}$ are the elements of the spatial weights matrix $W$, which indicate whether cells $i$ and $j$ are neighbors (1 if yes, 0 if no), and $z_i$ is the difference between the value of the indicator in cell $i$ and its mean value (likewise $z_j$ for cell $j$). Which cells should be considered neighbors is a matter of debate; here, applying the concept of geographical neighborhood, we consider all cells sharing at least one common point to be neighbors. The resulting Moran's I slightly exceeded 0.35, with a very low p-value (2.2e-16), indicating statistically significant positive spatial autocorrelation of tweet density.
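The global Moran's I, I = n / (Σᵢ Σⱼ wᵢⱼ) · (Σᵢ Σⱼ wᵢⱼ zᵢ zⱼ) / (Σᵢ zᵢ²), takes only a few lines of NumPy. The sketch below assumes a dense n × n binary contiguity matrix (workable for a few thousand cells) and adds a permutation-based pseudo p-value; the paper does not say which significance test was used, so the permutation test is an alternative, not a reproduction. Function names are hypothetical.

```python
import numpy as np


def morans_i(values, weights):
    """Global Moran's I: (n / sum(w)) * (z' W z) / (z' z), with z = deviations from the mean."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    w = np.asarray(weights, dtype=float)
    return (z.size / w.sum()) * (z @ w @ z) / (z @ z)


def permutation_test(values, weights, n_perm=999, seed=0):
    """Observed I and a one-sided pseudo p-value from random relabelling of the cells."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    observed = morans_i(values, weights)
    exceed = sum(morans_i(rng.permutation(values), weights) >= observed for _ in range(n_perm))
    return observed, (exceed + 1) / (n_perm + 1)


# Six cells in a row; neighbors share an edge (binary weights, 1 = neighbor).
w = np.zeros((6, 6))
for k in range(5):
    w[k, k + 1] = w[k + 1, k] = 1.0
clustered = morans_i([1, 1, 1, 5, 5, 5], w)    # similar values side by side: positive I
alternating = morans_i([1, 5, 1, 5, 1, 5], w)  # neighbors always differ: negative I
observed, p = permutation_test([1, 1, 1, 5, 5, 5], w)
```

In this toy chain the clustered arrangement gives I = 0.6 and the alternating one gives I = -1, which shows the sign convention: positive values mean similar values cluster, as found for tweet density.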
The next step was to calculate the modelled number of tweets for each hexagon based on the global centrality trend and the values in the neighboring cells. The general form of the spatial autoregressive model is
$$Y = \rho W Y + X\beta + \mu$$
where $Y$ is the vector of the dependent variable, $X$ is the matrix of independent variables, $W$ is the spatial weights matrix, $\beta$ is the vector of regression coefficients, $\rho$ is a scalar autoregressive parameter and $\mu$ is a vector of regression disturbances (Kelejian, Prucha 1998). Here $\rho$ shows the extent of spatial autocorrelation, $Y$ is the observed concentration of tweets and $X$ is the chosen explanatory variable, the distance from the geographical center of the city; the modelled concentration is the part of $Y$ explained by $\rho W Y + X\beta$. Figure 2 shows the modelled values of tweet concentration, i.e. the values mathematically explained by the values of neighboring cells and the distance from the city center. We included the distance from the city center in this model so that the effect of centrality could be eliminated at the following stage.
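The SAR step can be sketched in Python as a maximum-likelihood fit of the spatial lag model Y = ρWY + Xβ + μ, profiling the likelihood over a grid of ρ values. Several assumptions are involved: W is row-standardized (so admissible ρ lies between -1 and 1), X holds an intercept and the distance to the city center, and estimation is by maximum likelihood rather than the generalized spatial two-stage least squares procedure of Kelejian and Prucha (1998); the paper does not state which estimator or software was used. The synthetic square lattice stands in for the hexagonal grid. The fitted values ρWY + Xβ play the role of the 'modelled' tweet density, and the residuals Y - (ρWY + Xβ) are the differences examined in the next step.

```python
import numpy as np


def row_standardize(w):
    """Divide each row of a weights matrix by its sum (rows of isolated cells stay zero)."""
    w = np.asarray(w, dtype=float)
    sums = w.sum(axis=1, keepdims=True)
    sums[sums == 0] = 1.0
    return w / sums


def fit_sar_lag(y, X, W, rhos=None):
    """Maximum-likelihood fit of Y = rho*W*Y + X*beta + mu, profiling rho over a grid.

    X must include a column of ones for the intercept. Returns
    (rho, beta, fitted, residuals) with fitted = rho*W*Y + X*beta.
    """
    if rhos is None:
        rhos = np.linspace(-0.95, 0.95, 191)
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n = y.size
    identity = np.eye(n)
    Wy = W @ y
    best = None
    for rho in rhos:
        sign, logdet = np.linalg.slogdet(identity - rho * W)
        if sign <= 0:
            continue  # outside the admissible range of rho
        y_star = y - rho * Wy
        beta, *_ = np.linalg.lstsq(X, y_star, rcond=None)
        e = y_star - X @ beta
        # Concentrated log-likelihood (constants dropped): ln|I - rho W| - n/2 ln(sigma^2).
        loglik = logdet - n / 2 * np.log(e @ e / n)
        if best is None or loglik > best[0]:
            best = (loglik, rho, beta)
    _, rho, beta = best
    fitted = rho * Wy + X @ beta
    return rho, beta, fitted, y - fitted


# Synthetic 20 x 20 lattice (rook contiguity), true rho = 0.5, density falling with distance.
side = 20
n = side * side
W = np.zeros((n, n))
for i in range(side):
    for j in range(side):
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            a, b = i + di, j + dj
            if 0 <= a < side and 0 <= b < side:
                W[i * side + j, a * side + b] = 1.0
W = row_standardize(W)
rows, cols = np.divmod(np.arange(n), side)
dist = np.hypot(rows - 9.5, cols - 9.5)
X = np.column_stack([np.ones(n), dist])
rng = np.random.default_rng(0)
y = np.linalg.solve(np.eye(n) - 0.5 * W, X @ np.array([10.0, -2.0]) + rng.normal(size=n))
rho, beta, fitted, resid = fit_sar_lag(y, X, W)
```

A grid search over ρ keeps the sketch short and transparent; a production analysis would more likely use a dedicated spatial econometrics package and sparse weights.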
The third step was to calculate the difference between the real and modelled distributions of tweets to identify hexagons whose high density cannot be explained by either the overall trend (global centrality) or the local trend (values of neighboring cells). The aim of this stage is to highlight local maxima of urban activity: areas where the concentration of tweets is considerably higher than in their surroundings, which corresponds to our definition of urban activity centers. The map in Figure 3 illustrates the results of this stage; local extrema of tweet concentration can be identified on it much more easily than on the first map (Fig. 1).
Finally, the last step was to delimit the borders of urban activity centers in the urban environment. UACs are unlikely to take the ideal form of regular hexagons because of barriers such as streets and buildings. That is why at this stage we used the POI database supplemented by the street network of Tokyo. We took hexagons with a relatively high concentration of residual tweets (>5) as potential urban activity centers. Within these hexagons, areas where POIs are concentrated were found and delimited manually, taking into account the road network. As a result, we identified and mapped 146 urban activity centers (Fig. 5). In summary, we detected approximate areas of high urban activity by analyzing social media data and then refined the precise location of actual urban activity centers using the concentration of the 'hard' objects that generate this activity: points of interest.
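The selection of candidate hexagons (residual > 5) can be partly automated before the manual delimitation. The Python sketch below is a hypothetical helper, not part of the authors' procedure: it groups contiguous above-threshold hexagons into candidate clusters with a breadth-first search over the six neighbors of each hexagon, assuming residuals are keyed by axial (q, r) hexagon coordinates. The final borders would still be drawn from POI concentrations and the street network.

```python
from collections import deque

# Offsets of the six neighbors of a hexagon in axial (q, r) coordinates.
HEX_NEIGHBORS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def candidate_clusters(residuals, threshold=5.0):
    """Group contiguous hexagons with residual > threshold (breadth-first search)."""
    selected = {cell for cell, value in residuals.items() if value > threshold}
    seen, clusters = set(), []
    for start in selected:
        if start in seen:
            continue
        seen.add(start)
        queue, cluster = deque([start]), []
        while queue:
            q, r = queue.popleft()
            cluster.append((q, r))
            for dq, dr in HEX_NEIGHBORS:
                nb = (q + dq, r + dr)
                if nb in selected and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        clusters.append(sorted(cluster))
    return sorted(clusters)


residuals = {(0, 0): 12.0, (1, 0): 7.5, (3, 0): 6.1, (5, -2): 2.0}  # hypothetical values
print(candidate_clusters(residuals))  # [[(0, 0), (1, 0)], [(3, 0)]]
```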

RESULTS
Using the methodology described above, we identified 146 urban activity centers within the territory of the 23 special wards of Tokyo. Such a large number of objects is almost impossible to describe without some kind of grouping. Since some quantitative characteristics of UACs, such as the number of tweets per hexagon and the number of POIs per center, were already available, we decided to base the classification of the centers on one of these indicators. POIs seem better suited for dividing UACs into subgroups by scale because these data characterize the «hard» component of the city, the real content of urban space that exists for a relatively long time. In other words, for the subsequent qualitative and spatial analysis, we classified urban activity centers according to the number of facilities attracting urban activity, i.e. POIs, because this number can represent the scale and heterogeneity of the centers. It is important to note that this classification is not a final typology; it should be seen as a necessary step to simplify further analysis.
As already mentioned, Tokyo's POIs, about 12,000 in total, represent services covering almost all spheres of urban life, from cafes and banks to universities and car repair shops. To divide the urban activity centers into classes by POI count, we examined the distribution of this indicator across UACs (Fig. 4).
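The paper does not state how class boundaries were chosen from the distribution of POI counts. One common option for a skewed distribution like this is Fisher-Jenks natural breaks, which splits sorted values into k classes so that the total within-class sum of squared deviations is minimal. The dynamic-programming sketch below shows that technique in Python; applying it here, the choice of k and the sample values are assumptions for illustration only.

```python
import numpy as np


def natural_breaks(values, k):
    """Optimal 1-D partition into k classes minimizing within-class sum of squares.

    Returns the upper bound of each class in ascending order (the last is the maximum).
    """
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")
    s1 = np.concatenate([[0.0], np.cumsum(x)])
    s2 = np.concatenate([[0.0], np.cumsum(x ** 2)])

    def sse(i, j):
        """Sum of squared deviations of x[i:j] from its mean (j > i)."""
        total = s1[j] - s1[i]
        return s2[j] - s2[i] - total * total / (j - i)

    # cost[c, j]: best total SSE when the first j sorted values form c classes.
    cost = np.full((k + 1, n + 1), np.inf)
    split = np.zeros((k + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1, i] + sse(i, j)
                if candidate < cost[c, j]:
                    cost[c, j] = candidate
                    split[c, j] = i
    # Walk back through the stored split points to recover class upper bounds.
    bounds, j = [], n
    for c in range(k, 0, -1):
        bounds.append(x[j - 1])
        j = split[c, j]
    return bounds[::-1]


breaks = natural_breaks([1, 2, 3, 10, 11, 12, 50, 52], k=3)
classes = np.searchsorted(breaks, [2, 11, 52])  # class index of each value, 0 = smallest
```

Note that index 0 here is the class with the fewest POIs, whereas the paper numbers its classes from the largest centers (first class) to the smallest (fifth class).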
A thorough analysis of all these classes took more than 30 pages of the original text, but in this article the results are summarized in two tables (Table 1 and Table 2). The analysis was structured as follows: 1) identification of basic features of the classes, such as the number of centers and POIs; 2) description of the main location patterns based on a series of maps derived from Figure 5; 3) investigation of the functional structure of each class and its objects using the POI database, fieldwork experience and additional sources (including an evaluation of the tourist attractiveness of the centers); 4) identification and study of the exceptions in each class.
The urban activity centers of the first class are large objects located in the central part of Tokyo. Their functional structure is diversified, often with one or two leading categories: tourist facilities, government and public services, or food services. The services provided in these centers are often unique and of city-wide importance, so all first-class centers are undoubtedly attractive to tourists. All second-class centers turned out to be second-level UACs in size, scale and qualitative content. They are outstanding but not as significant as the leading ones. These centers offer a wide range of services that are in high demand among both locals and tourists. For visitors from elsewhere, Ikebukuro and the centers around the Meiji Shrine are especially attractive.

Fig. 4. Distribution of POIs by UAC
The vast majority of third-class UACs are located on the center-west axis. Geographically and functionally, the largest centers of this class are close to the higher classes, while the centers of the lower subclass offer an average set of services consumed mostly by local residents. Almost all centers in this group are located near medium-sized railway stations, and their emergence seems to be linked to these stations.
The objects of the fourth class turned out to be clusters of organizations providing basic services to residents of the surrounding neighborhoods. There is also a high share of parking and other services for motorists, which is understandable as many of these centers are located in peripheral areas, where stations sometimes serve as starting points for onward travel by private vehicle through the suburbs. Food-service facilities are concentrated near the railway stations, which form the cores of these centers, but their significance and share in the total number of facilities are relatively small. Most objects of this group lie outside the city center (the area enclosed by the Yamanote line).
A typical fifth-class center consists of a couple of blocks containing a police station, a school, a fire station, a pharmacy, a post office and a railway station. In addition, shops and cafes are usually concentrated around the station, but there are only a few of them. Fieldwork data were very useful for describing fifth-class centers, since some small facilities are not even included in the POI database. Geographically, these centers differ slightly from those analyzed previously: objects of this group are concentrated in the eastern peripheral areas.

CONCLUSION
The results of the research allow us to draw several fundamental conclusions. First, the methodology developed by the authors is applicable to the identification of existing urban activity centers at various scale levels. The proposed methodology is not conceptually new; rather, it is a combination and reinterpretation of the ideas of Vysokovsky and McMillen. However, for spatio-statistical modelling and the identification of local concentrations of urban activity unrelated to the overall trend of centrality, we used a conceptually different data type. This is the main added value of this research. Our work demonstrates that social media data, which have an almost unlimited capacity for various urban studies, can be used instead of census data and economic statistics in studies analogous to those conducted by McMillen and Vysokovsky. This is a crucial finding given that social media data are continuously renewed and up to date. Comparison of the map of the delimited UACs with the results of field studies, studies by other scholars and a priori knowledge of Tokyo's urban structure confirms the accuracy of the localization of these centers. Consequently, this study once again demonstrates the applicability of big social media data (with appropriate processing using mathematical modelling and adequate interpretation) for studying urban space.
We can now summarize the main characteristics of the subject of this study: the centers of urban activity and the urban structure of Tokyo. In general, Tokyo has a radial-ring spatial structure: the first ring of UACs is located around the Imperial Palace and the second along the Yamanote railway line, which plays a key role in the city's life. In the periphery, radial chains of UACs are situated along the railway lines of the corresponding directions in almost every sector of the city.
Certain spatial differences can be observed when the peripheral parts of the city are divided into sectors. For example, the western sector has the highest activity density and contains a large number of high-class diversified UACs, while in the eastern sector objects of the fourth and fifth classes prevail. This is probably because Tokyo prefecture extends to the west of the center, which promotes closer ties; as a result, the western part of the city is more heterogeneous even at a long distance from the center, while in eastern Tokyo the suburban area is located much closer to the city center. Finally, the classification of UACs revealed certain relations between their quantitative (number of POIs) and qualitative characteristics, including functional features and geographic location. These relations may lead to further comparative studies of sectoral differences within Tokyo and to improvements in existing models of functional zoning of the city. In addition, the work is clearly applicable to more practical purposes such as transportation management, community building and urban planning.