Evaluation of the Optimal Location for Erecting a Mexican Restaurant in the City of Madrid, Spain
by
In partial fulfillment of the requirements for the award of the
Specialization Certificate and Professional Badge in Data Science from IBM
Date: June 24, 2019
Type: Peer Review
Table of Contents
1.1. Problem Description
1.2. Data Presentation
1.3. Target Audience
3.1. Methodology
3.2. Results
3.3. Discussion
4. Conclusions
1.1. Problem Description
In this project, the problem to solve is finding the best possible location
for a Mexican restaurant in the city of Madrid, Spain. To achieve this, an
analytical approach will be used, based on machine learning and data analysis
techniques, specifically clustering, supported by some data visualization.
During the analysis, several data transformations will be performed in order
to find the most suitable data format for the machine learning model to
ingest. Once the data is prepared, a modeling process will be carried out, and
this statistical analysis will suggest the most promising places to locate the
Mexican restaurant.
1.2. Data Presentation
The data used to develop this project comes from two sources:
1. The Foursquare API: This data will be accessed via Python and used to
obtain the most common venues per neighborhood in the city of Madrid. This
gives a picture of how the city's venues are distributed and which places are
most common for leisure and, in general, an idea of what people like.
2. The Madrid City Hall's Web Portal: This site provides several data sources
that are very useful for solving this problem. The files are provided in Excel
format and are intended for statistical analysis. The data contains updated
information about the immigrant population per country and per nationality.
This data will be analyzed to determine the best location for a new
venue, restaurant or other business based on people's nationalities. For the
sake of simplicity, it will be assumed for this exercise that people's tastes
vary according to their nationality, and that people from a specific country
will be more attracted to places that match the environment and culture of
their own country than to those of foreign countries.
You can access the data by clicking this link
1.3. Target Audience
The target audience of this project could be any business owner who is
planning to open new business premises: a restaurant, a real estate agency, a
shop, etc. Since this approach is applicable not only to Mexican restaurants
but also to other kinds of businesses, anybody who is considering opening new
business premises, or relocating existing ones, could benefit from this
project's approach.
3.1. Methodology
The methodology used to approach this problem includes some statistical
exploration of the data and some visualizations. The main machine learning
technique involved in this project is clustering; specifically, the K-Means
algorithm was used, implemented with Python.
At first, the main problem was how to obtain the data needed to build a
constructive approach to the problem. Usually, solving this kind of optimal
business location problem requires a lot of consumer data, but for this
example, and for the sake of simplicity, the focus was put mainly on the
population's nationality. A study was carried out over the inhabitants of
Madrid, and it was assumed for this example that people from a given country
would prefer restaurants based on their own country and its food over
restaurants from other countries or ones unrelated to their culture. This
applies especially to immigrant populations, who live away from their
countries and would likely want a regular taste of their own food and culture.
In the end, it is not only about the food; it is also about having a piece of
the country in question. When someone enters an Italian, American, or Peruvian
restaurant, they are consuming not only the food and culinary specialties of
that country, but also its culture, people, music, and decoration. All of this
should make people feel as if they were in that country. With all this
considered, it was decided that the way to solve this problem efficiently was
first to define the target population, second to find the areas where this
population lives, and finally to examine the venues and restaurants in those
areas to see if the product could work.
Here is an example of the data used:
This data contains information about the size of the immigrant populations in
Madrid within each neighborhood. The main features are the country of origin,
which indicates where the people living in each neighborhood come from, and
the number of people from each country living in each neighborhood. With this,
it is already possible to get an idea of where the target population is
located. In this project, the idea is to open a Mexican restaurant in the
city; further analysis will show where. Nevertheless, this task could not be
achieved by working with this raw data alone. Besides the population living in
the different neighborhoods, information about the most common venues in these
neighborhoods was also needed, along with some measure of how similar or
different the neighborhoods are from one another. To this end, the Foursquare
API was used to obtain the venue data for each neighborhood. To use the
Foursquare API, however, the raw data first had to be transformed into
something the API could handle: basically, the coordinates of each
neighborhood were needed.
This is an example of the transformed data:
Once the data was transformed into a format the Foursquare API could ingest,
the information about the venues could be obtained. The neighborhoods were
then plotted on a map of Madrid, to give an idea of their geographical
location:
The next step was to obtain the nearby venues by neighborhood, together with
their respective coordinates:
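As a rough illustration of this step, the sketch below builds a request URL for one neighborhood and flattens a response into rows of venue name, coordinates and category. It assumes the legacy Foursquare v2 `venues/explore` endpoint and its usual response layout (`response` > `groups` > `items` > `venue`); the report does not show its actual request code, so the function names, the radius, the limit, the version date and the sample payload are all hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint: the legacy Foursquare v2 "explore" call.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=500, limit=100):
    """Return the request URL for venues around one neighborhood centre.

    version, radius and limit are illustrative defaults, not the report's values.
    """
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def flatten_venues(neighborhood, payload):
    """Turn one explore response (already parsed from JSON) into flat rows."""
    rows = []
    for group in payload.get("response", {}).get("groups", []):
        for item in group.get("items", []):
            venue = item["venue"]
            # Some venues come back without a category; label them explicitly.
            categories = venue.get("categories") or [{}]
            rows.append({
                "neighborhood": neighborhood,
                "venue": venue["name"],
                "lat": venue["location"]["lat"],
                "lng": venue["location"]["lng"],
                "category": categories[0].get("name", "Unknown"),
            })
    return rows


# Hand-written payload shaped like an explore response (not real data).
sample = {
    "response": {
        "groups": [
            {"items": [
                {"venue": {"name": "Example Taqueria",
                           "location": {"lat": 40.4170, "lng": -3.7035},
                           "categories": [{"name": "Mexican Restaurant"}]}},
                {"venue": {"name": "Example Plaza",
                           "location": {"lat": 40.4155, "lng": -3.7074},
                           "categories": []}},
            ]}
        ]
    }
}

url = build_explore_url(40.4168, -3.7038, "YOUR_ID", "YOUR_SECRET")
rows = flatten_venues("Centro", sample)
for row in rows:
    print(row)
```

In a live run, the URL would be fetched once per neighborhood (for example with `requests.get(url).json()`) and the rows from all neighborhoods concatenated into one table.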
In this sample of the results, it is possible to see the names of the venues,
their coordinates, and the category of each venue. The results are ordered by
neighborhood. This is a vital step in the segmentation process, since all the
important data about the venues is obtained here. Once the venues per
neighborhood were obtained, the next step was to look at the mean occurrence
of each venue category by neighborhood:
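The mean occurrence of a category in a neighborhood is simply the share of that neighborhood's venues that fall in the category. A minimal sketch of that arithmetic in plain Python follows; the rows are made up, and the function name is hypothetical. With pandas, one-hot encoding the category column with `pd.get_dummies` and then taking `groupby(...).mean()` per neighborhood would give the same numbers.

```python
from collections import Counter, defaultdict


def category_frequencies(rows):
    """Return {neighborhood: {category: share of that neighborhood's venues}}."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["neighborhood"]][row["category"]] += 1

    categories = sorted({row["category"] for row in rows})
    table = {}
    for neighborhood, counter in counts.items():
        total = sum(counter.values())
        # Every neighborhood gets the same columns, with 0.0 for absent categories,
        # so the rows can later be fed to a clustering algorithm as vectors.
        table[neighborhood] = {c: counter[c] / total for c in categories}
    return table


# Made-up rows, only to show the shape of the computation.
rows = [
    {"neighborhood": "A", "category": "Tapas Restaurant"},
    {"neighborhood": "A", "category": "Tapas Restaurant"},
    {"neighborhood": "A", "category": "Pizza Place"},
    {"neighborhood": "A", "category": "Gym"},
    {"neighborhood": "B", "category": "Gym"},
    {"neighborhood": "B", "category": "Pizza Place"},
]

freq = category_frequencies(rows)
for neighborhood, shares in freq.items():
    print(neighborhood, shares)
```

Each neighborhood ends up as a vector of shares that sums to 1, which is the form the clustering step works on.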
This process is progressive: once one piece of information is obtained, it is
possible to move on to the next. With this data in hand, the segmentation can
be made and the clusters created. But first it is necessary to determine the
appropriate number of clusters. To perform this task, the elbow method was
used. This method consists of fitting the model for a range of candidate
cluster counts and drawing a curve of the sum of squared distances between
each point and the center of its cluster against the number of clusters. At
some point, the curve flattens and adding clusters no longer reduces the
distances by much. Beyond that point, creating more divisions in the data
(clusters) is pointless, as the difference between groups becomes very
difficult to appreciate:
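To make the elbow method concrete, here is a minimal sketch in plain Python. The report does not say which library was used, so this is not the author's code: it is a basic Lloyd's K-Means with a deterministic farthest-first initialization (a simplification chosen so the example is reproducible), run on made-up 2D points that form three tight groups. In practice, a library implementation such as scikit-learn's `KMeans` would normally be used.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic start: first point, then repeatedly the point farthest
    from all centroids chosen so far (a simplification, not standard k-means++)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def kmeans_inertia(points, k, iterations=50):
    """Run a basic Lloyd's K-Means and return the within-cluster sum of squares."""
    centroids = farthest_first(points, k)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster
            else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)


# Made-up data: three well-separated groups of four points each.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 0), (10, 1), (11, 0), (11, 1),
    (5, 9), (5, 10), (6, 9), (6, 10),
]

inertias = {k: kmeans_inertia(points, k) for k in range(1, 7)}
for k, inertia in inertias.items():
    print(k, round(inertia, 2))
```

On this toy data the sum of squared distances falls from 422.0 with one cluster to 6.0 with three, and there is little left to gain afterwards, so the elbow sits at three clusters. The same reading of the curve is what leads to the choice of cluster count on the real data.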
In the resulting curve, the reduction in distances slows noticeably from 5
clusters on, so the optimal number of clusters for this problem was determined
to be 5. With this done, it is now possible to build the clusters and have a
look at them:
These are the 5 clusters on the map of Madrid. It is possible to see how many
neighborhoods belong to each cluster, which is also important information. Now
it is possible to examine the data of each cluster:
Cluster 1:
Cluster 2:
Cluster 3:
Cluster 4:
Cluster 5:
This kind of approach allows us to analyze an entire city by looking at its
venues and population. With this information, observations and conclusions
can now be made.
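One way to read each cluster is to aggregate its neighborhoods and report the largest nationality and the most frequent venue category. The sketch below shows that aggregation under assumed data shapes; the function name, the dictionaries and all the values are hypothetical placeholders, not the report's data.

```python
from collections import Counter, defaultdict


def summarize_clusters(labels, population, venues):
    """Summarize each cluster by size, top nationality and top venue category.

    labels:     {neighborhood: cluster id}
    population: {neighborhood: {nationality: number of residents}}
    venues:     {neighborhood: [venue category, ...]}
    """
    people = defaultdict(Counter)
    places = defaultdict(Counter)
    sizes = Counter()
    for neighborhood, cluster in labels.items():
        sizes[cluster] += 1
        people[cluster].update(population.get(neighborhood, {}))
        places[cluster].update(venues.get(neighborhood, []))

    def top(counter):
        # most_common(1) is empty when a cluster has no data for this field.
        best = counter.most_common(1)
        return best[0][0] if best else None

    return {
        cluster: {
            "neighborhoods": sizes[cluster],
            "top_nationality": top(people[cluster]),
            "top_venue": top(places[cluster]),
        }
        for cluster in sorted(sizes)
    }


# Placeholder inputs (not real Madrid data).
labels = {"N1": 0, "N2": 0, "N3": 1}
population = {
    "N1": {"Country X": 10, "Country Y": 30},
    "N2": {"Country X": 40},
    "N3": {"Country Y": 5},
}
venues = {
    "N1": ["Gym", "Bakery"],
    "N2": ["Bakery"],
    "N3": ["Plaza"],
}

summary = summarize_clusters(labels, population, venues)
for cluster, info in summary.items():
    print(cluster, info)
```

Summing residents across a cluster's neighborhoods before picking the top nationality means that large neighborhoods weigh more than small ones; comparing shares instead of counts would be an equally defensible choice.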
3.2. Results
The results obtained were five clusters with very different population and
venue distributions. The following is a description of the clusters:
• Cluster One: Occupied by Bulgarians; the most common venue is the seafood restaurant.
• Cluster Two: Mostly inhabited by South Americans, Europeans, and North Americans. The most common venues are tapas restaurants, Argentinian restaurants, pizza places, supermarkets and Spanish restaurants, among many others.
• Cluster Three: This cluster is composed of only 3 population groups: Americans, Ukrainians and Dominicans. The most common venues are pizza places, gyms, shopping malls, churches and bakeries, among others.
• Cluster Four: This cluster is composed only of Bangladeshis. The most common venues are Spanish restaurants, falafel restaurants, fish markets, fast food restaurants and electronics stores.
• Cluster Five: Again, only one group, people from Ecuador, seems to live in this cluster. The most common venues are soccer fields, burger joints, plazas and fast food restaurants.
3.3. Discussion
It is interesting how the venues and the nationalities vary from one cluster
to another. The main differentiation lies in these two variables. Each cluster
has its own characteristics, but also points in common with other clusters.
Examining these results in more detail, some conclusions can be drawn. As a
recommendation, it must be said that in a study of this size, more data is
needed to make good predictions about where to open a certain business or
shop: for example, socio-demographic data about the population, such as income
level, whether they have children, education level, what kind of job they make
a living from, etc. Some of the most important data to examine carefully are
those related to people's likes and tastes: how they prefer to spend their
leisure time, what kinds of food they like, or what their hobbies are. With
all these data gathered, a more in-depth analysis could be performed, and the
segmentations would be more accurate. For this project, these data were not
available and were also outside the project's scope.
4. Conclusions
As far as this data shows, no Mexican population is registered in Madrid.
However, in Cluster 2 there is a Mexican restaurant located in the "Centro"
neighborhood, which is the town center.
Looking more closely at this cluster, its resident population is mostly
Latino, mixed with some Europeans; mainly, the people living in this cluster
come from South American countries. Apart from this, other kinds of Latin
restaurants can be found, like Argentinian restaurants, tapas restaurants, and
Italian restaurants. So it is possible to say that the inhabitants of this
area like these kinds of food.
Following this logic, to open a new Mexican restaurant in the city, or indeed
any kind of restaurant, it would only be necessary to find where the
restaurants similar to the one we want to open are, study the population in
that area, and find similar clusters of population in the city that do not yet
have, or have very few, restaurants like the one we would like to open.
In this example, clusters 4 and 5 could make a good match for our target
population. Looking at the venues in these clusters, it is possible to find
one Mexican restaurant and a good number of fast food, Argentinian, and South
American restaurants. So, in these clusters, the existing restaurants can be
said to match the population's nationalities and tastes.
In conclusion, and taking into consideration the explanations given above as
well as the data, it is quite possible that clusters 4 and 5 could be a good
place to open our Mexican restaurant. As explained above, the same logic could
apply to opening another kind of restaurant or business in any other area of
the city. It is only necessary to examine the existing businesses in our
target area and study the population, then compare these 2 factors with the
same ones in areas where there are existing businesses like the one we want to
open, and then verify whether the match holds.