Yannick Kurvers, 8008701
3a)
The network size for each respondent was calculated as the number of valid closeXY
values they provided, i.e. the number of friends they mentioned. On average,
respondents had a network size of 3, so the typical respondent listed about three friends in their social network.
To measure network density, we looked at how connected the respondent's friends were with each
other, using the relationships from the closeXY_R variables. A density of 0 means that none of the
respondent's friends know each other, while a density of 1 means that all of the friends know each
other very well.
For a density of 0.5, there are two main ways to interpret this:
1. Partial connections: Half of the friends in the network know each other, but not everyone is
connected.
2. Moderate strength: On average, the connections between friends are neither weak nor very
strong; they are somewhere in the middle.
In our analysis, the average network density was 0.65. This means that the average network is fairly
well connected: friends tend to know each other, but not every pair of friends is connected. Density
also varies across respondents, which shows the different ways people structure their social networks.
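The two measures described above can be sketched for a single respondent as follows. This is a minimal illustration, not the recoding actually used in the analysis: it assumes that ties between friends are coded 1 (know each other) or 0 (do not), that missing answers are None, and the function names are hypothetical.

```python
from itertools import combinations

def network_size(close_values):
    """Count the valid (non-missing) closeXY answers of one respondent."""
    return sum(1 for v in close_values if v is not None)

def network_density(n_friends, ties):
    """Share of possible friend pairs that are actually connected.

    `ties` maps a pair (i, j) with i < j to 1 (connected) or 0 (not connected).
    Returns None when fewer than two friends were named (no pairs exist).
    """
    pairs = list(combinations(range(1, n_friends + 1), 2))
    if not pairs:
        return None
    return sum(ties.get(pair, 0) for pair in pairs) / len(pairs)

# Hypothetical respondent: three friends named, two of the three pairs connected.
size = network_size([1, 2, 1, None, None])
density = network_density(size, {(1, 2): 1, (1, 3): 1, (2, 3): 0})
print(size, round(density, 2))  # 3 0.67
```

Under this binary coding, a density of 0.5 corresponds to the first interpretation (half of the pairs are connected). If the closeXY_R values instead measure the strength of a tie, the same average would correspond to the second interpretation.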
3b)
For respondents with a network size of exactly 3, the density values provide useful insights into
how connected their networks are. With 3 individuals in a network, there are 3 possible relationships
that can exist between them (Person 1 ↔ Person 2, Person 1 ↔ Person 3, and Person 2 ↔ Person
3).
The possible density values range from 0 to 1, where:
0: None of the friends know each other.
0.33: One of the three possible relationships exists.
0.67: Two of the three relationships exist.
1: All three relationships are present, meaning the network is fully connected.
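The four values listed above can be checked by enumerating every possible combination of ties between three friends. The sketch below assumes binary ties (present or absent); the function name is hypothetical.

```python
from itertools import combinations, product

def possible_densities(n_friends):
    """Return the distinct density values a network of n_friends can take,
    assuming each pair of friends is either connected (1) or not (0)."""
    pairs = list(combinations(range(n_friends), 2))
    if not pairs:
        return []  # fewer than two friends: density is undefined
    values = set()
    for config in product([0, 1], repeat=len(pairs)):
        values.add(round(sum(config) / len(pairs), 2))
    return sorted(values)

print(possible_densities(3))  # [0.0, 0.33, 0.67, 1.0]
```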
In this dataset, 402 respondents reported a network size of 3. Among them, the average density is
0.672, indicating that, on average, 67% of the possible connections between individuals are present in
these networks. The density values range from a minimum of 0 (indicating no connections between
friends) to a maximum of 1 (indicating a fully connected network), showing that the full range of
possible values is observed.
These results confirm that the density measure behaves as expected and accurately reflects the
level of connectedness within respondents' networks of 3 people. This highlights the value of the
density metric in capturing the structure of small social networks.
4a)
To explore the dynamics of the "friends"-network, we calculated the "degree" for each friend in
a respondent’s network. The degree represents the average closeness of each friend to all other
friends, providing a measure of how interconnected the network is. Using these degree values, we
created a correlation table to examine the relationships between network density (density), network
size (netsize_new), and the degree of individual friends.
The results reveal three main observations. First, there is a weak negative correlation (-0.1130)
between network size and density, suggesting that larger networks tend to be less
interconnected. This makes sense, as it becomes harder to maintain fully connected networks as they
grow in size. Second, there is a near-perfect correlation (0.9916) between density and degree. This is
expected because degree and density are mathematically related measures of interconnectedness.
Finally, we found no variation in degree values within networks. This means that all friends within a
network have identical degree values, reflecting perfectly symmetric networks with no distinction
between individual members.
These findings highlight that while network size and density offer meaningful insights into the
structure of networks, the lack of variability in degree values limits the usefulness of degree for
analyzing differences within networks.
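The mathematical link between degree and density can be illustrated with a small sketch. It assumes a symmetric matrix of binary ties between friends and defines degree, as above, as the average closeness of a friend to all other friends. The matrix and function names are hypothetical.

```python
def degrees(closeness):
    """Average closeness of each friend to all other friends.

    `closeness` is a symmetric matrix (list of lists); the diagonal is ignored.
    """
    n = len(closeness)
    return [
        sum(closeness[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]

def density(closeness):
    """Average closeness over all distinct pairs of friends."""
    n = len(closeness)
    pair_values = [closeness[i][j] for i in range(n) for j in range(i + 1, n)]
    return sum(pair_values) / len(pair_values)

# Hypothetical network: friend 1 knows friends 2 and 3, who do not know each other.
closeness = [
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]
deg = degrees(closeness)
print(deg)  # [1.0, 0.5, 0.5]
# The mean of the degrees equals the density of the network.
assert abs(sum(deg) / len(deg) - density(closeness)) < 1e-9
```

Because every tie is counted once for each of the two friends it connects, the mean degree of a network always equals its density under this definition, which is consistent with the very high correlation between the two.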
Table 1. Correlation table
Variable       netsize_new   density   degree1   degree2   degree3
netsize_new          1.000    -0.113    -0.112    -0.112    -0.112
density             -0.113     1.000     0.992     0.992     0.992
degree1             -0.112     0.992     1.000     1.000     1.000
degree2             -0.112     0.992     1.000     1.000     1.000
degree3             -0.112     0.992     1.000     1.000     1.000
This correlation table reinforces the key findings: larger networks tend to be slightly less
interconnected, density and degree are inherently linked, and the symmetry in degree values reflects
the uniform structure of networks in this dataset.
4b)
Centralization was calculated using the standard deviation of the degree values within each
network. This measure reflects the inequality in connections among friends. A low standard deviation
indicates symmetric networks, where all friends have similar levels of connectivity. On the other hand,
a high standard deviation suggests a more centralized network, where one or a few friends are
significantly more connected than others.
The results of the analysis are striking: the standard deviation of degree values is 0 for all
networks. This means that every network is perfectly symmetric, with no variation in how connected
individual friends are. In other words, all friends within a network have the same degree of
connectivity, and no friend is more or less central than others.
The standard deviation is an effective measure of centralization because it captures how much
individual degree values deviate from the average degree in the network. However, in this dataset, the
lack of variation in degree values makes the measure uninformative. It confirms that the networks
analyzed have a fundamentally symmetric structure, where no single friend holds a more central role
than others.
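To show how this measure separates symmetric from centralized networks, the sketch below compares two hypothetical three-friend networks. It assumes binary ties and uses the population standard deviation (`statistics.pstdev`); whether the analysis used the population or the sample formula is not stated, so this choice is an assumption.

```python
from statistics import pstdev

def degrees(closeness):
    """Average closeness of each friend to all other friends (diagonal ignored)."""
    n = len(closeness)
    return [
        sum(closeness[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]

def centralization(closeness):
    """Standard deviation of the degree values within one network."""
    return pstdev(degrees(closeness))

# Star: friend 1 knows friends 2 and 3, who do not know each other.
star = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
# Triangle: all three friends know each other.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]

print(round(centralization(star), 3))  # 0.236
print(centralization(triangle))        # 0.0
```

Under these assumptions, a three-friend network has a standard deviation of 0 only when its density is 0 or 1; with one or two of the three ties present, the degrees of the friends differ.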
5)
The similarity measures between respondents and their friends provide useful insights into their social
networks:
Gender (sex_sim): On average, 61.7% of a respondent's friends share the same gender.
While some respondents have no friends of the same gender (minimum: 0), others have all
friends of the same gender (maximum: 1).
Age (age_sim): Around 39.8% of friends are within 5 years of the respondent's age,
indicating lower similarity than for gender, religion, and education. This is plausible, as social
networks often include individuals from a broader range of ages.
Religion (relig_sim): Approximately 75.2% of friends share the same religion as the
respondent. This high level of similarity likely reflects people's tendency to form relationships
with those who share similar religious beliefs.
Education (educ_sim): On average, 68.6% of friends have a comparable level of education to
the respondent, within a margin of two years. This aligns with the idea that educational
background often plays a role in forming social ties.
Kinship (kin_sim): On average, 34.4% of friends are family members. While some
respondents have no family members among their friends (minimum: 0), others have up to
60% of their friends as family (maximum: 0.6).
These findings illustrate logical patterns in social networks, with strong tendencies for similarity in
gender, religion, and education, and clearly lower shares for age similarity and kinship.
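Each similarity measure above is a share of friends who match the respondent on one characteristic. A minimal sketch of that calculation, assuming numeric codes for each characteristic and None for missing values; the function name, codes, and example values are hypothetical:

```python
def share_similar(respondent_value, friend_values, tolerance=None):
    """Share of friends who are similar to the respondent.

    With tolerance=None, a friend is similar only on an exact match
    (e.g. gender, religion). With a numeric tolerance, a friend is similar
    when the absolute difference is at most `tolerance` (e.g. age within 5 years).
    Missing friend values (None) are skipped; returns None if no valid friends.
    """
    valid = [v for v in friend_values if v is not None]
    if not valid:
        return None
    if tolerance is None:
        similar = sum(1 for v in valid if v == respondent_value)
    else:
        similar = sum(1 for v in valid if abs(v - respondent_value) <= tolerance)
    return similar / len(valid)

# Hypothetical respondent: gender coded 2, age 30, three friends.
sex_sim = share_similar(2, [2, 2, 1])
age_sim = share_similar(30, [27, 41, 33], tolerance=5)
print(round(sex_sim, 3), round(age_sim, 3))  # 0.667 0.667
```

Averaging these per-respondent shares over all respondents gives sample-level figures of the kind reported above.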
6)
We cannot use the standard deviation for categorical variables because it is
designed for numerical data, where values have a meaningful distance between them
(e.g., age or height). Categorical variables, such as gender or religion, do not have a natural numerical
order or measurable distances between categories. For instance, there is no "distance" between being
Catholic or Protestant, or between being male or female. Since the standard deviation relies on the
calculation of the average distance from the mean, it cannot be applied to categorical data without
meaningful numerical distances.
The Index of Qualitative Variation (IQV) is a more suitable measure for heterogeneity in
categorical variables. IQV measures the distribution of observations across categories and provides a
value between 0 and 1. If all observations fall into a single category (e.g., all are male), the IQV is 0,
indicating no diversity. If the observations are evenly distributed across all categories (e.g., equal
numbers of male and female), the IQV reaches 1, indicating maximum diversity.
IQV works well because:
1. It effectively captures the variety in categorical data.
2. It adjusts for the number of categories and their proportions.
3. It provides a clear, normalized score from 0 to 1, which is easy to interpret.
IQV is a meaningful measure for diversity in categorical data, addressing limitations that prevent
the use of standard deviation.
Example 1, three friends, 2 female and 1 male.
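For this example, the IQV can be worked out with the standard formula IQV = K / (K - 1) * (1 - sum of squared category proportions), where K is the number of possible categories. The sketch below is an illustration under that standard definition; the function name is hypothetical, and K = 2 is assumed for gender.

```python
from collections import Counter

def iqv(values, n_categories):
    """Index of Qualitative Variation for a list of categorical values.

    n_categories is the number of possible categories (K), which may be
    larger than the number of categories actually observed.
    """
    if n_categories < 2:
        raise ValueError("IQV needs at least two possible categories")
    counts = Counter(values)
    total = sum(counts.values())
    sum_sq = sum((c / total) ** 2 for c in counts.values())
    return n_categories / (n_categories - 1) * (1 - sum_sq)

# Three friends, 2 female and 1 male: proportions 2/3 and 1/3.
print(round(iqv(["female", "female", "male"], 2), 3))  # 0.889
```

The proportions are 2/3 and 1/3, so the sum of squares is 4/9 + 1/9 = 5/9 and the IQV is 2 * (1 - 5/9) = 8/9, or about 0.89, which is close to the maximum of 1.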