Statistics for Information Science Final Project
Used R to recode and analyze relationships between variables on the 2018 OSMI Mental Health in Tech Survey (MS Excel) and create a regression model
The connection between age, company size, country a person lives in, amount of support they believe the tech industry gives and willingness to share mental illness with friends and family
Mental health is becoming an increasingly important and prevalent concern. An important step in helping people with mental illness is for them to reach out to someone for support. This memo addresses the connection between a person’s age, company size, country of residence, and perception of how much support the tech industry gives (the independent variables, or IVs) and their willingness to share their mental illness with a friend or family member (the dependent variable, or DV). I hypothesize that a model with these IVs will fit the data better than the intercept-only model. The results reported here support that hypothesis.
To investigate this question, I used the 2018 OSMI Mental Health in Tech Survey. 417 people who work for tech companies were asked, “How willing would you be to share with friends and family that you have a mental illness?” Responses were on a scale of 1-10. This corresponds to the “willingness to share” or “share” variable used in this memo. The dataset variable corresponding to the question “What country do you live in?”, or “country”, was recoded to include only the top 5 response countries: “United States”, “United Kingdom”, “Germany”, “India”, and “Canada”. After recoding, 353 people remained in the study. Participants were asked the size of their company, with 6 categories: 1-5 employees, 6-25 employees, 26-100 employees, 100-500 employees, 500-1000 employees, or over 1000 employees. Participants were also asked, on a scale of 1-5, “How well do you think the tech industry supports employees with mental health issues?”, which corresponds to the “support” variable. The survey also asked for the participant’s age, which corresponds to the “age” variable used in this memo.
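The country recoding described above can be sketched as follows. The original analysis was done in R; this is a minimal Python illustration, and the function name `keep_top_countries`, the column name `country`, and the sample rows are all assumptions made for the example, not taken from the survey file.

```python
from collections import Counter


def keep_top_countries(rows, column="country", n=5):
    """Keep only the rows whose country is among the n most frequent values."""
    counts = Counter(row[column] for row in rows)
    top = {name for name, _ in counts.most_common(n)}
    return [row for row in rows if row[column] in top]


# Illustrative rows only; these are NOT responses from the OSMI survey.
sample = [
    {"country": "United States", "share": 8},
    {"country": "United States", "share": 5},
    {"country": "United States", "share": 7},
    {"country": "United Kingdom", "share": 6},
    {"country": "United Kingdom", "share": 9},
    {"country": "France", "share": 4},
]

# With n=2 the single French row is dropped, mirroring how the memo's
# top-5 filter reduced the sample from 417 to 353 respondents.
kept = keep_top_countries(sample, n=2)
print(len(kept))  # 5
```

Filtering by frequency rather than by a hard-coded list keeps the sketch reusable, but a fixed list of the five country names would work equally well here.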
The diagnostic plot showed that the model met some regression assumptions but violated others. The regression model equation is:
Willingness to share = 5.748 - 0.267(Germany) + 0.2459(India) + 0.253(UK) + 1.018(USA) + 0.50(SupportGiven) - 0.002(Age) - 0.740(Size 1-5 Employees) - 0.562(Size 6-25 Employees) + 0.069(Size 26-100 Employees) + 0.609(Size 100-500 Employees) + 0.138(Size 500-1000 Employees) - 0.103(Size 1000 Employees) + 2.679.
For all the following variables we hold all else constant.
Country: If a person is from Germany, this corresponds to an average decrease of 0.267 on the scale of willingness to share. If a person is from India, this corresponds to an average increase of 0.246. If a person is from the UK, this corresponds to an average increase of 0.253. If a person is from the US, this corresponds to an average increase of 1.018. The only country that has a meaningful coefficient is the US, because country is coded as a dummy variable and a change of more than 1 point on a 10-point scale is substantial.
Support: Each additional point on the scale of the participant’s perception of how well the tech industry supports employees with mental health issues corresponds to an average increase of 0.5 on the scale of willingness to share. This is a meaningful coefficient because it represents a 0.5-point change in the DV for every 1-point change in the IV.
Age: Each additional year of age corresponds to an average decrease of 0.002 on the scale of willingness to share. This is not meaningful because the coefficient is so small.
Company size: If a person is in an organization with 1-5 employees, this corresponds to an average decrease of 0.74 on the scale of willingness to share. If a person is in an organization with 6-25 employees, this corresponds to an average decrease of 0.562. If a person is in an organization with 26-100 employees, this corresponds to an average increase of 0.069. If a person is in an organization with 100-500 employees, this corresponds to an average decrease of 0.609. If a person is in an organization with 500-1000 employees, this corresponds to an average increase of 0.138. If a person is in an organization with more than 1000 employees, this corresponds to an average decrease of 0.103. The only company size coefficient that may be meaningful is 1-5 employees, because it is the largest in magnitude at 0.74 and makes somewhat of a difference on a 10-point scale.
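To make the "hold all else constant" reading concrete, the following Python sketch compares two hypothetical respondents using the intercept, country, support, and age coefficients reported above. It assumes Canada is the reference country (it is the only one of the five without a coefficient in the equation), and it leaves out the company size term, which cancels when both respondents work at companies of the same size. The function name `partial_prediction` and the example respondents are illustrative only.

```python
# Coefficients copied from the regression equation in this memo.
INTERCEPT = 5.748
COUNTRY_COEF = {
    "Canada": 0.0,  # assumed reference category (no coefficient reported)
    "Germany": -0.267,
    "India": 0.2459,
    "United Kingdom": 0.253,
    "United States": 1.018,
}
SUPPORT_COEF = 0.50
AGE_COEF = -0.002


def partial_prediction(country, support, age):
    """Intercept plus the country, support, and age terms.

    The company size term is deliberately omitted; it is the same for both
    people being compared, so it does not affect the difference.
    """
    return INTERCEPT + COUNTRY_COEF[country] + SUPPORT_COEF * support + AGE_COEF * age


# Two hypothetical 30-year-olds at companies of the same size.
us_high_support = partial_prediction("United States", support=4, age=30)
canada_low_support = partial_prediction("Canada", support=2, age=30)

difference = us_high_support - canada_low_support
print(round(difference, 3))  # 2.018
```

The difference of 2.018 is the US coefficient (1.018) plus two extra support points at 0.5 each, which is the sense in which each coefficient is read on its own while the other variables stay fixed.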
With an alpha of 0.10, the only individual variable that was statistically significant is support. With a p-value of 0.08, the model as a whole is statistically significant at this level. Therefore, we are able to reject the null hypothesis and generalize this finding to the target population of people working in tech companies. R2 = 0.02133, which means that 2.13% of the variance in willingness to share a mental illness with friends or family can be explained by age, company size, country, and perception of support.
However, this study has its limitations. The survey was completed by 417 employees, who are only a small sample of the people working in the tech industry. The model also violated some of the assumptions for regression. These results could be biased when applied to the world as a whole, since the majority of participants were from the United States and other countries are not equally represented. In the future, more people outside of the US could be surveyed to correct this.