import pandas as pd
import seaborn as sns
from prettytable import PrettyTable
Load the file:
path = 'Cleaned_DS_Jobs.csv'
df = pd.read_csv(path)
Let's see what the dataset consists of:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 660 entries, 0 to 659
Data columns (total 27 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   Job Title          660 non-null    object
 1   Salary Estimate    660 non-null    object
 2   Job Description    660 non-null    object
 3   Rating             660 non-null    float64
 4   Company Name       660 non-null    object
 5   Location           660 non-null    object
 6   Headquarters       660 non-null    object
 7   Size               660 non-null    object
 8   Type of ownership  660 non-null    object
 9   Industry           660 non-null    object
10   Sector             660 non-null    object
11   Revenue            660 non-null    object
12   min_salary         660 non-null    int64
13   max_salary         660 non-null    int64
14   avg_salary         660 non-null    int64
15   job_state          660 non-null    object
16   same_state         660 non-null    int64
17   company_age        660 non-null    int64
18   python             660 non-null    int64
19   excel              660 non-null    int64
20   hadoop             660 non-null    int64
21   spark              660 non-null    int64
22   aws                660 non-null    int64
23   tableau            660 non-null    int64
24   big_data           660 non-null    int64
25   job_simp           660 non-null    object
26   seniority          660 non-null    object
dtypes: float64(1), int64(12), object(14)
memory usage: 139.3+ KB
The dataset has 660 rows and 27 columns, including Job Title, Salary Estimate, Job Description, Rating, Company Name, and so on. One column is float64, 12 are int64, and 14 are object. No column has missing values.
Let's see what the dataset looks like in our program:
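The dtype breakdown and the missing-value check reported by `df.info()` can also be computed directly. A minimal sketch, using a small made-up frame in place of the real file (the `sample` frame and its values are illustrative assumptions, not rows from `Cleaned_DS_Jobs.csv`):

```python
import pandas as pd

# Hypothetical stand-in for the real dataset
sample = pd.DataFrame({
    "Job Title": ["Data Scientist", "Data Engineer"],
    "Rating": [3.1, 4.2],
    "min_salary": [137, 75],
})

# Number of columns per dtype (dtype names converted to plain strings)
dtype_counts = sample.dtypes.astype(str).value_counts()
print(dtype_counts)

# Number of missing values per column
missing = sample.isna().sum()
print(missing)
```

On the real data the same two calls would be applied to `df` instead of `sample`.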
df.head()
Job Title | Salary Estimate | Job Description | Rating | Company Name | Location | Headquarters | Size | Type of ownership | Industry | ... | company_age | python | excel | hadoop | spark | aws | tableau | big_data | job_simp | seniority | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Sr Data Scientist | 137-171 | Description\n\nThe Senior Data Scientist is re... | 3.1 | Healthfirst | New York, NY | New York, NY | 1001 to 5000 employees | Nonprofit Organization | Insurance Carriers | ... | 27 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | data scientist | senior |
1 | Data Scientist | 137-171 | Secure our Nation, Ignite your Future\n\nJoin ... | 4.2 | ManTech | Chantilly, VA | Herndon, VA | 5001 to 10000 employees | Company - Public | Research & Development | ... | 52 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | data scientist | na |
2 | Data Scientist | 137-171 | Overview\n\n\nAnalysis Group is one of the lar... | 3.8 | Analysis Group | Boston, MA | Boston, MA | 1001 to 5000 employees | Private Practice / Firm | Consulting | ... | 39 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | data scientist | na |
3 | Data Scientist | 137-171 | JOB DESCRIPTION:\n\nDo you have a passion for ... | 3.5 | INFICON | Newton, MA | Bad Ragaz, Switzerland | 501 to 1000 employees | Company - Public | Electrical & Electronic Manufacturing | ... | 20 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | data scientist | na |
4 | Data Scientist | 137-171 | Data Scientist\nAffinity Solutions / Marketing... | 2.9 | Affinity Solutions | New York, NY | New York, NY | 51 to 200 employees | Company - Private | Advertising & Marketing | ... | 22 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | data scientist | na |
5 rows × 27 columns
Display the Salary Estimate, Rating, min_salary and max_salary columns, sorted by min_salary in descending order:
display(df[['Salary Estimate', 'Rating', 'min_salary', 'max_salary']].sort_values(['min_salary'], ascending=False))
Salary Estimate | Rating | min_salary | max_salary | |
---|---|---|---|---|
499 | 212-331 | 4.1 | 212 | 331 |
519 | 212-331 | 3.5 | 212 | 331 |
501 | 212-331 | 3.6 | 212 | 331 |
502 | 212-331 | 5.0 | 212 | 331 |
503 | 212-331 | 3.5 | 212 | 331 |
... | ... | ... | ... | ... |
460 | 31-56 | 3.9 | 31 | 56 |
459 | 31-56 | 4.0 | 31 | 56 |
458 | 31-56 | 3.3 | 31 | 56 |
470 | 31-56 | 2.9 | 31 | 56 |
469 | 31-56 | 4.1 | 31 | 56 |
660 rows × 4 columns
This table shows that the Salary Estimate column corresponds to the min_salary and max_salary columns: it is the range written as "min-max".
Next, show the highest-paying companies (avg_salary above 200) that have a rating, sorted by Rating:
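The relationship between the three salary columns can be checked programmatically rather than by eye. The sketch below assumes every `Salary Estimate` value has the plain `min-max` form seen in the table; the two-row `sample` frame is a stand-in built from values shown above, and `parsed_min` / `parsed_max` are names introduced here for illustration:

```python
import pandas as pd

# Two salary ranges copied from the table above
sample = pd.DataFrame({
    "Salary Estimate": ["212-331", "31-56"],
    "min_salary": [212, 31],
    "max_salary": [331, 56],
})

# Split "min-max" into two integer columns
bounds = sample["Salary Estimate"].str.split("-", expand=True).astype(int)
bounds.columns = ["parsed_min", "parsed_max"]

# True for every row where the parsed bounds agree with the numeric columns
matches = (bounds["parsed_min"] == sample["min_salary"]) & (
    bounds["parsed_max"] == sample["max_salary"]
)
print(matches.all())
```

If `matches.all()` were `False` on the full dataset, the mismatching rows could be listed with `sample[~matches]`.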
filtered_top=df[(df['avg_salary'] > 200) & (df['Rating'] > 0)].sort_values(['Rating'], ascending=False)
display(filtered_top)
Job Title | Salary Estimate | Job Description | Rating | Company Name | Location | Headquarters | Size | Type of ownership | Industry | ... | company_age | python | excel | hadoop | spark | aws | tableau | big_data | job_simp | seniority | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
502 | Data Scientist(s)/Machine Learning Engineer | 212-331 | Company: AI/Data Science\nLocation: New York C... | 5.0 | Blue Horizon Tek Solutions | New York, NY | Coconut Creek, FL | 1 to 50 employees | Company - Private | Staffing & Outsourcing | ... | 33 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | data scientist | na |
509 | Data Science Software Engineer | 212-331 | We love programming and the excitement that co... | 4.8 | Klaviyo | Boston, MA | Boston, MA | 201 to 500 employees | Company - Private | Computer Hardware & Software | ... | 8 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | na | na |
504 | Data Scientist | 212-331 | NO OPT CPT pls.\n\nÂ\n\nNeed local toÂBay Area... | 4.7 | Sharpedge Solutions Inc | Seattle, WA | Lombard, IL | Unknown | Company - Private | Publishing | ... | -1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | data scientist | na |
516 | Data Scientist | 212-331 | Job Description\nTitle: Sports Data Scientist\... | 4.5 | Smith Hanley Associates | Washington, DC | New York, 061 | 1 to 50 employees | Company - Private | Staffing & Outsourcing | ... | 40 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | data scientist | na |
513 | Manager, Field Application Scientist, Southeast | 212-331 | At 10x Genomics, accelerating our understandin... | 4.2 | 10x Genomics | Raleigh, NC | Pleasanton, CA | 501 to 1000 employees | Company - Public | Biotech & Pharmaceuticals | ... | 8 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | manager | na |
499 | Senior Principal Data Scientist (Python/R) | 212-331 | Roche Diagnostics has built a new strategic ar... | 4.1 | Roche | Pleasanton, CA | Basel, Switzerland | 10000+ employees | Company - Public | Biotech & Pharmaceuticals | ... | 124 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | data scientist | senior |
500 | Real World Science, Data Scientist | 212-331 | Title: Real World Science, Data Scientist\nLoc... | 4.0 | AstraZeneca | Wilmington, DE | Cambridge, United Kingdom | 10000+ employees | Company - Public | Biotech & Pharmaceuticals | ... | 107 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | data scientist | na |
511 | Data Scientist | 212-331 | Role: Data ScientistÂ\n\nLocation: Washington,... | 4.0 | Comtech Global Inc | Washington, DC | Columbus, OH | 51 to 200 employees | Company - Private | Health, Beauty, & Fitness | ... | -1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | data scientist | na |
514 | COMPUTER SCIENTIST - ENGINEER - RESEARCH COMPU... | 212-331 | Join the exciting new area of developing the a... | 3.9 | Southwest Research Institute | Dayton, OH | San Antonio, TX | 1001 to 5000 employees | Nonprofit Organization | Research & Development | ... | 73 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | na | na |
507 | ENGINEER - COMPUTER SCIENTIST - RESEARCH COMPU... | 212-331 | Join our Defense and Intelligence Solutions Di... | 3.9 | Southwest Research Institute | Oklahoma City, OK | San Antonio, TX | 1001 to 5000 employees | Nonprofit Organization | Research & Development | ... | 73 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | na | na |
512 | Data Scientist | 212-331 | The work is in support of targeting and watchl... | 3.8 | Aveshka, Inc. | Washington, DC | Arlington, VA | 51 to 200 employees | Company - Public | IT Services | ... | 10 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | data scientist | na |
501 | Data Scientist | 212-331 | Position: Data Scientist\nLocation: Denver\nSt... | 3.6 | Creative Circle | United States | Los Angeles, CA | 201 to 500 employees | Company - Public | Staffing & Outsourcing | ... | 18 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | data scientist | na |
506 | Data Scientist | 212-331 | Ke`aki Technologies is looking for a qualified... | 3.6 | Alaka`ina Foundation Family of Companies | Fort Sam Houston, TX | Honolulu, HI | 501 to 1000 employees | Government | -1 | ... | -1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | data scientist | na |
517 | Scientist - Machine Learning | 212-331 | Scientist 1 - Machine Learning - Cell Science\... | 3.5 | Allen Institute | Seattle, WA | Seattle, WA | 201 to 500 employees | Nonprofit Organization | Research & Development | ... | 17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | mle | na |
519 | Data Scientist | 212-331 | Aptive is seeking a Data Scientist to compile ... | 3.5 | Aptive | Washington, DC | Alexandria, VA | 51 to 200 employees | Company - Private | Consulting | ... | 8 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | data scientist | na |
505 | Data Scientist | 212-331 | Please review the job details below.\nWe are l... | 3.5 | Maxar Technologies | Herndon, VA | Westminster, CO | 5001 to 10000 employees | Company - Public | Aerospace & Defense | ... | -1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | data scientist | na |
503 | Senior Data Scientist | 212-331 | Do you have a head for numbers? Like turning r... | 3.5 | Maxar Technologies | Herndon, VA | Westminster, CO | 5001 to 10000 employees | Company - Public | Aerospace & Defense | ... | -1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | data scientist | senior |
518 | Data Scientist | 212-331 | Description\n\nThe Data Scientist will be part... | 2.7 | 1-800-Flowers | New York, NY | Carle Place, NY | 1001 to 5000 employees | Company - Public | Wholesale | ... | 44 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | data scientist | na |
508 | Data Scientist | 212-331 | JOB DESCRIPTION:\n\nHexagon US Federal is look... | 2.7 | Hexagon US Federal | Lexington Park, MD | Chantilly, VA | 51 to 200 employees | Company - Public | Aerospace & Defense | ... | 10 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | data scientist | na |
19 rows × 27 columns
Show the lowest-paying companies (avg_salary below 50) that have a rating, sorted by Rating:
filtered_bottom=df[(df['avg_salary'] < 50) & (df['Rating'] > 0)].sort_values(['Rating'], ascending=False)
display(filtered_bottom)
Job Title | Salary Estimate | Job Description | Rating | Company Name | Location | Headquarters | Size | Type of ownership | Industry | ... | company_age | python | excel | hadoop | spark | aws | tableau | big_data | job_simp | seniority | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
477 | Data Scientist - TS/SCI FSP or CI Required | 31-56 | US Citizenship Required and (TS/SCI with FSP o... | 5.0 | Phoenix Operations Group | Annapolis Junction, MD | Woodbine, MD | 1 to 50 employees | Company - Private | IT Services | ... | 9 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | data scientist | na |
476 | Data Scientist | 31-56 | Join us in making roads safer by making driver... | 4.9 | Cambridge Mobile Telematics | Cambridge, MA | Cambridge, MA | 1 to 50 employees | Company - Private | Enterprise Software & Network Solutions | ... | 10 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | data scientist | na |
471 | Data Scientist | 31-56 | What will you be doing?\n\nThe Contractor shal... | 4.7 | Praxis Engineering | Chantilly, VA | Annapolis Junction, MD | 201 to 500 employees | Subsidiary or Business Segment | Computer Hardware & Software | ... | 18 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | data scientist | na |
467 | Data Scientist | 31-56 | Job Title : Data Scientist\n\nJob Location : W... | 4.5 | Radiant Digital | Washington, DC | New York, NY | 1 to 50 employees | Company - Private | -1 | ... | -1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | data scientist | na |
475 | Decision Scientist | 31-56 | Are you passionate about Decision Science?\n\n... | 4.5 | Johns Hopkins University Applied Physics Labor... | Laurel, MD | Laurel, MD | 5001 to 10000 employees | Nonprofit Organization | Aerospace & Defense | ... | 78 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | na | na |
469 | Data Scientist | 31-56 | Responsibilities·\nAnalyze large datasets to g... | 4.1 | IntelliPro Group Inc. | Santa Clara, CA | Santa Clara, CA | 201 to 500 employees | Company - Private | -1 | ... | -1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | data scientist | na |
464 | Data Scientist | 31-56 | Data Scientist\n\nTrexquant is a systematic he... | 4.0 | Trexquant Investment | United States | Stamford, CT | 1 to 50 employees | Company - Private | Investment Banking & Asset Management | ... | 8 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | data scientist | na |
459 | Data Scientist | 31-56 | As a member of the Data Science team, this rol... | 4.0 | Grid Dynamics | Santa Clara, CA | San Ramon, CA | 1001 to 5000 employees | Company - Private | Enterprise Software & Network Solutions | ... | 14 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | data scientist | na |
473 | Data Scientist | 31-56 | Job Description\n\n\nJob #: 1068351\n\nOur cli... | 3.9 | Apex Systems | San Francisco, CA | Glen Allen, VA | 1001 to 5000 employees | Subsidiary or Business Segment | Staffing & Outsourcing | ... | 25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | data scientist | na |
460 | VP, Data Science | 31-56 | We are looking for a VP, Data Science to lead ... | 3.9 | 7Park Data | New York, NY | New York, NY | 51 to 200 employees | Company - Private | Research & Development | ... | 8 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | na | na |
462 | Data Scientist | 31-56 | The Perduco Group, a LinQuest company, is seek... | 3.9 | LinQuest | Colorado Springs, CO | Los Angeles, CA | 501 to 1000 employees | Company - Private | Aerospace & Defense | ... | 16 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | data scientist | na |
461 | Senior Business Intelligence Analyst | 31-56 | Position Overview:\n\nThe Senior Business Inte... | 3.7 | Protolabs | Maple Plain, MN | Maple Plain, MN | 1001 to 5000 employees | Company - Public | Miscellaneous Manufacturing | ... | 21 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | analyst | senior |
466 | Data Scientist | 31-56 | Job Introduction\n\n\nFLEETCOR is seeking a Da... | 3.7 | Fleetcor | Atlanta, GA | Atlanta, GA | 5001 to 10000 employees | Company - Public | Financial Transaction Processing | ... | 20 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | data scientist | na |
463 | Data Engineer - Kafka | 31-56 | Western Digital®\n\nThe next big thing in\ndat... | 3.5 | Western Digital | San Jose, CA | San Jose, CA | 10000+ employees | Company - Public | Computer Hardware & Software | ... | 50 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | data engineer | na |
472 | Data Scientist | 31-56 | General Description: The Data Scientist will s... | 3.4 | USI | Southfield, MI | Saint Paul, MN | 1001 to 5000 employees | Company - Private | Construction | ... | 22 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | data scientist | na |
458 | Data & Machine Learning Scientist | 31-56 | Passionate about precision medicine and advanc... | 3.3 | Tempus Labs | Chicago, IL | Chicago, IL | 501 to 1000 employees | Company - Private | Biotech & Pharmaceuticals | ... | 5 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | mle | na |
465 | Data Scientist | 31-56 | Deepen understanding and usage of data across ... | 3.0 | Ameritas Life Insurance Corp | Cincinnati, OH | Lincoln, NE | 1001 to 5000 employees | Company - Private | Insurance Agencies & Brokerages | ... | 133 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | data scientist | na |
474 | Data Scientist | 31-56 | Company overview:\nFor more than three decades... | 2.9 | Pragmatics, Inc. | United States | Reston, VA | 201 to 500 employees | Company - Private | IT Services | ... | 35 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | data scientist | na |
470 | In-Line Inspection Data Analyst | 31-56 | In-Line Inspection Data Analyst\n\nQuest Integ... | 2.9 | Quest Integrity | Tulsa, OK | Kent, WA | 201 to 500 employees | Company - Public | Oil & Gas Services | ... | 24 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | analyst | na |
468 | Data Scientist | 31-56 | Child Care Aware® of America, our nation’s lea... | 2.8 | Child Care Aware of America | Arlington, VA | Arlington, VA | 51 to 200 employees | Nonprofit Organization | Social Assistance | ... | 33 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | data scientist | na |
20 rows × 27 columns
Compute the average rating in each of these groups:
top_avg_rate=round(filtered_top['Rating'].mean(), 4)
bottom_avg_rate=round(filtered_bottom['Rating'].mean(), 4)
df=df[(df['Rating'] > 0)]
all_avg_rate=round(df['Rating'].mean(), 4)
print(f"Average Rating of the top-paying employers: {top_avg_rate}")
print(f"Average Rating of the lowest-paying employers: {bottom_avg_rate}")
print(f"Average Rating of all companies: {all_avg_rate}")
Average Rating of the top-paying employers: 3.8684
Average Rating of the lowest-paying employers: 3.83
Average Rating of all companies: 3.8815
The three averages are nearly identical, which suggests that a company's rating is not related to the salary it offers. One should look instead at work culture, management attitude, or something else entirely.
Next, I check which company sizes occur in the CSV file.
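Comparing three group means is fairly coarse evidence. One way to look at the relationship more directly is the Pearson correlation between `avg_salary` and `Rating`. The sketch below uses made-up numbers (the `sample` frame is an assumption, not data from the file), so its result says nothing about the real dataset:

```python
import pandas as pd

# Hypothetical salaries and ratings, for illustration only
sample = pd.DataFrame({
    "avg_salary": [45, 48, 120, 150, 260, 270],
    "Rating": [5.0, 2.8, 3.9, 3.5, 5.0, 2.7],
})

# Keep only rows that actually have a rating, as in the analysis above
rated = sample[sample["Rating"] > 0]

# Pearson correlation coefficient; values near 0 indicate no linear relationship
pearson = rated["avg_salary"].corr(rated["Rating"])
print(round(pearson, 3))
```

A coefficient close to zero on the real `df` would support the conclusion drawn from the averages. A clearly positive or negative value would call for a closer look.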
df.Size.unique()
array(['1001 to 5000 employees', '5001 to 10000 employees',
       '501 to 1000 employees', '51 to 200 employees', '10000+ employees',
       '201 to 500 employees', '1 to 50 employees', 'Unknown'],
      dtype=object)
I remove the records whose size is 'Unknown' and compute the average rating for each company size:
# Take an explicit copy so the assignment below does not raise SettingWithCopyWarning
df2 = df[df['Size'] != 'Unknown'].copy()
# Remove the literal ' employees' suffix; str.strip would treat its argument as a set of characters
df2['Size'] = df2['Size'].str.replace(' employees', '', regex=False)
size_categories = df2['Size'].unique()
print('Number of employees:')
for i in size_categories:
    filtered_size = df2[df2['Size'] == i]
    rating = round(filtered_size['Rating'].mean(), 4)
    print(f"{i}: \n - Average rating: {rating}")
Number of employees:
1001 to 5000: 
 - Average rating: 3.6183
5001 to 10000: 
 - Average rating: 3.7279
501 to 1000: 
 - Average rating: 3.7325
51 to 200: 
 - Average rating: 4.0128
10000+: 
 - Average rating: 3.7038
201 to 500: 
 - Average rating: 4.0012
1 to 50: 
 - Average rating: 4.3535
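A note on trimming the " employees" suffix: `str.strip` (and pandas' `Series.str.strip`) treats its argument as a set of characters to remove from both ends, not as a literal substring. It gives the expected result for the size labels in this dataset only because they start and end with digits or `+`. The sketch below contrasts it with `str.removesuffix` (Python 3.9+). The `tricky` label is hypothetical and does not occur in the data:

```python
SUFFIX = " employees"

label = "1001 to 5000 employees"
# Happens to work: digits are not in the stripped character set
print(label.strip("_ employees"))    # 1001 to 5000
print(label.removesuffix(SUFFIX))    # 1001 to 5000

# Hypothetical label made only of letters that overlap with the character set
tricky = "mostly seasonal employees"
stripped = tricky.strip("_ employees")
trimmed = tricky.removesuffix(SUFFIX)
print(stripped)  # tly seasona
print(trimmed)   # mostly seasonal
```

In pandas, a literal suffix can be removed with `Series.str.replace(' employees', '', regex=False)` or, in recent versions, `Series.str.removesuffix`.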
According to this data, smaller companies tend to be rated higher, which suggests that people prefer working in them:
companies with 1-50 employees have an average rating of 4.3535;
companies with 51-200 and 201-500 employees have 4.0128 and 4.0012, respectively;
companies with 501-1000, 1001-5000, 5001-10000 and 10000+ employees have 3.7325, 3.6183, 3.7279 and 3.7038, respectively.
I also show the results as a box plot, using the seaborn library.
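The per-size averages can also be computed in one step with `groupby`, and sorted by company size so the trend is easier to read. This is a sketch on made-up ratings. The `sample` frame and the `lower_bound` helper are assumptions introduced here, and the helper relies on labels of the form `"51 to 200"` or `"10000+"`:

```python
import pandas as pd

# Hypothetical ratings, using size labels of the same form as in the dataset
sample = pd.DataFrame({
    "Size": ["1 to 50", "1 to 50", "10000+", "51 to 200", "10000+"],
    "Rating": [4.5, 4.1, 3.6, 4.0, 3.8],
})

def lower_bound(label):
    """Return the smallest head count of a size label, e.g. '51 to 200' -> 51."""
    return int(label.split()[0].rstrip("+"))

# Mean rating per size category
mean_by_size = sample.groupby("Size")["Rating"].mean().round(4)

# Reorder from the smallest to the largest company size
size_order = sorted(mean_by_size.index, key=lower_bound)
ordered = mean_by_size.loc[size_order]
print(ordered)
```

The same `size_order` list could be passed as the `order=` argument of `sns.catplot` so that the boxes in a plot follow company size as well.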
catplot = sns.catplot(
    data=df2,
    x="Size", y="Rating",
    kind="box"
)
catplot.set_xticklabels(rotation=50)
<seaborn.axisgrid.FacetGrid at 0x21f09d45c90>
The plot shows the same pattern: smaller companies are rated higher than large ones.
Next, I check whether the Industry column contains entries that should be removed.
df.Industry.unique()
array(['Insurance Carriers', 'Research & Development', 'Consulting', 'Electrical & Electronic Manufacturing', 'Advertising & Marketing', 'Computer Hardware & Software', 'Biotech & Pharmaceuticals', 'Consumer Electronics & Appliances Stores', 'Enterprise Software & Network Solutions', 'IT Services', 'Energy', 'Chemical Manufacturing', 'Federal Agencies', 'Health Care Services & Hospitals', 'Internet', 'Investment Banking & Asset Management', 'Aerospace & Defense', 'Utilities', '-1', 'Express Delivery Services', 'Staffing & Outsourcing', 'Insurance Agencies & Brokerages', 'Consumer Products Manufacturing', 'Industrial Manufacturing', 'Food & Beverage Manufacturing', 'Banks & Credit Unions', 'Video Games', 'Shipping', 'Telecommunications Services', 'Lending', 'Cable, Internet & Telephone Providers', 'Real Estate', 'Venture Capital & Private Equity', 'Miscellaneous Manufacturing', 'Oil & Gas Services', 'Transportation Equipment Manufacturing', 'Telecommunications Manufacturing', 'Transportation Management', 'News Outlet', 'Architectural & Engineering Services', 'Food & Beverage Stores', 'Other Retail Stores', 'Hotels, Motels, & Resorts', 'State & Regional Agencies', 'Financial Transaction Processing', 'Timber Operations', 'Colleges & Universities', 'Travel Agencies', 'Accounting', 'Logistics & Supply Chain', 'Farm Support Services', 'Social Assistance', 'Construction', 'Department, Clothing, & Shoe Stores', 'Publishing', 'Health, Beauty, & Fitness', 'Wholesale', 'Rail'], dtype=object)
Filter out the companies that have no industry specified, i.e. those whose industry is set to '-1'.
industry=df[(df['Industry'] != '-1')]
industries=industry.Industry.unique()
display(industries)
array(['Insurance Carriers', 'Research & Development', 'Consulting', 'Electrical & Electronic Manufacturing', 'Advertising & Marketing', 'Computer Hardware & Software', 'Biotech & Pharmaceuticals', 'Consumer Electronics & Appliances Stores', 'Enterprise Software & Network Solutions', 'IT Services', 'Energy', 'Chemical Manufacturing', 'Federal Agencies', 'Health Care Services & Hospitals', 'Internet', 'Investment Banking & Asset Management', 'Aerospace & Defense', 'Utilities', 'Express Delivery Services', 'Staffing & Outsourcing', 'Insurance Agencies & Brokerages', 'Consumer Products Manufacturing', 'Industrial Manufacturing', 'Food & Beverage Manufacturing', 'Banks & Credit Unions', 'Video Games', 'Shipping', 'Telecommunications Services', 'Lending', 'Cable, Internet & Telephone Providers', 'Real Estate', 'Venture Capital & Private Equity', 'Miscellaneous Manufacturing', 'Oil & Gas Services', 'Transportation Equipment Manufacturing', 'Telecommunications Manufacturing', 'Transportation Management', 'News Outlet', 'Architectural & Engineering Services', 'Food & Beverage Stores', 'Other Retail Stores', 'Hotels, Motels, & Resorts', 'State & Regional Agencies', 'Financial Transaction Processing', 'Timber Operations', 'Colleges & Universities', 'Travel Agencies', 'Accounting', 'Logistics & Supply Chain', 'Farm Support Services', 'Social Assistance', 'Construction', 'Department, Clothing, & Shoe Stores', 'Publishing', 'Health, Beauty, & Fitness', 'Wholesale', 'Rail'], dtype=object)
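An alternative to filtering is to turn the `-1` placeholder into a proper missing value, so that pandas skips it automatically in aggregations. The sketch below uses `Series.mask` on a made-up three-row frame. Treating `-1` in `company_age` the same way is an assumption based on the `-1` values visible in that column in the tables above:

```python
import pandas as pd

# Hypothetical rows; '-1' / -1 stand for "not specified"
sample = pd.DataFrame({
    "Industry": ["IT Services", "-1", "Consulting"],
    "company_age": [10, -1, 39],
})

cleaned = sample.copy()
# mask() replaces the values where the condition is True with NaN
cleaned["Industry"] = cleaned["Industry"].mask(cleaned["Industry"] == "-1")
cleaned["company_age"] = cleaned["company_age"].mask(cleaned["company_age"] == -1)

# Rows with a known industry
with_industry = cleaned.dropna(subset=["Industry"])

# NaN is ignored by mean(), so the placeholder no longer distorts the result
mean_age = cleaned["company_age"].mean()
print(len(with_industry), mean_age)
```

Without this step, a `-1` left in a numeric column would be counted as a real value and pull averages down.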
Show the ratings by industry as a box plot.
plot = sns.catplot(data=industry, x="Industry", y="Rating", kind="box", aspect=2)
plot.set_xticklabels(rotation=90)
<seaborn.axisgrid.FacetGrid at 0x21f0a0138d0>
Following the plot, we can pick out the industries where people like working the most, taken here as those with a mean rating above 4.
result = []
for i in industries:
    # Mean rating of all postings in this industry, rounded to one decimal
    filtered_industry = df[df['Industry'] == i]
    rating = round(filtered_industry['Rating'].mean(), 1)
    if rating > 4:
        result.append([rating, i])
# Highest-rated industries first
result.sort(reverse=True)
t = PrettyTable(['Rating', 'Industry'])
for i in result:
    t.add_row(i)
print(t)
+--------+-----------------------------------------+
| Rating |                 Industry                |
+--------+-----------------------------------------+
|  4.7   |                Publishing               |
|  4.6   |  Transportation Equipment Manufacturing |
|  4.5   |               Video Games               |
|  4.4   |     Venture Capital & Private Equity    |
|  4.4   |                 Lending                 |
|  4.3   |       Computer Hardware & Software      |
|  4.2   |                Accounting               |
|  4.1   |          Staffing & Outsourcing         |
|  4.1   |               IT Services               |
|  4.1   | Enterprise Software & Network Solutions |
+--------+-----------------------------------------+
We can also list the industries where people like working the least, taken here as those with a mean rating below 3.5.
result = []
for i in industries:
    # Mean rating of all postings in this industry, rounded to one decimal
    filtered_industry = df[df['Industry'] == i]
    rating = round(filtered_industry['Rating'].mean(), 1)
    if rating < 3.5:
        result.append([rating, i])
# Lowest-rated industries first
result.sort()
t = PrettyTable(['Rating', 'Industry'])
for i in result:
    t.add_row(i)
print(t)
+--------+---------------------------------------+
| Rating |                Industry               |
+--------+---------------------------------------+
|  2.6   |               Wholesale               |
|  2.8   |                  Rail                 |
|  2.8   |           Social Assistance           |
|  2.9   | Cable, Internet & Telephone Providers |
|  2.9   |           Oil & Gas Services          |
|  3.0   |        Colleges & Universities        |
|  3.1   |        Logistics & Supply Chain       |
|  3.2   |       Express Delivery Services       |
|  3.2   |       State & Regional Agencies       |
|  3.2   |      Telecommunications Services      |
|  3.3   |    Insurance Agencies & Brokerages    |
|  3.3   |              News Outlet              |
|  3.3   |    Telecommunications Manufacturing   |
|  3.3   |           Timber Operations           |
|  3.4   |         Chemical Manufacturing        |
|  3.4   |              Construction             |
|  3.4   |                Shipping               |
+--------+---------------------------------------+
Show the average salaries by industry as a box plot.
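With 57 industries and 660 postings, some industries are likely represented by only a handful of rows, so a mean rating alone can be misleading. The sketch below computes the mean together with the number of postings using `groupby(...).agg(...)`. The `sample` frame, the column names `mean_rating` and `postings`, and the `min_postings` threshold are all assumptions made for illustration:

```python
import pandas as pd

# Hypothetical ratings for three industries
sample = pd.DataFrame({
    "Industry": ["Publishing", "IT Services", "IT Services", "Wholesale", "IT Services"],
    "Rating": [4.7, 4.4, 3.8, 2.6, 4.1],
})

# Mean rating and number of postings per industry, best-rated first
summary = (
    sample.groupby("Industry")["Rating"]
    .agg(mean_rating="mean", postings="count")
    .sort_values("mean_rating", ascending=False)
)

# Keep only industries with enough postings for the mean to be meaningful
min_postings = 2  # hypothetical threshold
reliable = summary[summary["postings"] >= min_postings]

# Industries with a mean rating above 4, compared before any rounding
high = reliable[reliable["mean_rating"] > 4]
print(summary)
print(high)
```

Comparing the unrounded mean against the threshold also avoids edge cases where rounding to one decimal moves a value across the cutoff.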
plot = sns.catplot(data=industry, x="Industry", y="avg_salary", kind="box", aspect=2)
plot.set_xticklabels(rotation=90)
<seaborn.axisgrid.FacetGrid at 0x21f0b8344d0>
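A box plot with this many categories is easier to read when the boxes are sorted. One possible approach is to order the industries by median salary and hand that list to seaborn. The sketch below only computes the ordering on a made-up `sample` frame (an assumption, not real data). The plotting call is shown as a comment because it needs the full dataset:

```python
import pandas as pd

# Hypothetical salaries for three industries
sample = pd.DataFrame({
    "Industry": ["Consulting", "Consulting", "Wholesale", "IT Services", "IT Services"],
    "avg_salary": [120, 140, 90, 150, 170],
})

# Industries sorted from the highest to the lowest median salary
order = (
    sample.groupby("Industry")["avg_salary"]
    .median()
    .sort_values(ascending=False)
    .index.tolist()
)
print(order)

# With the real data, the list would be passed to the plot, for example:
# sns.catplot(data=industry, x="Industry", y="avg_salary", kind="box", aspect=2, order=order)
```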