Any Data Scientists/Analysts out there?

HGoat

Well-Known Member
Dec 18, 2014
1,229
709
63
Denver, Colorado
Any of my fellow Cyclones in this field?

I'm currently working as a research scientist in the pharmaceutical field. I don't run any of the analysis or statistics of our data sets on our research studies. We have our own statistics department and I'm just in charge of the "in life" phase of the studies. But because of the amount of time I spend with these data sets, I have a lot of interest in biostatistics and data science.

Further down the line, I think this may be something I'd like to pursue as a career. While I have limited experience with SAS, R, Python, etc., I think I have a lot of the foundational skills to succeed as a biostatistician/data scientist. While I find mining large data sets for meaningful insight interesting, I am mostly drawn to the lifestyle. I currently have to work weekends about once a month and have many 10-12 hour days. Our biostatisticians work remotely, have very flexible schedules, and many travel an amount that is just not possible given the nature of my current job.

In a couple of years, if I leverage my experience and education correctly, I think I could get my company or another company to take a chance on me as a biostatistician/data scientist. As I mentioned earlier, I think I have most of the foundational skills (big-picture problem solving, strong quantitative abilities, and some experience with statistical analysis). My concern is that I don't have experience with the industry data tools and techniques that would allow me to come in and be productive on day one of a new job as a data scientist. I know that the industry is evolving quickly, which can also make things intimidating. I'm a little concerned that if I start to learn SAS or R, by the time I'm adequate at it, it will have become obsolete.

So, to those of you who are in the field:

How would you recommend getting my feet wet with SAS, R, Python, SQL, etc.? Which tools are gaining market share and which are becoming obsolete? General advice?

I realize there is a lot to unpack here. Much appreciated!
 

EarthIsMan

Well-Known Member
SuperFanatic
SuperFanatic T2
Nov 23, 2014
637
1,125
93
Earth
I think you are spot on about the value of these skills: they open up flexible job options and apply to problem solving in all disciplines.

IMO if data science is your goal then learning R is the way to go. Because R is open source and well documented, it is really powered by its large, active user community. There are so many packages and complementary programs. The data visualization is very powerful too. SAS is becoming more obsolete.

There are a lot of free resources to get started in R too: video tutorials, free books, blogs, and strong communities on Stack Exchange and elsewhere. There are also good Twitter accounts to follow to keep up to date.

Also, if you have a strong foundation in statistics and math, I don’t think formal classes are completely necessary. With so many free resources available and the field being so dynamic, most classes are out of date. You just need to dedicate your time to learning the language and getting familiar with the programs available.
 

Sigmapolis

Minister of Economy
SuperFanatic
SuperFanatic T2
Aug 10, 2011
25,080
37,223
113
Waukee
If I had to rank them...

(1.) Get really dang good at Excel first, including VBA. It is the most generally applicable even if not a "fetish" of the data science community.

(2.) Which advanced language is becoming popular depends a lot on the field. In my field, R is rather dominant, so that is where I have gone with it.

(3.) Do not underestimate how useful learning something involving graphics/visualization might be. There are a lot of "box jockeys" who can spit out numbers and statistical analysis. There are few people who can make it all look good and speak English.

Getting good at something like Tableau is helpful.

You can quickly have a comparative advantage in visualization and presentation.
 

besserheimerphat

Well-Known Member
Apr 11, 2006
10,402
12,801
113
Mount Vernon, WA
Stackoverflow.com and Rseek.org are good places to go when you have specific questions. To program in R I like to use RStudio. Working just in R can be tough because you can't see how your variables are changing unless you display them between every calculation.
 

Goothrey

Well-Known Member
May 5, 2009
4,864
594
113
Dayton via Austin
Not in the field, but I have dabbled a bit and have rather extensive Python experience on both Windows and Ubuntu (I develop computer vision software for work). I switched from MATLAB to Python when my access disappeared after graduating (licenses are expensive).

In Python, relevant libraries for data science would be NumPy, pandas, matplotlib, and scikit-learn. If you want to go deep into data modeling, perhaps TensorFlow and Keras would be helpful as well. The beauty of Python, though it may not always be the fastest option, is that it is flexible and robust. You can do just about anything with it, from web development to interfacing with microcontrollers to anything relevant to data. There are also lots of ports from other languages like C++.
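
To give a feel for what a couple of those libraries look like in practice, here is a minimal sketch using pandas and NumPy. The data set, the column names (`group`, `weight_g`), and the values are all made up for illustration; nothing here comes from a real study.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: body weights (grams) for two study groups.
df = pd.DataFrame({
    "group": ["control", "control", "control", "treated", "treated", "treated"],
    "weight_g": [20.0, 22.0, 24.0, 26.0, 28.0, 30.0],
})

# Per-group summary statistics with pandas (std is the sample standard deviation).
summary = df.groupby("group")["weight_g"].agg(["mean", "std", "count"])

# Difference in group means, computed with NumPy on the raw arrays.
control = df.loc[df["group"] == "control", "weight_g"].to_numpy()
treated = df.loc[df["group"] == "treated", "weight_g"].to_numpy()
mean_diff = float(np.mean(treated) - np.mean(control))

print(summary)
print("difference in means:", mean_diff)
```

The same table could be fed to matplotlib for plotting or scikit-learn for modeling; this only shows the load-and-summarize step.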

Look into Anaconda:

Anaconda is a freemium open source distribution of the Python and R programming languages for large-scale data processing, predictive analytics, and scientific computing, that aims to simplify package management and deployment.

If going the Python route, I recommend going with version 3+ instead of 2.7.x. It's best to keep up with the times.

Check out this 2018 survey from Stack Overflow: https://insights.stackoverflow.com/survey/2018
That may give some insight into what developers are using and thinking.

And a few months ago, I believe there was talk of Microsoft possibly adding Python as an official scripting language for Excel. Can't speak much to that though.
 

CloneIce

Well-Known Member
Apr 11, 2006
36,711
19,639
113
I’m not in the same field you are, but I use the same type of programmers on my engineering team and have a couple on staff who are a huge part of what we do and are indispensable to me, and we are always looking for more talented people around the country. My people use a lot of SQL, and probably some of the others you describe. I admit I don’t know all the technical details myself.

My first piece of advice is that if you can be a strong business systems analyst (or whatever name you call it), don’t necessarily limit yourself to one field unless you are sure that is the route you want to go.
 

HGoat

Well-Known Member
Dec 18, 2014
1,229
709
63
Denver, Colorado
Absolutely. That is part of what makes the field attractive to me!
 

Agkistrodon

Active Member
Feb 14, 2009
392
108
43
41
I'm a bioinformatician and work for a government agency within the Department of the Interior.

My recommendations:
  • First and foremost, become familiar with Linux if you are not already. Nearly all big data projects are run on some kind of UNIX-based system, whether it is some flavor of Linux or Mac OS X. There are also a lot of simple things you can accomplish at the command line to manipulate files or generate quick summaries that will make you very efficient at processing large amounts of data.
  • R is still commonplace for biostats and data processing, but Python is gaining on it. It is very good to know both, but if your idea of "biostats" includes ever working with genetic data, then I suggest diving deep into Python. I personally don't think R is very good for dealing with DNA sequence data and actively try to avoid using it, but it is great for other applications. I'll also echo what others have said about learning Python v3.x over v2.x.
  • I don't use SQL as much as I could, but it's a good skill to have.
  • Depending upon your goals, maybe gain familiarity with some additional languages such as Perl. I still use it a lot, partly because I am more familiar with it than Python, but if you want to dive into bioinformatics you may find some places with an old code base of [probably poorly written] Perl code.
  • If you have time, try finding lectures from someone's Algorithms course online. Some algorithms courses are more conceptual while others focus more on math, so depending upon your experiences one version may be more useful than the other. This was particularly helpful for me because it gave me a better understanding of what was going on behind the scenes of the code I write, and helped me to think about how to do things more efficiently.
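
As a small illustration of the kind of DNA sequence handling mentioned in the Python bullet above, here is a minimal sketch in plain Python (standard library only). The function names and the example sequence are made up for illustration, and it assumes the input contains only the bases A, C, G, and T; real projects would more likely lean on an established bioinformatics library.

```python
from collections import Counter

# Translation table for complementing DNA bases (assumes only A, C, G, T).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def gc_content(seq: str) -> float:
    """Return the fraction of bases that are G or C (0.0 for an empty sequence)."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    return (counts["G"] + counts["C"]) / total if total else 0.0

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

example = "ATGCGC"  # hypothetical sequence
print(gc_content(example))
print(reverse_complement(example))
```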
 

Cloned

Active Member
SuperFanatic T2
Oct 13, 2009
207
84
28
There seems to be an enormous demand for bioinformaticians, and it seems especially hard to compete for them in the public sector. Government IT restrictions can really slow down a field that capitalizes on open source software, pipelines, etc.
 

Agkistrodon

Active Member
Feb 14, 2009
392
108
43
41
Yeah, it's frustrating sometimes, but on the other hand I'm one of the few people in my facility who will be granted teleworking privileges from time to time.
 

RyCy04

Well-Known Member
Sep 26, 2007
2,697
642
113
Omaha, NE
I taught myself SQL about 12 years ago or so. The best thing to do is get a beginner's book and start writing practice queries. If you have access to actual data, it is even better.

I bought books and read for quite a while, but none of it really clicked until I got into actual data that my employer had and started writing queries. After you get the very basics down, Google is your best friend for learning about more complicated queries.

My very first book was "SQL in 10 Minutes" or something similar to that. A very basic book that just teaches you the simplest types of queries.
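
For anyone without access to real data, one way to practice queries is an in-memory SQLite database, which ships with Python. This is only a minimal sketch; the `samples` table, its columns, and the values are made up for illustration.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE samples (id INTEGER PRIMARY KEY, grp TEXT, weight_g REAL)"
)
conn.executemany(
    "INSERT INTO samples (grp, weight_g) VALUES (?, ?)",
    [("control", 20.0), ("control", 24.0), ("treated", 26.0), ("treated", 30.0)],
)

# A basic practice query: filter, aggregate, and sort.
rows = conn.execute(
    """
    SELECT grp, COUNT(*) AS n, AVG(weight_g) AS mean_weight
    FROM samples
    WHERE weight_g > 0
    GROUP BY grp
    ORDER BY grp
    """
).fetchall()

for row in rows:
    print(row)
conn.close()
```

From there you can change the WHERE clause, add a second table, and try a JOIN, which is roughly the progression a beginner's book walks through.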
 

CtownCyclone

Really Strong Cardinals
SuperFanatic
SuperFanatic T2
Jan 20, 2010
16,540
8,769
113
Where they love the governor
My wife's looking to hire one (I think). I think her lab uses R and Python most of the time. She's a bit more of a taskmaster, though; you wouldn't get the whole "work remotely" thing out of her. And she's the type to send out work emails from bed at 11:30 at night.
 