Another important part of the Data Science process is statistical testing and data visualization. While this responsibility is often attributed to more analytically focused Data Scientists, visualizations can also help machine-learning engineers understand and work with their data better. Visualizations are a great method of study because they convey an idea qualitatively. For example, we might notice that the data is very spread out simply by looking at a plot. Variance is a quantitative measurement, but visualizing how our data is spread out gives us qualitative information that can be just as valuable.
For hypothesis testing, you should take a look at HypothesisTests.jl and t.jl. Both will come in handy for testing data using the T distribution. HypothesisTests.jl also includes many more distribution options, such as the F distribution, which might be helpful in some situations.
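To give a sense of what this looks like in practice, here is a minimal sketch of a one-sample t-test with HypothesisTests.jl. The data and the hypothesized mean of 0.5 are made up for illustration:

```julia
using HypothesisTests  # assumes the package is installed in the active environment

# hypothetical sample: is its mean plausibly equal to 0.5?
x = rand(50)

t = OneSampleTTest(x, 0.5)   # one-sample t-test against μ0 = 0.5
pvalue(t)                    # p-value of the two-sided test
confint(t)                   # 95% confidence interval for the mean
```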
The last module you could look at for statistics is a package we discussed earlier, Lathe. Lathe comes with a distributions module that follows the same object-oriented style as the rest of the package. Test objects are built from distributions, and tests can then be performed repeatedly with ease. These distributions and statistical weights are also reused within Lathe itself and can be applied alongside its models.
As for visualization, Julia has a significant number of packages for data visualization, covering statistical, economic, and business plots, with or without interactivity and animation. Here are, in my opinion, some of the best options for data visualization in Julia (a short Plots.jl sketch follows the list):
- Makie.jl
- Gadfly.jl
- Plots.jl
- VegaLite.jl
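As a quick taste of the kind of workflow these packages offer, here is a minimal sketch using Plots.jl with its default backend. The data is invented purely for illustration:

```julia
using Plots  # assumes Plots.jl is installed; the default GR backend is fine

# plot a smooth curve, then overlay some noisy samples on the same axes
x = 0:0.1:2π
plot(x, sin.(x), label = "sin(x)", xlabel = "x", ylabel = "y")
scatter!(x, sin.(x) .+ 0.1 .* randn(length(x)), label = "noisy samples")
```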
If you’d like to learn more and decide which visualization library to try first, I wrote an article detailing the advantages and disadvantages of each of these fantastic libraries (with the exception of Makie.jl) that you can check out here:
The machine-learning portion of the Julian ecosystem for Data Science has seen steady, progressive development over the past four years. While many of these packages may be quite different from the ones scientists are used to working with, they are certainly great implementations. The first package I would look at for machine learning in Julia is GLM.jl.
GLM is short for Generalized Linear Models. Linear modeling is a great way to practice modeling in a new programming language, and the high-level interface of GLM will certainly accommodate linear modeling to a very advanced degree.
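Here is a minimal sketch of an ordinary least squares fit with GLM.jl. The DataFrame and its columns are hypothetical, made up just to show the formula interface:

```julia
using GLM, DataFrames  # assumes both packages are installed

# hypothetical toy data: y is roughly a linear function of x plus noise
df = DataFrame(x = 1:100, y = 2.5 .* (1:100) .+ randn(100))

# ordinary least squares via the @formula macro
ols = lm(@formula(y ~ x), df)

coef(ols)                                 # fitted intercept and slope
predict(ols, DataFrame(x = [101, 102]))   # predictions on new data
```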
If you are coming from Python and Sklearn, however, you might be interested in a package called Lathe. Lathe.jl is a pure-Julian library for statistics, data processing, with an ever-expanding library of machine-learning models on top. What’s great about Lathe is that it allows users with no practical Julia programming experience to quickly enter the language and start playing around with full models just as they would in Python. If you would like to take a look into Lathe, you can check out the lathe website:
MLJ.jl can also come in handy. Like Lathe, it is rather comprehensive and comes with some black-box models. However, while Lathe focuses on being a general-purpose Data Science package, MLJ.jl is focused much more exclusively on machine learning.
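As a rough sketch of the MLJ workflow, here is how loading a model and binding it to data with a machine typically looks. The toy dataset is invented, and this assumes DecisionTree.jl is also installed so that the classifier can be loaded:

```julia
using MLJ  # assumes MLJ and DecisionTree are in the active environment

# a tiny hypothetical dataset as a column table (any Tables.jl table works)
X = (a = rand(100), b = rand(100))
y = coerce(rand(["yes", "no"], 100), Multiclass)

# load a model type from DecisionTree.jl and bind it to the data in a machine
Tree = @load DecisionTreeClassifier pkg=DecisionTree verbosity=0
mach = machine(Tree(), X, y)

fit!(mach)                 # train the machine
yhat = predict(mach, X)    # probabilistic predictions
```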
Last, but certainly not least, among the options I would recommend is Flux.jl. Although there are other solutions you could certainly use, like Knet, I think Flux.jl takes the cake for deep learning in Julia, at least for now. Flux uses a simple functional approach to the classic problem of defining deep-learning types and methods. While Flux might be great for people who are used to functional programming, it can take some getting used to for people coming from something like TensorFlow. In my opinion, Flux.jl is actually a lot easier to use, but its immaturity compared to the Google-developed TensorFlow certainly does show from time to time.
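To illustrate that functional flavor, here is a minimal sketch of defining a small classifier and taking one gradient step. Exact names vary between Flux versions (this follows the older params-style API), and the data is entirely made up:

```julia
using Flux  # API details vary across Flux versions; this uses the params-style API

# a small feed-forward classifier: 4 inputs -> 2 classes
model = Chain(Dense(4, 8, relu), Dense(8, 2), softmax)

# hypothetical batch of 16 samples (features are columns in Flux)
x = rand(Float32, 4, 16)
y = Flux.onehotbatch(rand(1:2, 16), 1:2)

loss(x, y) = Flux.Losses.crossentropy(model(x), y)

# one gradient step with ADAM
opt = ADAM()
gs = gradient(() -> loss(x, y), Flux.params(model))
Flux.Optimise.update!(opt, Flux.params(model), gs)
```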
Thank you for reading my article! I think it is important to talk about what the future holds for Julia. The future looks rather bright for this young programming language, so early adoption will certainly pay off for doing Data Science down the road. That being said, it can be hard to gauge which parts of the ecosystem you should adopt first and how to learn or transition initially, and this article was intended to help those who want to get started but aren't exactly sure where to begin. I hope you enjoyed reading as much as I enjoyed writing! Have a great day!