KakkoKari (仮) Another (data) science blog. By Alessandro Morita

How Cartesian can we make coordinates on Earth?

Earth is locally flat. This is true not only for the surface of the Earth, but for any so-called Riemannian manifold, a generalization of surfaces to any number of dimensions. Even though there is curvature, if one zooms in far enough on a point, the curvature effectively disappears and its neighborhood looks flat. This is why some individuals on our planet... Read more

My favorite math problem

Back in the second year of high school, a friend shared with me a problem that his geometry teacher had shown him. I was going through a small crisis regarding my future career. I couldn’t decide whether I wanted to pursue a major in the Humanities (Arts or Design were at the top of the list) or in STEM. Before eventually settling on Physi... Read more

AUC as Loss: directly fitting to optimize ROC AUC

A common application of binary classification models is ranking, more than classification itself. The difference between the two is subtle: in classification, you want to say how likely a point is to belong to class 1 or class 0; in ranking, you care whether point A, which is in class 1, is more likely than another point B, in class 0, to be... Read more
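As a rough illustration of the ranking view, the ROC AUC can be read as the fraction of (class 1, class 0) pairs in which the class 1 point receives the higher score, with ties counted as half. The sketch below assumes that reading; the function name `pairwise_auc` and the toy scores are hypothetical and are not taken from the post.

```python
def pairwise_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one point from each class")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # tie: half credit
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, for illustration only.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
auc = pairwise_auc(scores, labels)
print(auc)
```

The double loop is quadratic in the number of points, which is fine for a sketch but not for large datasets.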

Dribbling through Jensen's inequality

I recently published a short discussion (in Portuguese) on LinkedIn about how Jensen’s inequality complicates the process of building regressions for transformations of an original variable. More specifically, we discussed how \[\boxed{\exp(\mathbb E[\log(Y)]) \leq \mathbb E[Y]}\] This is due to $x\mapsto \log x$ being concave and both it and i... Read more
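A quick numerical check of the boxed inequality, as a minimal sketch: on a sample, $\exp$ of the mean of $\log y$ is the geometric mean, which never exceeds the arithmetic mean. The helper name `geometric_vs_arithmetic` and the log-normal data are assumptions made for illustration only.

```python
import math
import random


def geometric_vs_arithmetic(samples):
    """Return (exp(mean(log y)), mean(y)) for strictly positive samples."""
    n = len(samples)
    mean_log = sum(math.log(y) for y in samples) / n
    return math.exp(mean_log), sum(samples) / n


# Hypothetical data: log-normal draws, chosen only because they are
# positive and right-skewed, so the gap between the two means is visible.
rng = random.Random(0)
ys = [rng.lognormvariate(0.0, 1.0) for _ in range(10_000)]

back_transformed, plain_mean = geometric_vs_arithmetic(ys)
print(back_transformed, plain_mean)
```

In a regression setting, this is why exponentiating a prediction of $\log Y$ tends to underestimate the mean of $Y$.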

Variance of the ROC AUC: a full derivation

The ROC AUC is one of the most widely used statistics for assessing the predictive power of a classification model. However, few working data scientists know the theoretical results about its statistical fluctuations. Here, we show in detail a derivation of a commonly found result on the variance of the ROC AUC. We have not found this demonstration done at length in ... Read more