Happy Second Christmas: Epiphany, Data Visualization, and Sumo!  


6 January 2022, 06:34

We have finally come to the end of the Twelve Days of Christmas, to January 6 and Epiphany, aka Little Christmas or Second Christmas:

So for the day, I’m combining two of my interests — data visualization and sumo!

Talking data visualization with Harsh Jaitak

I had a conversation with Harsh Jaitak a while ago on data visualization and the actuarial profession, and he posted the video recently:

It wasn’t just about data visualization, but about communicating to an audience in general.

Data visualization is a powerful way to communicate, because human brains have a huge part dedicated to analyzing visual information. The visual cortex takes up a large share of the cerebral cortex.

Harsh Jaitak also wants to promote this article at the Casualty Actuarial Society on Voices of Actuaries in Community, and this is his YouTube channel, TBD Actuarial.

Waiting for some dataviz gifts plus recommendations

I’ve been building up my data visualization library, and writing a few things (and doing a few videos).

I’ve signed up for the paper version of Nightingale, the journal of the Data Visualization Society, so I’m waiting for that first issue. In addition, I backed a Kickstarter for a set of books on dataviz pioneers. Good stuff takes time to produce.

I already gave some dataviz gifts in the Excel post, so for today’s post I have a couple of book recommendations, plus links to articles I did on DataViz.

I started that series back in 2016. While the tech changes (and, obviously, some of my old links won’t work), the essential principles don’t.

Two books that I’ve found particularly helpful in terms of thinking through the principles are Jonathan Schwabish’s Better Data Visualizations and Cole Nussbaumer Knaflic’s Storytelling with Data.

FWIW, I mainly use Excel for my data visualization, for a variety of reasons, and most of the best dataviz resources aren’t system-specific. Just a thought.

Sumo! Tournament! Starts on Sunday!

Now I’m going to shift topics, but there will be a link, I promise.

This picture came from David Christin in one of the sumo fan groups on Facebook:

Obviously, that’s Merry Christmas in French (sumo is an international fave!), but it looks like Santa is doing the dohyo-iri ceremony of a yokozuna there, or the ring-entering ceremony of the sumo grand champions. Because of course Santa would be a Yokozuna.

Here’s a video of such a ceremony, with Hakuho, back in 2019. He retired last year, in 2021.

Santa’s outfit is a little different from the traditional Yokozuna, but he’s the old man and can do things a little differently. Also, the rikishi don’t have beards. Or wear hats in the ring.

But, Santa.

Join me in watching the January sumo tournament starting this coming Sunday — whether on NHK World’s Grand Sumo coverage, Jason’s All-Sumo Channel, or Natto Sumo.

Meep loves sumo!

That’s from the pre-pandemic times, and there are aspects of sumo from pre-pandemic times that have yet to come back. (I miss the fans throwing their cushions, which was always a fun thing to see, though I know the old fuddy-duddies kept trying to prevent people from throwing cushions. I thought it was a nice touch.)

But wait! I have a data visualization and sumo connection!

Sumo graphs

First, I want to thank Fred Pinkerton for his sumo statistics resources.

You can see them at his website, and the visualization most sumo fans check out is his ranking tracker (the ranking in sumo is called the banzuke):

Sumo is fairly simple to follow, in that it’s man-against-man; each match has a winner and a loser; and at the top division, it runs for 15 days, with each wrestler needing to win 8 matches to have a winning record for that tournament (marked in green in the graphic above).

There are complexities in how the sumo association makes the match-ups: one generally knows the opponents for only the first day or two at the beginning of the tournament, and the schedule adjusts as the tournament progresses. They want to make sure that it’s not a cinch to win for any one wrestler, at least not until the last few days. They want people to remain interested! But it’s not an elimination tournament, either.

In any case, this reminds me of when my grandma asked me to help her put together bridge fours and a different friend asked me to plan out a golf season for his country club, so that they would be up against a variety of players and wouldn’t get unbalanced. It’s a complicated problem! Heck, supply chains are simpler than this — supply chains have fewer constraints than trying to make sure people have a good time.

This graph Fred put together intrigued me:

This is weight vs. height, plus some other information, including ranking and tournament wins (which are extremely difficult to come by, especially considering how much the prior yokozuna Hakuho dominated the competitions).

In the January 2022 tournament, we have 42 wrestlers (rikishi) at the top division (Makuuchi). I think a few may have bowed out, which is pretty usual (especially as Covid infections run rampant).

First, in taking Fred’s weight and height info, I did a histogram of the wrestler BMI, just for amusement:

Yes, obesity generally is defined as BMI starting at 30, and morbid obesity at 40.

There are only 4 rikishi at the top level under 40: Wakatakakage, Wakamotoharu [yes, these guys are brothers… and there’s a third brother as well, Wakatakamoto], Ishiura, and Hoshoryu. They’re all very close to 40 in BMI. Yes, I know BMI is a bad measure for top, muscular athletes.

These smaller wrestlers are very muscular. But the bigger guys tend to be muscular and fat.
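
For anyone who wants to check the BMI arithmetic, here is a minimal Python sketch. It is not the spreadsheet behind the histogram: the function names are made up for illustration, and the heights and weights are only the approximate figures quoted in this post, not Fred’s data. BMI is weight in kilograms divided by the square of height in meters, and the category cutoffs are the 30 and 40 mentioned above.

```python
LB_TO_KG = 0.45359237  # kilograms per pound (exact by definition)
IN_TO_M = 0.0254       # meters per inch (exact by definition)


def bmi(feet, inches, pounds):
    """Body mass index (kg/m^2) from a height in feet/inches and a weight in pounds."""
    height_m = (feet * 12 + inches) * IN_TO_M
    weight_kg = pounds * LB_TO_KG
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Label a BMI using the two cutoffs mentioned in the post (30 and 40)."""
    if value >= 40:
        return "morbidly obese"
    if value >= 30:
        return "obese"
    return "below the obesity threshold"


# Approximate sizes quoted in this post (illustrative, not an official data set)
examples = {
    "Takakeisho": (5, 9, 350),
    "Terunofuji": (6, 4, 400),
    "Ichinojo": (6, 3, 450),
}

for name, (ft, inch, lb) in examples.items():
    value = bmi(ft, inch, lb)
    print(f"{name}: BMI {value:.1f} ({bmi_category(value)})")
```

Feeding a whole column of heights and weights through `bmi` gives the values you would bin for a histogram.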

Informal cluster analysis of sumo wrestlers

I wanted to try cluster analysis on the wrestlers, so I could classify them into broad physical groups.

Here is my first cut. I did do k-means cluster analysis, using a variety of metrics, but was unhappy with the results. So here are my very unscientific labelings of the weight-height space, eyeballing the clusters I prefer.
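
For the curious, a k-means pass over height and weight might look roughly like the sketch below. To be clear about the assumptions: this is not the analysis behind my chart, the helper names (`standardize`, `kmeans`) are invented for illustration, and the data points are just the approximate sizes quoted in this post rather than the full set of 42 rikishi. It uses plain Lloyd's algorithm on standardized height and weight, with a deterministic farthest-point start so the result is repeatable.

```python
from math import dist
from statistics import mean, pstdev

# (label, height in inches, weight in pounds): rough figures quoted in this post
RIKISHI = [
    ("Mighty mouse (approx.)", 67, 260),
    ("Takakeisho", 69, 350),
    ("Typical rikishi (approx.)", 72, 350),
    ("Terunofuji", 76, 400),
    ("Kaisei", 76, 430),
    ("Ichinojo", 75, 450),
]


def standardize(points):
    """Rescale each column to mean 0 and standard deviation 1, so inches and pounds count equally."""
    columns = list(zip(*points))
    mus = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]
    return [tuple((v - m) / s for v, m, s in zip(p, mus, sds)) for p in points]


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm; initial centers are chosen by a deterministic farthest-point rule."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from every center chosen so far
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each center to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(mean(col) for col in zip(*members))
    return labels, centers


scaled = standardize([(h, w) for _, h, w in RIKISHI])
labels, _ = kmeans(scaled, k=2)
for (name, h, w), lab in zip(RIKISHI, labels):
    print(f"cluster {lab}: {name} ({h} in, {w} lb)")
```

Standardizing first matters because pounds span a much wider numeric range than inches, so raw distances would be driven almost entirely by weight. Changing `k`, the starting centers, or the scaling can shift the groups noticeably, which is one reason the results of this kind of analysis can be unsatisfying.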

Unlike Fred, I did not mark out the axes in English units.

But the smallest wrestlers are about 5’7” and 260 pounds (the two I labeled “mighty mice”; they are very muscular guys, not a lot of fat), and the largest are Kaisei, at about 6’4” and 430 pounds, and Ichinojo, at 6’3” and 450 pounds (MAN MOUNTAIN). And yes, you can get these guys in matches against each other.

There are no weight/size classes in sumo. Just skill/achievement classes.

Most of the guys are clustered around a size of about 6’ tall and 350 pounds. But some are much smaller, and some much larger.

“String beans” are the guys who are pretty tall, but not as hefty. Hoshoryu is the exemplar here.

I marked off “tricky monkeys”, and that’s mainly a conflation of Tobizaru (whose name means “flying monkey”) and Ura, who sports a pink belt. Ura has been really cheeky lately, trying all sorts of tricks. Tobizaru, Ura, and Kotoeko are all fairly small guys, and thus must try techniques different from the really tall or the really hefty guys to win.

To stay in the top ranks, you do have to win often enough, and these three do generally have to have a bag of tricks to win against men much larger than they in mass and height.

Terunofuji, the current sole Yokozuna (top rank), is about 6’4” and 400 pounds, and generally, Yokozuna are big guys. They also have a variety of techniques, because simply being big isn’t good enough.

One of the top-ranked wrestlers, Takakeisho, is pretty short at about 5’9” though hefty at 350 pounds (currently). He’s in my “battle hamster” group. Think “Weebles wobble but they don’t fall down” for the shorter guys who have a lot of bulk. They’re harder to throw, and if they can stay over their feet, they often win by shoving the other guy out of the ring. It’s applied physics! (One of my undergrad degrees is in physics, btw)

In the last tournament, the guys who sit close to the line had long battles. Takayasu in particular had to struggle for long stretches before each win or loss.

I may try to amass more info, such as winning records, kimarite (winning moves) used, etc. and see if I can get better classifications than just size.

Another person, Ben Marshman, has been compiling henka stats. (A henka is a controversial move at the beginning of a match, where the wrestler dodges an initial impact, called the tachiai.)

I tried analysis on what he compiled for 2021, but was not able to show anything interesting. It wasn’t just the small guys who tried using henka as a technique, though Terutsuyoshi, the smallest of the top-level rikishi, had the highest henka frequency of all of them.

D and I will sing you out of the season

And, here are my son D and I to sing you out of the season:

In case it’s not clear, D is trying to make the shapes of the numerals with his hands, but does not realize that they’re not going to look the same to him as they will to the camera.

So that’s it for the Christmas season, and soon enough it will be back to the mortality and public finance grind.