Hacker News

The peculiar case of Japanese web design (2022)

2026-02-23 14:28 · 267 · 119 · sabrinas.space

This project can be broken down into three parts: gathering data, processing data, and analyzing data.

gathering data

I started by using SEM Rush’s Open.Trends service to find the top websites for each country across all industries. While this can be done manually, i automated the process using the Python libraries BeautifulSoup and Selenium-Python (you can also use the Requests library in this case, but I already had Selenium imported lol). Here’s some pseudo-code to give you an idea of how it was done:

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver

# getCountries() and getTableData() are helper functions that aren't shown here:
# one returns the list of countries Open.Trends has listed on their site,
# the other pulls the rows out of the trending-websites table
countries = getCountries()

# initialize a dictionary to store the information
d = {
    'country':[],
    'website':[],
    'visits':[]
}

driver = webdriver.Firefox()

# iterate through that list
for country in countries:
  # follow semrush's URL formatting and plug in the country using a formatted string
  url = f'https://www.semrush.com/trending-websites/{country}/all'

  # navigate to the URL using Selenium Webdriver
  driver.get(url)

  # feed the page information into BeautifulSoup
  soup = BeautifulSoup(driver.page_source, 'html.parser')

  # extract the table data using BeautifulSoup
  results = getTableData(soup)

  # append the results to the dictionary (plain assignment would
  # overwrite the previous country's rows on every iteration)
  d['country'].extend(results['country'])
  d['website'].extend(results['website'])
  d['visits'].extend(results['visits'])

driver.quit()

# save this into a CSV file
df = pd.DataFrame(d)
df.to_csv('popular_websites.csv', index=False)
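getTableData isn't shown in the post. As a rough illustration only, here's a stdlib-only sketch of what such a helper could look like. It assumes the page renders a plain HTML <table> whose first column is the domain and whose second column is the visit count written like "1,234,567"; SEM Rush's real markup may well differ, and the names `TableRowParser` and `get_table_data` are hypothetical. Unlike the original helper it takes raw HTML (e.g. `driver.page_source`) instead of a BeautifulSoup object.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag == 'td' and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == 'td' and self._cell is not None and self._row is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            if self._row:  # header rows made only of <th> cells end up empty
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # text inside links or spans within a <td> is still collected
        if self._cell is not None:
            self._cell.append(data)


def get_table_data(html, country):
    """Hypothetical getTableData: column 1 = domain, column 2 = visits."""
    parser = TableRowParser()
    parser.feed(html)
    result = {'country': [], 'website': [], 'visits': []}
    for cells in parser.rows:
        result['country'].append(country)
        result['website'].append(cells[0])
        result['visits'].append(int(cells[1].replace(',', '')))
    return result
```

The returned dictionary has the same three keys as `d`, so it can be `extend`ed into it once per country.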

NOTE: the quality of this data is subject to the accuracy of SEM Rush’s methods. i didn’t really look too deeply into that because their listings were comparable to similar services.

You should now have a table of the most popular websites in each country. A lot of those websites will be porn or malware or both. Let’s try to filter some of those out using the Cyren URL Lookup API. This is a service that uses “machine learning, heuristics, and human analysis” to categorize websites.

Here’s more pseudocode:

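import pandas as pd

df = pd.read_csv('popular_websites.csv')

# iterate through all the websites we found
for i in range(len(df['website'])):
  # select the website
  url = df.loc[i,'website']
  # call the API on the website (getCategory is a helper, not shown,
  # that wraps the Cyren URL Lookup API)
  category = getCategory(url)
  # save the results
  df.loc[i,'category'] = category

# filter out all the undesirable categories, then reset the index
# so the positional loops later on still line up with the rows
undesirable = [...]
df = df[~df['category'].isin(undesirable)].reset_index(drop=True)

# save this dataframe to avoid needing to do this all over again
df.to_csv('popular_websites_filtered.csv', index=False)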

NOTE: Cyren URL Lookup API has 1,000 free queries per month per user.
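With a monthly query cap, it helps to never ask the API about the same domain twice. A minimal sketch, assuming you wrap whatever lookup function you use (the hypothetical `lookup` argument below, e.g. the `getCategory` helper) and are happy to keep results in a local JSON file whose name is also just an assumption:

```python
import json
import os


def cached_category(domain, lookup, cache_path='categories_cache.json'):
    """Return the category for `domain`, calling `lookup` only on a cache miss.

    `lookup` is any function that takes a domain and returns a category string.
    Results are persisted to a JSON file, so re-running the script after a
    crash (or next month) doesn't spend queries on domains already checked.
    """
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path, encoding='utf-8') as f:
            cache = json.load(f)
    if domain not in cache:
        cache[domain] = lookup(domain)
        # write the whole cache back after every new answer
        with open(cache_path, 'w', encoding='utf-8') as f:
            json.dump(cache, f, ensure_ascii=False, indent=2)
    return cache[domain]
```

Rewriting the file on every miss is wasteful for huge lists but keeps already-paid-for answers safe if the loop dies halfway through.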

COMPLETELY SEPARATE NOTE: You can use services like temp-mail to create temporary email addresses.

Now it’s time to get some screenshots of the websites! If you want to take fullpage screenshots, you will need to use Selenium-Python’s Firefox webdriver. If not, any webdriver is fine. However, you probably don’t want to use full page screenshots as webpage sizes vary a lot and this can make your final results less interpretable.

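import re
from time import sleep

from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By

def acceptCookies(driver):
  # this function will probably consist of a bunch of try-except blocks
  # in search of a button that says accept/agree/allow cookies in every language
  # ngl i gave up like 1/3 of the way through
  for word in ['Accept', 'Agree', 'Allow']:  # add more words and languages here
    try:
      driver.find_element(By.XPATH, f"//button[contains(., '{word}')]").click()
      return True
    except WebDriverException:
      continue
  return False

def notBot(driver):
  # some websites will present a captcha before giving you access
  # there are ways to beat that captcha
  # i didn't even try but you should
  pass

driver = webdriver.Firefox()

# iterate through websites
for i in range(len(df['website'])):
  site = df.loc[i,'website']
  country = df.loc[i,'country']
  # if the table gives bare domains (e.g. 'yahoo.co.jp'), add a scheme
  url = site if site.startswith('http') else 'https://' + site
  driver.get(url)

  # wait for the page to load
  # you shouldn't really use static sleep calls but i did
  sleep(5)
  notBot(driver)
  sleep(2)
  acceptCookies(driver)
  sleep(2)

  # URLs contain characters like '/' and ':' that aren't safe in file names
  name = re.sub(r'[^\w.-]', '_', site)

  # take screenshots
  driver.save_screenshot(f'homepage_{country.upper()}_{name}.png')

  # this call only exists for firefox webdrivers; a different prefix
  # keeps it from overwriting the viewport screenshot
  driver.save_full_page_screenshot(f'fullpage_{country.upper()}_{name}.png')

driver.quit()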
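The fixed `sleep` calls above work, but they wait too long on fast sites and not long enough on slow ones. Selenium's own answer is `WebDriverWait` from `selenium.webdriver.support.ui`, e.g. `WebDriverWait(driver, 10).until(lambda d: d.execute_script('return document.readyState') == 'complete')`. To show the idea without needing a browser, here's a small stdlib polling helper; `wait_until` is a hypothetical name and the timeout/interval defaults are arbitrary assumptions:

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Poll `condition()` until it returns something truthy or `timeout` passes.

    Returns the truthy value, or raises TimeoutError. With a real driver the
    condition could be:
        lambda: driver.execute_script('return document.readyState') == 'complete'
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f'condition not met within {timeout} seconds')
        time.sleep(interval)
```

`time.monotonic()` is used instead of `time.time()` so that clock adjustments during a long scraping run can't shorten or stretch the wait.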

NOTE: When doing this, you can use a VPN to navigate to the appropriate country / region to increase the likelihood of seeing the local web page.

processing data

i mostly followed this tutorial by Grigory Serebryakov on LearnOpenCV. It uses an implementation of a ResNet model to extract the features of an image. You can pull the code from his blog post, but we do need to load our images differently. We can use this method by andrewjong (source). It keeps the image file paths, which we need for our final visualization.

from torchvision import datasets

class ImageFolderWithPaths(datasets.ImageFolder):
    """Custom dataset that includes image file paths. Extends
    torchvision.datasets.ImageFolder
    """

    # override the __getitem__ method. this is the method that dataloader calls
    def __getitem__(self, index):
        # this is what ImageFolder normally returns 
        original_tuple = super(ImageFolderWithPaths, self).__getitem__(index)
        # the image file path
        path = self.imgs[index][0]
        # make a new tuple that includes original and the path
        tuple_with_path = (original_tuple + (path,))
        return tuple_with_path

now we can load our images using that class.

import torch
from torchvision import transforms

# identify the path containing all your images
# ImageFolder expects one subfolder per class, so to label the images by
# country, sort them into one folder per country (root/JAPAN/*.png, ...)

root_path = '...'

# transform the data so they are identical shapes
transform = transforms.Compose([transforms.Resize((255, 255)),
                                 transforms.ToTensor()])
dataset = ImageFolderWithPaths(root_path, transform=transform)

# load the data
dataloader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)
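Since ImageFolder takes its class labels from subfolder names, the screenshots need to be moved into one folder per country first. A minimal stdlib sketch, assuming the files are named `homepage_<COUNTRY>_<site>.png` as in the screenshot loop, that the site part has already been made filename-safe, and that the country token contains no underscores; `sort_into_country_folders` is a hypothetical helper, not part of torchvision:

```python
import shutil
from pathlib import Path


def sort_into_country_folders(src_dir, dst_root):
    """Move 'homepage_<COUNTRY>_<site>.png' files into dst_root/<COUNTRY>/.

    That is the root/<class>/<image> layout ImageFolder expects.
    Returns the sorted list of country folder names that received files.
    """
    dst_root = Path(dst_root)
    countries = set()
    # list() first so we aren't iterating a directory while moving files out of it
    for path in list(Path(src_dir).glob('homepage_*.png')):
        parts = path.name.split('_', 2)  # ['homepage', COUNTRY, '<site>.png']
        if len(parts) < 3:
            continue  # skip files that don't follow the naming scheme
        country = parts[1]
        target = dst_root / country
        target.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target / path.name))
        countries.add(country)
    return sorted(countries)
```

Only `homepage_*` files are moved, so full-page screenshots saved under a different prefix stay out of the dataset.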

next we initialize and run our model. I needed to adapt Serebryakov’s code slightly to account for how our images were loaded.

import pickle

import numpy as np
import torch
from tqdm import tqdm

# ResNet101 is the feature-extractor class from Serebryakov's tutorial
# (not torchvision's resnet101 directly)
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# initialize model
model = ResNet101(pretrained=True)
model.eval()
model.to(device)

# initialize variables to store results
features = []
labels = []
image_paths = []

# run the model (no gradients are needed for inference)
with torch.no_grad():
  for batch in tqdm(dataloader, desc='Running the model inference'):
    images = batch[0].to(device)
    labels += batch[1].tolist()
    image_paths += batch[2]

    output = model(images)
    # move to the CPU and convert from tensor to numpy array
    features.append(output.cpu().numpy())

features = np.concatenate(features)

# convert labels back to their string (class-name) values
labels = [dataset.classes[e] for e in labels]

# save the data (the images themselves are re-read from image_paths later)
np.save('features.npy', features)
with open('labels.pkl', 'wb') as f:
  pickle.dump(labels, f)
with open('image_paths.pkl', 'wb') as f:
  pickle.dump(image_paths, f)

we should now have 3 sets of data containing our image paths, labels, and their extracted features.

analyzing data

we start by running our data through scikit-learn’s t-SNE implementation. This basically reduces our high-dimensional feature arrays (one per screenshot) down to 2D coordinates that we can put on a graph. We can then map smaller versions of our screenshots onto those coordinates to see how the machine has organized our websites.
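For anyone curious what "reduces down to 2D" means concretely, this is the standard t-SNE objective (van der Maaten and Hinton, 2008) that scikit-learn's TSNE minimizes, by default using the Barnes-Hut approximation. Here $x_i$ would be a screenshot's ResNet feature vector, $y_i$ its 2D point, $N$ the number of screenshots, and each $\sigma_i$ is tuned so that $p_{\cdot\mid i}$ matches the chosen perplexity (scikit-learn's default is 30):

```latex
p_{j\mid i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
                   {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j\mid i} + p_{i\mid j}}{2N}

q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}},
\qquad
C = \mathrm{KL}(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

Points that are near each other in feature space get a large $p_{ij}$, so the optimizer is pushed to place their 2D points close together too; the heavy-tailed $q_{ij}$ is what lets separate clusters (say, Japanese versus US homepages) spread apart on the plot. Only neighbourhoods are preserved, so distances between far-apart clusters shouldn't be read too literally.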

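import pickle
import random

import cv2
import numpy as np
from sklearn.manifold import TSNE

features = np.load('features.npy')
with open('image_paths.pkl', 'rb') as f:
  image_paths = pickle.load(f)

# the s in t-SNE stands for stochastic (random)
# let's set a seed for reproducible results
seed = 10
random.seed(seed)
np.random.seed(seed)

# run tsne (random_state makes scikit-learn's own RNG reproducible too)
n_components = 2
tsne = TSNE(n_components=n_components, random_state=seed)
tsne_result = tsne.fit_transform(features)

def scale_to_01_range(x):
  # min-max scale an array into [0, 1]
  return (x - np.min(x)) / (np.max(x) - np.min(x))

# scale and move the coordinates so they fit [0; 1] range
tx = scale_to_01_range(tsne_result[:,0])
ty = scale_to_01_range(tsne_result[:,1])

# a white canvas to draw the thumbnails onto
plot_size = 1000
thumb_w, thumb_h = 150, 100
tsne_plot = np.full((plot_size, plot_size, 3), 255, dtype=np.uint8)

# plot the images
for image_path, x, y in zip(image_paths, tx, ty):
  # read the image
  image = cv2.imread(image_path)

  # resize the image (cv2 takes (width, height))
  image = cv2.resize(image, (thumb_w, thumb_h))

  # compute the thumbnail's corners from its t-SNE coordinates, shrinking
  # the range so thumbnails at the edges still fit on the canvas
  tl_x = int(x * (plot_size - thumb_w))
  tl_y = int(y * (plot_size - thumb_h))
  br_x, br_y = tl_x + thumb_w, tl_y + thumb_h

  # put the image to its t-SNE coordinates using numpy sub-array indices
  tsne_plot[tl_y:br_y, tl_x:br_x, :] = image

cv2.imshow('t-SNE', tsne_plot)
cv2.waitKey()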

now we can look for any visual patterns in the images. What i found was detailed in the sections above.

i wanted to understand this data through the lens of writing systems, culture (geographically and economically), and technology. So, I found datasets containing that information: writing systems, iso countries with regional codes, and the global north-south divide. They needed to be supplemented with some additional Google searching to make sure we had labels for each country in our dataset.

Here’s a basic walkthrough of how I used this new analysis data.

import pandas as pd
import seaborn as sns

# a table with a 'country' column plus the new label columns,
# e.g. a 'writing_system' column (that column name is just an example)
analysis_data = pd.read_csv('...')

# initialize a list to capture a parallel set of labels
# so instead of the country, we can label our data through writing system, etc.
new_labels = []

# country -> new label lookup
lookup = dict(zip(analysis_data['country'], analysis_data['writing_system']))

# iterate through our pre-existing labels and use them to inform our new_labels
for label in labels:
  # select the new_label based on the old label (the country name)
  new_label = lookup[label]
  new_labels.append(new_label)

# use the new_labels to colour a scatterplot with our tsne_results
tsne_df = pd.DataFrame({'tsne_1': tx, 'tsne_2': ty, 'label': new_labels})
sns.scatterplot(x='tsne_1', y='tsne_2', data=tsne_df, hue='label')
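Because the external datasets didn't cover every country and had to be topped up by hand, it's handy to get a list of the gaps instead of a crash on the first missing key. A small sketch of that relabelling step, assuming a plain dict lookup; `relabel`, the `'unknown'` placeholder and the example country/label values are all hypothetical:

```python
def relabel(labels, lookup, missing='unknown'):
    """Map each country label to a new category (e.g. a writing system).

    Countries that aren't in `lookup` get the `missing` placeholder and are
    also returned, so they can be researched and added by hand.
    """
    new_labels = [lookup.get(label, missing) for label in labels]
    not_found = sorted({label for label in labels if label not in lookup})
    return new_labels, not_found
```

For example, `relabel(['japan', 'atlantis'], {'japan': 'Japanese'})` gives `(['Japanese', 'unknown'], ['atlantis'])`: the plot still renders, and the second list says exactly which countries still need a label.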

NOTE: The technology argument used a more qualitative method and is not included here.

we can see the results of those comparisons in the sections above

tsne implementation comparing popular websites in japan and the usa.

an answer in progress project

montenegrohugo

Karma: 2381

Comments

By usui 2026-02-23 15:48 · 12 replies

I read this piece when it came out in 2022. Maybe it should be marked with "(2022)". Previous discussion https://news.ycombinator.com/item?id=33745146

I just want to add that in addition to peculiar web design, Japanese websites have a way of assuming architectures or usage patterns where servers need to sleep or do some kind of scheduled job, which is really weird for people used to sites that need to account for a range of timezones or 24/7 availability (unless there is a pre-announced downtime that exists as a one-off thing). I know at least three websites off the top of my head that go down for "maintenance" at an exact scheduled time for hours every day, assuming that users would never want to access them overseas during those times (actually, one of those three doesn't even announce the reason, it just returns "server failed to respond" errors until it's time to "open up" for business again). Many services work fine, but at least a quarter to a half of Japanese web services are awful even though they eventually work if you can strangle yourself into making it work. The floor for Japanese web services is way below the floor for American ones. Those sites can get really mind-numbingly bad both on the front end and back end. I'm not sure what the cause is, but it must be a variety of factors. If tech-savvy users can't even make it work, I feel really bad for the struggling elders forced to use those sites.

By Multicomp 2026-02-23 23:08 · 3 replies

I forget if it was Samsung or Sony, but somewhere along the way on my internet journey, someone claimed, without evidence, and thus I have none either, that the incentive structure for having prestige jobs at large technology companies was always in hardware design and software was seen as easier and more low class.

So since nobody will get any promotions for running good software, they are not incentivized to run good software, and therefore they do just enough to get by?

By shiroiuma 2026-02-24 2:23 · 1 reply

This is historically the reason software engineering in Japan has lagged and there's such a talent shortage (leading companies like mine to hire mostly foreign software engineers). I've heard it's changing, but it'll take a long time to catch up.

By seanmcdirmid 2026-02-24 2:26

When I was working for Microsoft China, many of our foreign engineers were Korean and Japanese, who were in China for the higher paychecks.

By usui 2026-02-24 1:46

Yes this is true and it might possibly be true for the rest of East Asia though I'm not sure. Software is considered intangible and thus low value that anyone can do, whereas hardware is a real "thing" that you can hold in your hands, and is therefore more prestigious. Well, this way of thinking has made things into the current state.

By ghosty141 2026-02-23 23:32

This was, and partly still is, the attitude you can find in German non-software businesses where software is gaining more and more influence. Car manufacturing, for example.

By Jn2G3Np8 2026-02-23 16:15 · 4 replies

I found this out when buying a Japan Rail Pass for a trip a few years ago, blew my mind.

https://www.japanrailpass-reservation.net/ only works 4:00–23:30 Japan time.

By fsh 2026-02-23 16:56

This is especially funny since the JR Pass cannot be purchased by residents of Japan.

By usui 2026-02-23 16:23 · 1 reply

Yeah this is probably downstream of the fact that if you visit any of the individual JR sites from the expandable map at the bottom, you'll discover they're all down at this time as well. Let's scrap the website and make a staffed phone line or fax machine with operating hours.

By fsh 2026-02-23 17:03 · 1 reply

Considering the state of Japanese IT, there is probably a person typing each reservation from the website into a 1980s mainframe.

By z2 2026-02-23 19:30

After receiving the orders that were actually printed from an Internet Explorer 6 only website, and faxed over from another office before being re-scanned in along with a barcode that usually failed to make it over the fax, hence the need to hand-type things. True story (not for JR specifically, but circa 2013)

By cedws 2026-02-23 20:53

I've also had issues topping up my (virtual) Suica card late at night before.

By windows2020 2026-02-24 1:49

Maybe that's when they run all those crazy legacy jobs, but they politely shut the site down for it.

By WD-42 2026-02-23 16:23 · 3 replies

Anyone who has attempted to play Final Fantasy XIV beyond the free trial has experienced this. Their subscription management web app is so incredibly bad it takes a significant amount of time and effort just to purchase a subscription. I wonder how much revenue they lose simply from people giving up.

By BariumBlue 2026-02-23 18:26 · 2 replies

I was bored and tried playing FF14 about a year ago. You need to do the usual download a launcher to download the game, fine. It asks you to log in before it'll download, fine. It crashes ~10% of the way through downloading the game. Not great but you can make it by restarting the launcher and trying again. And again and again, about a dozen times. It does eventually finish though, and I did almost successfully make a character. Except after making my character you have to choose a server instance - and every single instance in the NA server I could find was "full". I don't know if it was actually full or erroring but I gave up at that point.

The buttonology is cryptic. It's like you tasked enterprise Java devs with writing a frontend in jQuery.

At least that's how I remember it. Game might be fun, but I'll never know.

By WD-42 2026-02-23 18:45

So you didn’t even get to the final boss, purchasing a sub.

While I played it I always had this dirty feeling imagining what the backend code must look like. Sends chills down my spine.

By abustamam 2026-02-23 19:13

I played on my PlayStation when I played a few years back, and fortunately it was a seamless process! As the parent comment said though, the subscription process was almost user-hostile for some reason.

By abustamam 2026-02-23 19:12

I was wondering why the process was so convoluted. I thought it was because I was doing it from my phone and they just had a poor mobile site. Well, apparently they have a poor desktop site that has poor mobile support!

By bigstrat2003 2026-02-26 7:21

Let me tell you, as bad as the FF14 subscription process is, it's nothing compared to what they had for FF11 back in the day. We have it good!

By vimda 2026-02-23 21:16

A lot of Japanese websites also have to be tremendously over-provisioned because of how regimented the country is. A friend of mine worked infrastructure for a local newspaper, and every day at 6PM they'd send a push notification to all their subscribers and had to provision for that peak. When he asked if they could smooth out traffic by sending the notification to some folks a minute before or a minute after, he was almost thrown out of the room. "Japan runs on time. Not a minute early, not a minute late. On time".

By multjoy 2026-02-23 20:04 · 1 reply

The UK driving licence authority (DVLA) also has a period in which you can’t conduct a range of transactions overnight, but that’s because it interfaces with systems that still run batch jobs overnight and the cost of making it all 24/7 simply wasn’t worth it considering the demand.

By pixl97 2026-02-23 20:17

Really, having common maintenance windows makes things way easier. If you already have a service with a limited geographical range, it's not bad.

By abustamam 2026-02-23 15:50

A pet peeve of mine — undated blogs :(

By bandrami 2026-02-23 17:24 · 2 replies

The US Social Security Administration website is available from 6am to 8pm, Monday to Friday (or at least it was that way a few years ago)

By shakna 2026-02-23 18:36

The service hours seem a bit wider nowadays [0], but not 24/7.

[0] https://www.ssa.gov/myssa-static/rel_1.0/offHoursPopup.html

By mananaysiempre 2026-02-24 0:51 · 1 reply

I’ve heard such things in the US were because of accessibility law that required the website (for the general population) to work no better than the associated call center (for the people who can’t interact with the website for whatever reason).

On one hand, that seems obviously stupid. On the other, I don’t see how you could phrase a legal requirement of this nature.

By bandrami 2026-02-24 9:45

That's better than my assumption, which was that it was running off the Visual FoxPro instance on somebody's desktop and that guy had to be logged in for it to work.

By bryanrasmussen 2026-02-23 22:52

this is also relatively common in Denmark, at least for government sites. One common thing you see (saw, haven't noticed in the last couple years) in Danish .gov sites is queuing where you need to wait some time before you are allowed in to use a site.

By corranh 2026-02-24 5:24 · 1 reply

Getting ready for a trip to Japan, I spent an embarrassing amount of time troubleshooting failures to load a Suica (train/transit) NFC card on a phone before realizing it just doesn’t work a few hours a night Tokyo time.

By art0rz 2026-02-24 13:03

The Suica app doesn't even work on my Pixel 10 Pro, since it requires an Android phone with some sort of Japan-specific hardware (FeliCa/Osaifu-Keitai technology, whatever that is, I'm assuming some special NFC or secure enclave sort of thing).

By socalgal2 2026-02-24 0:04 · 1 reply

One of the worst sites in existence is the Japanese Visa site they direct people to for making the QR codes you need when you land in Japan as a tourist. It's atrocious.

https://services.digital.go.jp/en/visit-japan-web/

I hate it so much I kind of wish I could volunteer to fix it. I suspect the process would be torture, though.

Note: experience on mobile is bad. I don't remember if desktop is better.

By pezezin 2026-02-24 1:32

I live in Japan and every time I go through the airport I refuse to use the QR code customs forms, the old paper based form is so much easier...

By ezoe 2026-02-23 22:23

Probably the old habit of batch processing.

By nekooooo 2026-02-23 17:53

if you're talking about the train booking site going down -- struggling elders are still using the face to face or phone support. they probably have never made an online reservation.

By iamnothere 2026-02-23 15:12 · 8 replies

I prefer the Japanese style. Information dense, yet clean. It reminds me of the web before Apple-style minimalism took over.

To contrast with a superficially similar style, Chinese web stores are also maximalist, but they tend to assault you with popup coupons, confetti effects, and other such things. Japanese style feels very efficient and utilitarian by comparison.

By BitwiseFool 2026-02-23 18:26 · 1 reply

>"It reminds me of the web before Apple-style minimalism took over."

The loss of color and texture is my biggest gripe. So many webpages and user interfaces abandoned the idea of distinguishing components using different colors and just went with making the page as close to bleach white as possible. I suppose an upside of this is that it made dark-mode easier to adopt. That being said, good dark mode support seems relatively recent.

By a456463 2026-02-23 22:10

And now all AI slop coded by anyone is that. Telltale signs: AI likes to make cards, implement SVGs by hand, give all cards a left highlight border, use off-center font spacing, add badges and notification icons, etc.

By kccqzy 2026-02-23 15:49

I think you made a good observation about what’s in essence different between the Chinese style and the Japanese style. The popup coupons and confetti effects are all animations. Personally I find these animations highly distracting. Whereas if something is information dense but static, I like it.

(There are also non-store Chinese designs; they are not trying to sell anything so they don’t need coupons and confettis. These are actually enjoyable to use. And they are more information dense than the English equivalent because the Chinese script packs more in a smaller space. This of course makes such designs i18n-hostile.)

By mc32 2026-02-23 15:28 · 3 replies

It reminds me of the “portal” era of Netscape, Excite and Yahoo. Very information dense. Among others’, Google’s minimalism took over.

By iamnothere 2026-02-23 15:33

There are still a few information dense English language sites out there, but they’re rarer. Honorable mentions:

- https://based.cooking/ (or the more updated fork https://publicdomainrecipes.com/)

- https://ooh.directory/

- https://gwern.net/

- https://www.metafilter.com/

- HN :)

(These are primarily text and lack the occasional color pop of the Japanese style, but I still admire the density and efficiency.)

By deltoidmaximus 2026-02-23 17:59

I felt like part of Google's success was that the simple search bar loaded fast in an era when I often had slow internet. Yahoo's portal page had too much on it, which distracted me or slowed me down from doing what I came there to do.

Later on I remember finding out Yahoo had a search.yahoo.com page or something that was also just a search bar but that was harder to type so was still a failure of design.

This was before combined search and address bar.

By awad 2026-02-23 20:54

It would not surprise me that Yahoo Japan was the blueprint for many of these sites. It still is extremely popular as a portal destination.

By xattt 2026-02-23 16:05

They feel like paper catalogues!

By torgoguys 2026-02-23 16:15

Yes, this was the portal style and I still adore it and use it myself, where I can. As long as the page has a scannable information hierarchy, information dense sites are better when you just want to get stuff done (/look stuff up), which for me is most of the time. I don't care about the fluff and "hero images" and the rest.

By GaryBluto 2026-02-23 21:30

> Apple-style minimalism took over.

To be fair, it was Microsoft-style minimalism that Jony Ive brought to Apple, who then popularized it.

By pezezin 2026-02-24 1:30 · 1 reply

Do you actually use Japanese websites frequently? Because I do live in Japan, and I hate their websites with a passion. Go use any Japanese online shop; the purchase flows are usually absurdly convoluted, and they are so information dense that sometimes you don't know what you are actually going to purchase. It is one of the reasons I rarely use Rakuten anymore...

By shiroiuma 2026-02-24 2:37

Yeah, I hate to say it, but using Amazon.co.jp is SO refreshing after using a Japanese website. It's really unbelievable how bad most Japanese e-commerce sites are.

By jp1016 2026-02-23 20:53

The technology argument is the most convincing one to me. I worked with a Japanese client a few years ago and the internal tools they used were wild by western standards. Like full-on frameset layouts in 2020. But it wasn't ignorance, it was continuity. The tools worked, people knew how to use them, and there was zero appetite for redesigning something that wasn't broken.

The font thing is also underrated as a factor. When you only have a handful of web-safe CJK fonts and you can't rely on weight/size variations to create hierarchy the way you can with Latin text, you compensate with color and density. It's a constraint that pushes you toward a specific aesthetic whether you want it or not.

I think the framing of "peculiar" is a bit western-centric though. Dense information-heavy pages are arguably more respectful of the user's time than the trend of spreading three sentences across five viewport-heights of whitespace.