
I did a vanity search for my own (modest) academic output and found only one paper, which was published by a journal in Europe. The other papers were all published in either Japan or Korea and don’t appear in your search results.

Two large sources in Japan you might consider trying to mirror are the UTokyo Repository [1] and Researchmap [2], through which many researchers in Japan release PDFs of their own papers. Other Japanese universities probably have archives similar to [1].

If you would like me to contact somebody at [1] who might be able to work with you, please let me know in a reply to this comment. (I helped to arrange the IA’s recent tie-up with the University of Tokyo General Library.)

[1] https://repository.dl.itc.u-tokyo.ac.jp/?lang=english

[2] https://researchmap.jp/?lang=en



Ah, sorry to hear. We in particular want to include content from outside the US/Europe publishing world.

For Japanese publishing, we have done metadata imports from JaLC (the Japanese DOI registrar) and crawled a lot of open content from J-Stage (https://www.jstage.jst.go.jp/), so I had hoped that coverage was pretty good. If you get a chance, could you try searching for metadata records on https://fatcat.wiki, with both Japanese and English titles and names (if applicable)?

For Korean publishing, the regional DOI registrar (https://www.kisti.re.kr/eng/) does not provide open metadata, which is a known hole in our coverage. IIRC it looked like there might be a way to scrape at least DOIs, titles, and author names, but I haven't had time to take a crack at it.

Mainland Chinese publishing is probably the biggest single hole in coverage by absolute numbers. There are two DOI registrars and neither has open metadata.

Regarding u-tokyo.ac.jp, it looks like we are able to consume metadata and do crawls via the OAI-PMH protocol. We crawled over 112k URLs from that domain via that protocol about a year ago, and they should be preserved/mirrored in web.archive.org, but they haven't ended up in fatcat or scholar yet. We want to go slow with pulling in OAI-PMH content, de-duplicating records and adding filters to ensure we are getting clean metadata and content. Also, preserving repository content hasn't been as urgent as getting to small OA publishers, which might lack a preservation scheme and vanish off the web.
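
As a rough illustration of what harvesting plus de-duplication over OAI-PMH can look like (a sketch only, not the Internet Archive's actual pipeline): the `ListRecords` verb, the `oai_dc` metadata prefix, and `resumptionToken` paging come from the OAI-PMH 2.0 spec, while the endpoint URL, the sample response, and the title-only de-duplication key are assumptions made up for the example. It parses an embedded sample instead of hitting a real repository.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL; a resumptionToken replaces other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return (records, next_token) from one ListRecords response page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        header = rec.find("oai:header", NS)
        # Skip malformed entries and records the repository marks as deleted.
        if header is None or header.get("status") == "deleted":
            continue
        ident = header.findtext("oai:identifier", default="", namespaces=NS)
        title = rec.findtext(".//dc:title", default="", namespaces=NS)
        records.append({"identifier": ident.strip(), "title": title.strip()})
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token.strip() or None)


def dedupe(records):
    """Keep the first record per normalized title (a deliberately crude key)."""
    seen, unique = set(), []
    for r in records:
        key = " ".join(r["title"].casefold().split())
        if not key or key in seen:
            continue  # filter out untitled records and repeats
        seen.add(key)
        unique.append(r)
    return unique


# Made-up response page: two near-duplicate titles and one deleted record.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Sample Paper</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>a  sample   paper</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header status="deleted"><identifier>oai:example.org:3</identifier></header>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, next_token = parse_records(SAMPLE)
unique = dedupe(records)
# Hypothetical endpoint; real repositories publish their own OAI-PMH base URL.
next_url = list_records_url("https://repository.example.org/oai",
                            resumption_token=next_token)
```

A real pipeline would presumably match on stronger keys than a normalized title (DOIs, authors, years, file hashes), which is part of why going slow makes sense here.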


Many thanks for the reply. I will contact some colleagues at our university library to ask for suggestions about how to check systematically how comprehensively J-Stage and Fatcat cover research publications in Japan. I will also ask if they have any suggestions about other sources in Japan from which the IA might gather such data. Either I or they will contact you by e-mail.

My subsequent vanity searches at J-Stage and Fatcat weren’t very encouraging. Most of my own papers have appeared in journals published by Japanese university departments or academic societies. While PDFs of the papers appear on the websites of the issuing organizations and show up on Google Scholar, they don’t seem to have DOIs or be listed on J-Stage.

I should mention that my research has mostly been on the humanities side of things, while J-Stage is “an electronic journal platform for science and technology information in Japan” [1].

[1] https://www.jstage.jst.go.jp/static/pages/JstageOverview/-ch...


This is great feedback, thank you.

For future follow-up, my work email is my handle here (bnewbold) at archive.org



