GPU retrieval can rival HNSW/IVFPQ on both latency and recall. LinkedIn's LiNR (out-of-network Feed recommendations) and SJS, as well as Meta's MoL, already deploy exhaustive k-NN at production scale on A100/H100 nodes. See my full post for the technical details.
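For readers new to the term, "exhaustive k-NN" simply means scoring every item and keeping the k best, with no approximate index. Below is a minimal CPU sketch, assuming plain dot-product scoring (the systems named above may use other similarity functions). The function name and toy vectors are illustrative and are not LiNR's or MoL's API.

```python
import heapq
from typing import List, Sequence, Tuple


def exhaustive_knn(query: Sequence[float],
                   corpus: Sequence[Sequence[float]],
                   k: int) -> List[Tuple[int, float]]:
    """Exact top-k by inner product: score every item, keep the k best.

    No index structure is used, so recall is 100% by construction.
    A GPU system would compute all scores as one batched matrix
    multiply and then run a top-k kernel; this loop is a CPU stand-in.
    """
    # Lazily score every corpus vector against the query (dot product)
    scores = ((i, sum(q * x for q, x in zip(query, item)))
              for i, item in enumerate(corpus))
    # nlargest maintains a size-k heap: O(n log k) selection
    return heapq.nlargest(k, scores, key=lambda pair: pair[1])


corpus = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
top2 = exhaustive_knn([1.0, 0.2], corpus, k=2)
print([i for i, _ in top2])  # indices of the two best matches: [0, 2]
```

Exhaustive search trades more FLOPs for exact results; the claim above is that GPU throughput makes that trade affordable at production scale.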
This is our team's work on LLM productionization for RecSys from a year ago. Since September 2024, it has powered most member experiences in job recommendations and search at LinkedIn. A strong example of thoughtful ML system design, it may be particularly relevant for ML/AI practitioners.
Yep, there is a lot of cool work on formalizing mathematics rigorously, informally called 'computer mathematics'. We give a brief overview of this work in our papers.
Moreover, for the interested reader, I would suggest paying close attention to the univalent foundations of mathematics (http://www.math.ias.edu/~vladimir/Site3/Univalent_Foundation...) introduced by Vladimir Voevodsky. It departs from the classical derivation of mathematical results rooted in Cantor's set theory and logic, and provides a theoretical framework that is much more convenient for computerization.
Unlike them, we follow a different, less theoretical and more pragmatic approach.
As for tools, in case you missed it on the website, we suggest using Protege as a visualization tool and ontology editor.
Indeed, we lack a web UI that visualizes the ontology neatly, but this reflects the general absence of such tools for large RDF graphs.
For now, we measure coverage on our test collection of scholarly papers from our university's journal, and the coverage is sufficient only for that collection. Nevertheless, the ontology contains concepts from the following fields of mathematics: geometry (pretty advanced, believe me), analysis, number theory, set theory, algebra, logic, discrete mathematics, differential equations, numerical analysis, theory of computation, probability theory and statistics.
Concerning scope, we focus on professional-level mathematics (the Pro suffix emphasizes that). The ontology is ready to use, and we describe a formula search mashup (https://github.com/CLLKazan/MathSearch) built on top of it. On GitHub, you may find references to a published RDF dataset, which was extracted automatically from a test collection of our university's math journal.
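The mashup's actual implementation isn't shown in this thread. As a minimal sketch of the general idea of keyword-based formula search over an ontology, assuming formulas have already been annotated with concept IDs, a query label can be resolved to a concept and then to the formulas annotated with it. The dictionaries, function names, and toy formulas are hypothetical; only the IDs E1891 and E1892 and their labels are quoted from elsewhere in this thread.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical toy data: normalized concept labels -> ontology IDs,
# and formulas annotated with the concepts they instantiate.
LABELS: Dict[str, str] = {
    "differential equation": "E1892",
    "equation": "E1891",
}
FORMULA_CONCEPTS: Dict[str, Set[str]] = {
    "y' = y": {"E1892"},
    "x^2 - 1 = 0": {"E1891"},
}


def build_index(formula_concepts: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert the formula -> concepts annotations into concept -> formulas."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for formula, concepts in formula_concepts.items():
        for concept in concepts:
            index[concept].add(formula)
    return index


def search(query: str, labels: Dict[str, str],
           index: Dict[str, Set[str]]) -> List[str]:
    """Return formulas annotated with the concept whose label matches the query."""
    concept = labels.get(query.strip().lower())
    if concept is None:
        return []  # keyword does not name a known concept
    return sorted(index.get(concept, set()))


index = build_index(FORMULA_CONCEPTS)
print(search("Differential Equation", LABELS, index))
```

A fuller version could also expand a query to subclasses (so that "equation" also matches differential equations), which is where the ontology's hierarchy pays off.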
However, I should warn that the ontology is mature but still a work in progress (frankly, it is meant to be, since we follow a crowdsourcing methodology), and the quality of the labels (especially the English ones) and the coverage of fields of mathematics need constant improvement.
All right, I think we should answer all these fair questions on the website. For motivation, please feel free to peruse our research papers, which are publicly available from the website. As noted there, [1-2] address the Semantic Web community, while [3] explains the work in terms likely closer to mathematicians.
Hi, nice to hear from you. I'm a co-author. Let me clarify some of your points.
1) Indeed, we consider visualizing the graph of dependencies in OntoMathPro an important application for learning. Given sufficient coverage of relationships between concepts, it can provide helpful context for any non-trivial term.
2) No, the ontology was constructed collaboratively and manually from scratch, and Wikipedia was just one of the resources used. BTW, the overlap between the math part of Wikipedia (or DBpedia, since we think in terms of Linked Data) and OntoMathPro is stored in a mapping file (https://github.com/CLLKazan/OntoMathPro/blob/master/external...), which was extracted automatically afterwards.
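The thread doesn't say how the mapping was extracted. As one simple, commonly used baseline (not necessarily the authors' method), overlapping concepts can be found by exact matching of normalized labels. The function names are hypothetical, and the DBpedia-style IDs in the example are illustrative only.

```python
import re
import unicodedata
from typing import Dict, List, Tuple


def normalize(label: str) -> str:
    """Lower-case, strip accents, and collapse punctuation/underscores to spaces."""
    text = unicodedata.normalize("NFKD", label)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[\W_]+", " ", text.lower()).strip()


def match_by_label(ontology: Dict[str, str],
                   external: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair ontology IDs with external IDs whose normalized labels coincide."""
    by_label: Dict[str, List[str]] = {}
    for ext_id, label in external.items():
        by_label.setdefault(normalize(label), []).append(ext_id)
    pairs = []
    for onto_id, label in ontology.items():
        for ext_id in by_label.get(normalize(label), []):
            pairs.append((onto_id, ext_id))
    return sorted(pairs)


print(match_by_label({"E1892": "Differential Equation"},
                     {"dbr:Differential_equation": "Differential equation",
                      "dbr:Equation": "Equation"}))
```

Exact label matching is high-precision but misses synonyms and alternative definitions, so a mapping built this way would typically be reviewed by hand.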
3) Concerning "ElementS of Probability Theory", could you please provide the URI of the class you are talking about? I can see only one relevant class: E2406 (http://ontomathpro.org/ontology/E2406___599545262.html), which has the proper name (without the 's').
4) We do allow, and use, multiple inheritance. Please see E1892 Differential Equation, which is a subclass of both E1891 Equation and E2688 Element of Differential Equations. I believe there are more subtle examples in the ontology (I can't recall them exactly right now).
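To illustrate what multiple inheritance means for a consumer of the ontology: a class's full set of superclasses is the union over all of its parent paths. The sketch hard-codes only the two subClassOf edges quoted above (any further ancestors of E1891 and E2688 are omitted); the dict representation and function name are assumptions, not part of OntoMathPro.

```python
from typing import Dict, List, Set

# Direct rdfs:subClassOf edges quoted in this thread; a real client would
# load these from the published RDF/OWL files instead.
SUBCLASS_OF: Dict[str, List[str]] = {
    "E1892": ["E1891", "E2688"],  # Differential Equation has two parents
    "E1891": [],
    "E2688": [],
}


def all_superclasses(cls: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Collect every ancestor of cls, following all parents (multiple inheritance)."""
    seen: Set[str] = set()
    stack = list(parents.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:  # the hierarchy is a DAG, so ancestors can repeat
            seen.add(parent)
            stack.extend(parents.get(parent, []))
    return seen


print(sorted(all_superclasses("E1892", SUBCLASS_OF)))
```

The `seen` set matters because, with multiple inheritance, two parent paths can reach the same ancestor.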
5) As for ontology engineering principles, if you are interested, please peruse our research papers (especially [2]), in which we elaborate on our modeling principles.
6) I can't agree about the 'weakness' of the chosen language. OWL 2 is expressive enough to provide non-trivial logical rules and properties. For example, some are already in place: the P5 'see also' property is transitive and symmetric. Surely, we can't describe the precise semantics of mathematics (according to Popper's methodology, that would require a language more expressive than mathematics itself). But we don't need that to build fascinating applications on top of the existing ontology, as our work hopefully shows.
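To make concrete what declaring P5 'see also' transitive and symmetric entails, here is a minimal sketch in plain Python (no OWL reasoner) that computes the inferred pairs from a few asserted ones. The concept names are placeholders, not real OntoMathPro IDs, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Set, Tuple


def see_also_closure(edges: Iterable[Tuple[str, str]]) -> Dict[str, FrozenSet[str]]:
    """Infer all pairs entailed by a symmetric, transitive property.

    Symmetry adds reverse edges; transitivity then relates every pair of
    nodes in the same connected component. Because a->b->a is a path,
    each node with at least one edge is also related to itself.
    """
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)  # symmetry
    closure: Dict[str, FrozenSet[str]] = {}
    for start in list(neighbours):
        if start in closure:
            continue
        # Depth-first search collects the connected component (transitivity)
        component: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbours[node])
        frozen = frozenset(component)
        for node in component:
            closure[node] = frozen
    return closure


inferred = see_also_closure([("A", "B"), ("B", "C"), ("D", "E")])
print(sorted(inferred["A"]))
```

In practice an OWL 2 reasoner derives these entailments from the `owl:TransitiveProperty` and `owl:SymmetricProperty` declarations; the sketch only mirrors the resulting semantics.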
3) My bad, I don't know why I read it with an S... strange.
6) No, OWL is very powerful; my point is about the choice of vocabulary... For instance, a Markov chain is not an "element of probability theory" in the same way as it is a "probabilistic model" or a "stochastic process".
I would say this is by no means a flaw of Linked Data (the Semantic Web approach). There is an essential duality in many domains, including mathematics. For example, we can adhere to different approaches when describing mathematics as a whole, from the following points of view:
1) Classical (N. Bourbaki's approach): Cantor's set theory and logic
2) Constructive: based on constructive (intuitionistic) logic
3) Univalent foundations of mathematics (a novel approach).
Even if we stick to the first approach only (as we did for the ontology), there are still many dualities (alternative definitions): for example, we may describe the same mathematical objects using terminology from geometry or, alternatively, from set theory.
Anyway, I think the methodology we are developing in this project should clarify many such hidden aspects, and we expect it to be valuable for modern mathematical theory itself. So, let's collaborate :)
To be precise, AM is related to so-called 'computer mathematics'. In this project, we do not aim to model mathematics for theorem provers or similar tools. Our applications (e.g., keyword-based formula search, education, and information extraction) are introduced in the papers.