I am an independent researcher currently funded by a part-time job and a very supportive spouse. I play with algorithms and have discovered a new fast Fourier transform and a neural network growth+training algorithm that doesn't use gradient descent. Ideally, some benefactor would pay me to publish my findings and code, for the good of public knowledge, but my attempts to find such funding have fallen flat. Kind of funny that researchers can get funding for future work (without guarantees) but I can't find funding for an interesting discovery that is already done (there are issues with verification of the work, but it seems like a short-term NDA would take care of that).
I have resorted to partnering with a law firm, who, for a large cut of any revenue, will do all the IP work and "marketing" (i.e. contacting legal departments at companies that might be interested in the algorithm). This is not ideal, but is so far the only path presented that may help me recoup wages lost by not working full-time for several years. I figure if I retain control of the IP (and make money through licensing), I can make sure scientists and researchers have free usage rights.
If the IP thing works, I can hopefully continue independent research. If it works well, I hope to self-fund more research without the IP shenanigans. Otherwise, it is back to full-time employment.
Publishing on arXiv is now very difficult. You need someone to endorse you who has an account on arXiv and has three recent publications in the area of research you're trying to submit.
I just went through this and had prominent researchers willing to endorse me but unable to do so because of the now stringent requirements on arXiv.
Certainly. It also helps a resume. And if that is the only value to the work, then so be it. But there is a reason why R&D departments and universities publish AND protect generated IP. They want returns on investment. In my case, I invested in myself. Why shouldn't I play this game?
Universities don't do research. Individuals affiliated with the university do. The usual deal is that the individuals own the results. If they want to commercialize the results, then the university (and possibly the funders) get a share. But the vast majority of academics don't do that, because commercialization attempts mean certain bureaucracy and uncertain gains. Most just want to talk about their results freely. Many even talk about preliminary work and preliminary results openly to make it less likely that the eventual results will be patentable.
6-7 figures. IDK if this is realistic, but it seems reasonable to me considering the cost and risk involved in reproducing this work from scratch (salary/benefits/overhead for a research group or individual; no guarantee of success), and the impact of the work (admittedly I am biased to think it's important). I have seen 6- to 7-figure gov't grants for projects (in a similar space) that I guarantee are lower impact.
Usually funding agencies do not fund people with no track record of publishing. If you haven't published your results (or even any prior work) publicly before, why do you expect companies to fund you?
I have a history of publishing in computational physics. But that's beside the point. I have an exciting new result in digital signal processing and I would like to get retroactively funded for the work. If researchers can get funded for future work, why not reward past work that comes with a guaranteed outcome?
Payments for making past discoveries public create terrible incentives. They encourage you to keep your discoveries secret until someone is willing to pay enough for them. If your discovery is not particularly valuable on its own, but someone else could make a breakthrough building on it, that breakthrough won't happen, because the other person does not know about your result.
Patents, while also terrible, are a better model. They at least require you to disclose the invention first, and then it's up to you if you can take advantage of your temporary monopoly. While practical impact may lag due to reduced commercial interest during the monopoly, the result is at least public, and other people can build on it.
I would like to see solutions (for professionals) that ditch the whole generative part altogether. If it's so good at finding references or identifying relevant passages in large corpora, just show the references. As you said, the "answer" only entices laziness and injects uncertainty.
I think many proposed solutions to the creator compensation problem end up glossing over a fundamental difficulty: once an easily-distributed work (like anything digital) is in a consumable state (and thus copy-able), it becomes basically free.
The idea that charging $10 for a digital copy of an album that is already on youtube (or a friend's hard drive) should be a viable business model seems weird to me in this day and age.
I have recently been wondering about a threshold-based "media economy" where creators don't actually show us anything (except for clips or samples or low-res versions, etc) until they are guaranteed a certain amount of income. It's basically kickstarter. A musician makes an album, goes on kickstarter and asks for $10,000 to release it. Once $10k is reached, the songs go up on a server, or are released on bandcamp, spotify, or any of the usual channels. Additional money beyond the threshold can be made, but it will be as difficult as it is now. But they have already reached $10k (set by them) so everyone can feel good that the musician has earned what they feel they deserve.
I'm sure there are many problems with this. For one, many artists aren't creating just for money. They want to show us their creations, and with a threshold, they would have to hold back until it is reached (in the case of musicians, they might not even be able to play a new song at a show until the threshold is reached, b/c smartphones).
There may be a critical mass problem, too. If two artists are similar and one releases immediately while the other waits for the threshold payment, the latter may drift into obscurity. There must be some allure to the withholding, though?
What other problems kill this approach?
Could it work for open source software, too? Make your thing, don't share it. Demo it, ask for the release payment, then put it on github.
>Could it work for open source software, too? Make your thing, don't share it. Demo it, ask for the release payment, then put it on github.
I think it would be far more reasonable to put the source into escrow, to be released when a threshold is met. I've seen closed source vendors do that when they're smaller to ensure a large customer is not left high and dry should they go bankrupt or be acquired by someone who kills the product.
I don't foresee anyone being willing to see a demo of a piece of software, then writing a check for it before using it. In the closed source world you pretty much ALWAYS have to do some sort of POV/POC before anyone will buy your stuff.
Artists (especially music) are currently navigating a very tight legal landscape where the works they are producing might get flagged as infringing even if they took every reasonable precaution. For example: sounds that sound similar to an existing sample, or note progressions that are fiercely defended by companies who use the residuals as their primary income source.
This can cause an issue if the artist releases a work and does not receive enough to defend themselves in court, especially as the work would now be very hard to take down and the artist may even have trouble stopping the income from coming in.
Source: I’ve been thinking about this “everything is released for free once the artist is paid” approach for a while, and I think there are some notable wins, esp. around the Patreon model, where artists can know the eventual payout before they start making anything. I think it has amazing potential: most content becomes “free” anyway, and it would make it so much easier for fans to share openly, netting more plays, more likes, and more fans.
> Source: I’ve been thinking about this “everything is released for free once the artist is paid” approach for a while...
You should start a "media label" then. Take a cut of the threshold payment for (1) vetting and reviews of unreleased art, (2) distribution costs of the digital media, and (3) legal assistance for artists against copyright trolls.
Plenty of creators make a decent living selling their content digitally. Once you democratize the tools and distribution, you remove the media companies that traditionally take the lion's share of the money. In the traditional setup a few business people and a few artists get rich and everyone else is broke. In an economy where the creator distributes directly via digital channels, a bunch of people earn decent incomes. The second option is the better one, IMO. Once we do away with the notion that creating art could make you rich, it becomes less necessary to have some centralized way to collect money for art.
Agreed!
Direct distribution will completely re-shape the content landscape, I believe. Probably starting with the most dysfunctional industry of "producing" music. The intermediaries are borderline parasitic there.
We now have "Decentralised AI" working in the lab last month. So also the new music discovery, recommendation, fuzzy keyword search, spam filtering can be realised with full decentralisation (in principle). See live demo of our toy example [1]. Broad writeup [2]
I think I agree with you, but democratizing distribution is still orthogonal to the piracy problem. On the one hand, I'm more likely to pay an artist if the only official way I can get their art is to purchase from their website. On the other hand, the first digital download from an artist's website may go right to a torrenter, or youtube. Is self-distribution accompanied by the task of chasing youtube takedowns? Sounds not fun.
A pre-release payment directly addresses the issue of piracy. Piracy just doesn't exist if the content isn't out there.
No one is going to want to buy pre-releases of things they haven't experienced yet. We buy things we like, and most people tend to be ambivalent about things they are ignorant of. When you democratize distribution, as a side effect you end up with a saturated market. If you went on youtube and had to find new things to watch, but were required to pick out things you think you would like based on a short preview and description, then wait days or weeks for them to release, I doubt you would visit it very often.
If we just accept the fact that piracy exists and that people are going to pirate, ignore that aspect completely, and carry on, I think you would be surprised how many people are willing to pay for things they want if the price is reasonable, regardless of whether they can get it for free via another method.
> I think many proposed solutions to the creator compensation problem end up glossing over a fundamental difficulty: once an easily-distributed work (like anything digital) is in a consumable state (and thus copy-able), it becomes basically free.
You've re-discovered the purpose of copyright laws.
Yeah, it might be easier with digital, but once Mickey Mouse gets drawn and becomes popular, drawing him again is super easy for the random artist who can say it is "theirs" and draft off of the millions of dollars Disney spent marketing. Hence the need for copyright.
I'm mostly thinking about creations that are already done and can somehow be vetted, either by demo, samples, trial version, or by a reputable reviewer that gets a sneak peek.
I look at this thing and can't help thinking "where will someone set down their coffee mug when their hands are full and they need to open the door?" There isn't a flat surface on it.
The basic idea is this. For a time-stretch factor of, say, 2x, the frequency spectrum of the stretched output at 2 s should be the same as the frequency spectrum of the unstretched input at 1 s. The naive algorithm therefore takes a short section of signal at 1 s, translates it to 2 s, and adds it to the result. Unfortunately, this method generates all sorts of unwanted artifacts.
Imagine a pure sine wave. Now take 2 short sections of the wave from 2 random times, overlap them, and add them together. What happens? Well, it depends on the phase of each section. If the sections are out of phase, they cancel on the overlap; if in phase, they constructively interfere.
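A quick numpy sketch of that cancellation (the 480 Hz tone, sample rate, and section length here are arbitrary choices, picked so the period is a whole number of samples):

    import numpy as np

    sr = 48000                       # sample rate
    f = 480.0                        # pure sine: period is exactly 100 samples
    t = np.arange(sr) / sr
    x = np.sin(2 * np.pi * f * t)

    n = 1024                         # length of each short section
    a = x[5000:5000 + n]             # section taken at one time
    b_in  = x[20000:20000 + n]       # offset by a whole number of periods -> in phase
    b_out = x[20050:20050 + n]       # offset by an extra half period -> out of phase

    print(np.max(np.abs(a + b_in)))   # ~2.0: constructive interference
    print(np.max(np.abs(a + b_out)))  # ~0.0: the sections cancel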
The phase vocoder is all about overlapping and adding sections together so that the phases of all the different sine waves in the sections line up. Thus, in any phase vocoder algorithm, you will see code that searches for peaks in the spectrum (see _time_stretch code). Each peak is an assumed sine wave, and corresponding peaks in adjacent frames should have their phases match.
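For anyone wanting to play with the idea, here is a bare-bones numpy sketch of the overlap-add/phase-propagation step. This is a generic textbook-style phase vocoder, not the _time_stretch code mentioned above: it advances the phase of every bin independently rather than tracking peaks, and the FFT size and hop are arbitrary choices.

    import numpy as np

    def time_stretch(x, factor, n_fft=2048, hop_a=512):
        """Stretch x by `factor` (2.0 -> twice as long). Per-bin phase
        propagation only; no peak tracking or gain normalization."""
        hop_s = int(round(hop_a * factor))            # synthesis hop
        win = np.hanning(n_fft)
        bins = np.arange(n_fft // 2 + 1)
        expected = 2 * np.pi * bins * hop_a / n_fft   # phase advance per analysis hop
        prev_phase = np.zeros(n_fft // 2 + 1)
        acc_phase = np.zeros(n_fft // 2 + 1)
        out = np.zeros(int(len(x) * factor) + n_fft)

        for i, s in enumerate(range(0, len(x) - n_fft, hop_a)):
            spec = np.fft.rfft(win * x[s:s + n_fft])
            mag, phase = np.abs(spec), np.angle(spec)
            if i == 0:
                acc_phase = phase.copy()
            else:
                # deviation of the measured advance from the bin center frequency,
                # wrapped to [-pi, pi), estimates the "true" frequency of each bin
                delta = (phase - prev_phase - expected + np.pi) % (2 * np.pi) - np.pi
                # advance the output phase by that frequency times the *synthesis*
                # hop, so the overlapped frames stay phase-aligned
                acc_phase += (expected + delta) * (hop_s / hop_a)
            prev_phase = phase
            frame = np.fft.irfft(mag * np.exp(1j * acc_phase))
            out[i * hop_s:i * hop_s + n_fft] += win * frame

        return out

Peak-locking variants instead move the phases of all bins around a spectral peak together, which is where the peak search described above comes in.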
I have a simple rule for GUI design: build trees not graphs. Write components that accept a state snapshot and broadcast changes. If component A listens for state changes from B, then A is a parent node of B. If A sends state to B, then A is a parent of B. Components reconcile state before broadcasting changes toward the root of the tree.
Often there is a price paid in brevity, but I believe it is worth it. It may seem annoying to propagate a click explicitly through 5 parent components just to sum clicks into a count widget, but as soon as a short circuit is made, you've created a graph, and you lose the ability to isolate GUI sub-trees for testing/debugging.
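A minimal, hypothetical sketch of the rule (plain Python, no GUI framework; the class and method names are made up for illustration): state flows down from the root as a snapshot, and change events bubble up through explicit parent links until the root reconciles them.

    class Component:
        def __init__(self, parent=None):
            self.parent = parent
            self.children = []
            if parent is not None:
                parent.children.append(self)

        def render(self, state):
            """Receive a read-only snapshot of state and pass it down."""
            for child in self.children:
                child.render(state)

        def emit(self, change):
            """Broadcast a change toward the root; ancestors may reconcile it."""
            if self.parent is not None:
                self.parent.emit(change)

    class Root(Component):
        def __init__(self):
            super().__init__()
            self.state = {"clicks": 0}

        def emit(self, change):
            # reconcile at the root, then push a fresh snapshot down the tree
            if change == "click":
                self.state["clicks"] += 1
            self.render(dict(self.state))

    class Button(Component):
        def click(self):
            self.emit("click")       # no direct reference to the count widget

    class ClickCount(Component):
        def render(self, state):
            print("clicks:", state["clicks"])

    root = Root()
    panel = Component(root)
    button = Button(panel)           # nested below an intermediate component
    counter = ClickCount(root)

    button.click()                   # -> prints "clicks: 1"
    button.click()                   # -> prints "clicks: 2"

The point is that Button never references ClickCount directly; the click is reconciled at the root and re-broadcast down, so either subtree can be tested in isolation.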
This makes visual redesigns take foreeeeever though. Imagine moving a component from the main area of your app into a menu dropdown in the navbar, now you have to tear out all those props that you painstakingly passed down through 10 intermediary layers.
The resulting format has simple compression parameters and will be optimized for time-stretching/pitch-shifting. The format is really nothing special; it is based on a sinusoids + noise model. The novelty is in the analysis algorithm, which I think identifies sinusoids particularly well, avoiding common difficulties like the Gibbs phenomenon [0], which leads to "smearing" of transients when time-stretching.
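For readers unfamiliar with sinusoids + noise modeling: the resynthesis side boils down to summing sinusoid tracks with slowly varying frequency/amplitude envelopes plus a noise residual. A rough illustrative sketch (generic, not the format or analysis algorithm described above):

    import numpy as np

    def resynthesize(tracks, noise_env, sr=48000, n=48000):
        """Toy sinusoids + noise resynthesis. `tracks` is a list of
        (freqs, amps) arrays sampled per output sample; `noise_env`
        is a per-sample gain on the stochastic residual."""
        out = np.zeros(n)
        for freqs, amps in tracks:
            phase = 2 * np.pi * np.cumsum(freqs) / sr   # integrate frequency
            out += amps * np.sin(phase)
        out += noise_env * np.random.randn(n)
        return out

    # one partial gliding from 220 Hz to 440 Hz while fading out, plus a little noise
    n = 48000
    freqs = np.linspace(220, 440, n)
    amps = np.linspace(1.0, 0.0, n)
    y = resynthesize([(freqs, amps)], noise_env=np.full(n, 0.01))

Time-stretching such a representation amounts to resampling the frequency/amplitude envelopes, which is why the format lends itself to it.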
On a related note, I highly recommend Sean Carroll's series of videos: The Biggest Ideas in the Universe [0]. The fine structure constant comes up in the 10th video on Interactions [1].
Fascinating! AlphaFold (and other competitors) seem to use MSA (Multiple Sequence Alignment) and this (brilliant) idea of co-evolving residues to build an initial graph of sections of protein chain that are likely proximal. This seems like a useful trick for predicting existing biological structures (i.e. ones that evolved) from genomic data. I wonder (as very much a non-biologist), do MSA-based approaches also help understand "first-principles" folding physics any better? and to what degree? If I write a random genetic sequence (think drug discovery) that has many aligned sequences, without the strong assumption of co-evolution at my disposal, there does not seem to be any good reason for the aligned sequences to also be proximal. Please pardon my admittedly deep knowledge gaps.
> do MSA-based approaches also help understand "first-principles" folding physics any better?
Not really. MSA-based approaches, as most structure prediction methods, have as a goal to find the lowest energy conformation of the protein chain, disregarding folding kinetics and basically all dynamic aspects of protein structure.
> If I write a random genetic sequence (think drug discovery) that has many aligned sequences, without the strong assumption of co-evolution at my disposal, there does not seem to be any good reason for the aligned sequences to also be proximal.
I don't think I fully understood this, but I'll give it a shot anyway. If your artificial sequence aligns with others, there's a chance that it will fold like them, depending on the quality and accuracy of the multiple sequence alignment. Since multiple sequence alignments are built under the assumption of homology (all sequences have a common ancestor), it's a matter of how far from the "sequence sampling space" your sequence is located compared to the others.
> I don't think I fully understood this, but I'll give it a shot anyway. If your artificial sequence aligns with others, there's a chance that it will fold like them, depending on the quality and accuracy of the multiple sequence alignment. Since multiple sequence alignments are built under the assumption of homology (all sequences have a common ancestor), it's a matter of how far from the "sequence sampling space" your sequence is located compared to the others.
I understand that similar sequences may fold similarly (although I doubt that holds as length increases, but IDK). I'm talking about aligned sub-sequences within one chain and their ultimate distance from each other in the final structure. Co-evolution suggests that aligned sub-sequences are also proximal. But manufactured chains did not evolve, therefore the assumption is no longer useful.
Oh, I see! Yes, an intrachain alignment of an artificial sequence does not by itself give any information about co-evolution, especially since you don't know whether your protein is actually folding. To assess co-evolution you need a multiple sequence alignment between protein homologs containing correlated mutations.
> I understand that similar sequences may fold similarly (although as length increases, I highly doubt it, but IDK).
As long as the sequence similarity is kept between those sequences, length is not an issue.
> Co-evolution suggests that aligned sub-sequences are also proximal
What do you mean by "proximal"? Close in space, or similar in structure?
> To assess co-evolution you need a multiple sequence alignment between protein homologs containing correlated mutations.
That makes sense. So in the CASP competition, when teams are given a sequence, do their algorithms do something like the following?
1. Search database for homologs of given sequence
2. Look at MSA and correlated mutations of homologs
3. Look for similar correlated mutations in given sequence
I imagine 1-3 could somehow be embedded in a NN after training on a protein database.
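As a toy illustration of step 2 (not any team's actual pipeline; real methods use direct-coupling style statistics on much larger alignments), you can score pairs of alignment columns by mutual information and treat high-scoring pairs as candidate contacts:

    import numpy as np
    from collections import Counter

    def mutual_information(col_i, col_j):
        """MI between two alignment columns (sequences of residue characters)."""
        n = len(col_i)
        pi, pj = Counter(col_i), Counter(col_j)
        pij = Counter(zip(col_i, col_j))
        mi = 0.0
        for (a, b), c in pij.items():
            p_ab = c / n
            mi += p_ab * np.log(p_ab / (pi[a] / n * pj[b] / n))
        return mi

    # toy MSA: columns 1 and 3 mutate together (a crude co-evolution signal)
    msa = ["ACDF",
           "AGDH",
           "ACEF",
           "AGEH"]
    cols = list(zip(*msa))
    scores = {(i, j): mutual_information(cols[i], cols[j])
              for i in range(len(cols)) for j in range(i + 1, len(cols))}
    print(max(scores, key=scores.get))   # -> (1, 3): the co-varying pair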
> What do you mean by "proximal"? Close in space, or similar in structure?
This is a really insightful question and I need to take some time to fully understand the ensuing discussion.
If my speculation is correct, then drug discovery should use a process of genetic programming, using something like this to score the resulting amino acid sequences. I'm wondering if an artificial process of evolution would be sufficient to satisfy the co-evolution assumption here.
In case anyone is interested in yet another alternative, I have this old, unpolished project: https://github.com/bauerca/jv
It is a JSON parser in C without heap allocations. The query language is piddly, but the tool can be useful for grabbing a single value from a very large JSON file. I don't have time for it, but someone could fork and make it a real deal.