Issues with source code access aside, your description is mostly wrong. These programs take a DNA profile as input- it's just that the DNA profile is mixed (i.e. from multiple people). Reporting no DNA would be nonsensical. Figuring out exactly how many people are in a mixture isn't quite nailed down statistically (last I knew), but it's usually pretty clear for up to 4 or so people.
Yes, you could run different models and get different probabilities. For example, you might compare the likelihood that the sample is a mixture of the suspect, the victim, and some unknown person against the likelihood that it's the victim and two unknown people, or against a model where the victim isn't in the sample at all. However, the specification of those models is part of the trial process.
And the output statistics (at least when being used to determine guilt) are usually quite extreme- likelihood ratios orders of magnitude beyond what 90% or even 99.99% certainty would imply.
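To make that concrete, here's a toy sketch (in Python) of the kind of likelihood-ratio comparison these programs formalize, using the simple "qualitative" mixture model- no peak heights, no dropout. The locus, allele frequencies, and genotypes are all invented for illustration:

    from itertools import combinations_with_replacement, product

    # Hypothetical population allele frequencies at one STR locus.
    FREQS = {"A": 0.10, "B": 0.25, "C": 0.30, "D": 0.35}

    def genotype_prob(gt):
        """Hardy-Weinberg probability of an unordered genotype."""
        a, b = gt
        return FREQS[a] ** 2 if a == b else 2 * FREQS[a] * FREQS[b]

    def likelihood(observed, known_genotypes, n_unknowns):
        """P(seeing exactly this allele set | knowns + n unknowns).

        Enumerates every genotype combination for the unknown
        contributors and sums the probability of those whose alleles,
        together with the knowns', account for exactly what was seen.
        """
        observed = frozenset(observed)
        gts = list(combinations_with_replacement(sorted(FREQS), 2))
        total = 0.0
        for combo in product(gts, repeat=n_unknowns):
            alleles = {a for gt in known_genotypes for a in gt}
            alleles.update(a for gt in combo for a in gt)
            if alleles == observed:
                p = 1.0
                for gt in combo:
                    p *= genotype_prob(gt)
                total += p
        return total

    observed = {"A", "B", "C"}  # alleles seen in the mixed sample
    suspect = ("A", "B")        # suspect's reference genotype

    # Hp: suspect + one unknown contributed.  Hd: two unknowns did.
    lr = likelihood(observed, [suspect], 1) / likelihood(observed, [], 2)
    print(f"Likelihood ratio (Hp/Hd) at this locus: {lr:.1f}")

In real casework this is evaluated at every locus and the per-locus ratios are multiplied together, which is how the reported numbers end up so many orders of magnitude from even odds.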
My point is that the science behind these calculations is well developed- validation studies get published all the time. Whether or not the specific software has errors (or isn't coded exactly as modeled) is an entirely different matter, but it still isn't all that likely. All of these cases rely on expert witnesses anyway- it's not the prosecutor pressing some buttons and printing a report.
There is far more concerning quackery that gets used in forensics- bite marks, hair matching, etc.
Which link various pathogens native to the species consumed to cancer in humans. Apparently the compounds that arise from cooking or processing meat, which also appear in poultry and fish, do not increase cancer risk by as much (or at all) as mammalian meat does- especially beef and pork- despite poultry and fish having the same or even higher concentrations of these compounds.
Depends how you look at it. The fulfillment centers are intentionally built out in the middle of nowhere where land and labour are both plentiful and inexpensive.
If you live in such a place and Amazon is your employer, it may well be the case that you don't have a lot of other options, especially if what you came from was being on social assistance.
So now they will have an even stronger foothold in the depressed areas where they provided people with jobs. EMTs, teachers, etc. are going to start taking jobs at Amazon because it pays better.
>I'm guessing for many people an Amazon warehouse job is stable employment near their small town that might not have many opportunities otherwise.
If they do their research, they'll realize they probably won't have a job for that long [1] and that it would be preferable to stay in a reliable position (one that, with your examples, likely pays better).
Land may be plentiful in the middle of nowhere, but labor wouldn't be. The middle of nowhere is usually sparsely populated- that's why it's the middle of nowhere.
True, but given the pretty hard cap on commute speed, I don't think it changes much. It's not unheard of to commute 50 miles to work in the Bay Area now (that's basically San Jose to San Francisco), so how much more can you do in the middle of nowhere- double that? I don't think many people would drive 3 hours there and 3 hours back every day.
Minor nitpicking: the 23andMe kits use microarrays for genotyping, not DNA sequencing. It's an older and more limited technology but it is much cheaper than sequencing.
To add to your points, a very important consideration is that charter schools can game the metrics by forcing out students who aren't performing well, while public schools can't turn anyone away. This may be why they seem to improve test scores without making any actual difference [1].
Public schools can use charters to game metrics themselves too [2].
Plus a special sauce for counting the number of specific bp repeats due to in-del events. This is not something I am too familiar with, but presumably the number of repeats of a specific k-mer in these genes of interest might correlate with a specific type of cancer? (I would love to hear the opinion of someone who is an expert in this field.)
"Copy number variant" refers to larger deletions and duplications that can occur in the genome. There isn't some specific cutoff for size, but some examples in these kinds of genes would be an entire exon or gene. There are countless studies that find correlations between specific variants or CNVs and risk of cancers.
Standard variant detection is pretty straightforward. CNVs are harder because they are longer (several hundred to several thousand base pairs) than the raw data (150 to 250 bp for Illumina)- you don't get single reads that span the entire variant. You have to normalize then look for differences in coverage, or look for split reads (where the read is aligned on the border of one of these CNVs).
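To illustrate the coverage approach, a minimal sketch with invented per-window read counts (a real pipeline would also correct for GC content and mappability, and segment windows instead of calling each one independently):

    # Hypothetical read counts in consecutive fixed-size windows,
    # against a matched baseline (e.g. a panel of normal samples).
    sample_depth   = [102, 98, 210, 205, 199, 101, 95, 4, 3, 100]
    baseline_depth = [100] * 10

    for i, (s, b) in enumerate(zip(sample_depth, baseline_depth)):
        ratio = s / b              # ~1.0 for the normal two copies
        copies = round(2 * ratio)  # naive diploid copy-number estimate
        if copies > 2:
            call = f"gain ({copies} copies)"
        elif copies < 2:
            call = f"loss ({copies} copies)"
        else:
            call = "normal"
        print(f"window {i}: depth ratio {ratio:.2f} -> {call}")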
This kind of funding baffles me because they don't seem to be proposing anything new at all (maybe slightly better CNV detection?) and there are already lots of labs/companies doing this kind of testing. Maybe they are working on being very efficient to offer a better price.
According to Sanger (or maybe TCGA?), a gain is when a genomic region (for a diploid) has more than five absolute copies and a loss is when the genomic region has no reads (http://cancer.sanger.ac.uk/cosmic/help/cnv/overview). The copy number is perhaps determined by that normalized distribution of read coverage across the reference genome?
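If I'm reading that page right, the labels would reduce to something like this tiny sketch (thresholds as described above, assuming a diploid genome; the copy-number estimate itself would come from that normalized coverage):

    def cosmic_style_label(total_copies):
        # Thresholds per the COSMIC description above (diploid case).
        if total_copies > 5:
            return "gain"
        if total_copies == 0:
            return "loss"
        return "no call"

    for cn in (0, 2, 6, 8):
        print(cn, "->", cosmic_style_label(cn))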
Then for small CNVs that perhaps span that 150-200 bp fragment, we use the split-read method to filter for incompletely mapped reads that align to the reference only at their edges. This implies that there was a duplication event that expanded that sequence? (Fig 1b, Split Read method).
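For what it's worth, a rough sketch of that split-read screen using pysam- soft-clipped read edges piling up at the same coordinate are candidate breakpoints. The BAM path and region here are hypothetical, and the BAM is assumed to be coordinate-sorted and indexed:

    from collections import Counter
    import pysam

    SOFT_CLIP = 4   # BAM CIGAR op code for soft clipping
    MIN_CLIP = 20   # ignore tiny clips, which are usually noise

    breakpoints = Counter()
    with pysam.AlignmentFile("patient.bam", "rb") as bam:
        for read in bam.fetch("chr17", 41_190_000, 41_280_000):
            if read.is_unmapped or read.cigartuples is None:
                continue
            first_op, first_len = read.cigartuples[0]
            last_op, last_len = read.cigartuples[-1]
            if first_op == SOFT_CLIP and first_len >= MIN_CLIP:
                breakpoints[read.reference_start] += 1  # clipped left edge
            if last_op == SOFT_CLIP and last_len >= MIN_CLIP:
                breakpoints[read.reference_end] += 1    # clipped right edge

    # Positions supported by many clipped reads are candidate CNV edges.
    for pos, count in breakpoints.most_common(5):
        print(f"candidate breakpoint at {pos}: {count} clipped reads")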
Presumably, the pipeline would determine the CNV sites in a specific patient sample, then cross-reference with the TCGA CNV data-set and come up with a correlation score of how well those CNV sites match the consensus CNVs in the cancer data-set? Thanks again for your detailed breakdown.
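Maybe the matching step could be as simple as a reciprocal-overlap score? A toy sketch with made-up intervals (a real pipeline would also match on chromosome and on the direction of the copy change):

    def reciprocal_overlap(a, b):
        """Fraction of overlap relative to the longer interval."""
        start, end = max(a[0], b[0]), min(a[1], b[1])
        if end <= start:
            return 0.0
        return (end - start) / max(a[1] - a[0], b[1] - b[0])

    patient_cnvs = [(41_196_000, 41_278_000), (55_200_000, 55_240_000)]
    dataset_cnvs = [(41_190_000, 41_280_000), (10_000_000, 10_050_000)]

    for cnv in patient_cnvs:
        best = max(reciprocal_overlap(cnv, ref) for ref in dataset_cnvs)
        print(f"patient CNV {cnv}: best overlap with dataset = {best:.2f}")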
The Sanger/TCGA (The Cancer Genome Atlas) stuff seems to be specific to microarray data which is different (older, more expensive) than the newer high-throughput data.
The figure you linked is a good explanation. The split-read method is helpful for finding the edges of the CNV, while the number of reads (relative to other regions that were tested) can give an idea of the number of copies. The problem is that these methods all have their own unique biases/noise that make it non-trivial to figure out the absolute copy number change.
Ideally they would find a similar CNV that has some clinical association.
Thanks jrm5100 for the link. I see the variants under the "DGV Structural Variants" track. Really appreciate your explaining what CNVs are and also following up on my questions/confusions!