Comments
JohnBarnes
5/3/2017 11:14:55 PM
User Rank
Platinum
Re: Speech or voice recognition has been around for decades...
mhhf1ve, 1. It helps instructors spot students who are not properly cleaning/munging data. 2. It creates a set of known "lazy" results that exposes some sloppy and/or biased journalists. But most of all, 3. Data points are numbered in a complicated part-sequential, part-meaningful code; checking for that missing point (and a few random non-misses) pretty quickly tells NWS that their data set has been appropriated contrary to the public access conditions on its release. Not unlike the "paper towns" on maps.

mhhf1ve
5/3/2017 10:27:37 PM
User Rank
Platinum
Re: Speech or voice recognition has been around for decades...
Why does the national weather service do that? If it's that easy to spot, then what's the point of injecting it?

JohnBarnes
5/3/2017 9:55:41 PM
User Rank
Platinum
Re: Speech or voice recognition has been around for decades...
mhhf1ve,

Yep. The National Weather Service kindly makes a huge storm-loss dataset available for data scientists in training (and because it's real data, there's all kinds of garbage in there). They added one data point to it -- a flood in California for which they added several zeros to the dollar value of the property damage. If you don't trap that point out, it makes all your linear regressions do very weird things (since that one point totally overwhelms all the other data points).
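To illustrate why one inflated point matters, here is a minimal sketch in Python. The numbers are made up for illustration and are not from the NWS dataset, and the trap rule (drop anything above 1000 times the median) is only an assumed, crude filter, not how NWS or any instructor actually does it.

```python
import statistics


def ols_fit(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical, made-up data: damage rises by 2 units per unit of x.
xs = list(range(1, 21))
ys = [2.0 * x + 1.0 for x in xs]
clean_slope, _ = ols_fit(xs, ys)

# Plant one bad record: the last damage figure with five extra zeros.
bad_ys = list(ys)
bad_ys[-1] = ys[-1] * 100_000
bad_slope, _ = ols_fit(xs, bad_ys)

# Assumed trap: discard any point more than 1000x the median damage.
median_y = statistics.median(bad_ys)
kept = [(x, y) for x, y in zip(xs, bad_ys) if y <= 1000 * median_y]
trapped_slope, _ = ols_fit([x for x, _ in kept], [y for _, y in kept])

print(clean_slope, bad_slope, trapped_slope)
```

The fitted slope on the untouched data and on the trapped data should agree, while the slope with the planted point left in is larger by several orders of magnitude.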

Spoof data would more likely have to be done by moving all the data points a little bit in non-random ways -- or rather, first adding the desired component and then re-randomizing them. That's a lot of work to go to in order to make it look like unemployment was higher or lower, or hamburger sales went up or down, or a given commercial dye was or wasn't causing a disease. It's a lot cheaper and easier to count on voters and legislators being too innumerate to understand the data anyway -- that's what's worked for tobacco cancer denial, anti-vaccine campaigns, and climate science denial.

Also, spoof data is often subject to someone re-collecting or re-analyzing.  I just don't think people will go to the effort very often, and if it's worth the effort, that's exactly the data that's going to get scrutinized most harshly.

mhhf1ve
5/3/2017 4:24:58 PM
User Rank
Platinum
Re: Speech or voice recognition has been around for decades...
I think "spoof data" may be difficult to verify. I know from experience that many sources of industry data -- like PWC's industry revenues data -- are proprietary when they are initially published, and then after a few years, the data leaks out. But I think these sources do some kind of "copyright traps" and publish some fake data along with the real data to help identify where the leaks or copies came from. (Map makers have done this sort of thing for a loooong time.)

https://en.wikipedia.org/wiki/Agloe,_New_York

 

JohnBarnes
5/2/2017 7:11:05 PM
User Rank
Platinum
Re: Speech or voice recognition has been around for decades...
mhhf1ve,

I've seen quite a bit about the spoofing of search engines but very little spoof data so far. (As a longtime writer and newbie data scientist, this is something I'm looking into more than casually.) Spoof news (what we have to call it since the Republican liars started yelling "fake news" at all the news they don't like, deliberately corrupting the term to obscure the clear evidence that it was nearly all coming from their side) very often bears the same birthmarks as spoof phishing emails, i.e., extensive copying of the form but minor slips in phrasing and details (for example, easily identified repurposed photographs, quotes from heads of non-existent government agencies, etc.), and it is good enough to slip through many bots (though the bots are getting better). Spoof data would require a lot more concocting; you could detect it very easily through too-regular variances, for example. Also, spoof data, unlike spoof news, doesn't invite naive people to forward it to all their friends ("Hey! Here's a million 300-field climate observations that prove global warming is a hoax! Just run it through Tableau and the results are amazing!"), and almost anyone doing data journalism is going to be checking provenance before they do anything else.

Not to say there won't eventually be spoof data, probably lots of it, but the time for that is not quite yet.
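
One possible reading of the "too-regular variances" check, sketched in Python: split the data into groups, compute each group's variance, and see how much those variances differ from one another. The data, the grouping, and the function name `variance_spread` are all hypothetical, and where to draw the line for "suspiciously uniform" is a judgment call this sketch does not settle.

```python
import statistics


def variance_spread(groups):
    """Coefficient of variation of the per-group sample variances.

    A value near zero means every group has almost the same variance,
    which real, noisy measurements rarely show.
    """
    variances = [statistics.variance(group) for group in groups]
    return statistics.stdev(variances) / statistics.mean(variances)


# Hypothetical data: each inner list stands for one station, month, etc.
too_regular = [[10, 12, 14, 16], [20, 22, 24, 26], [5, 7, 9, 11]]
messier = [[10, 11, 19, 16], [20, 35, 24, 22], [5, 6, 5, 7]]

print(variance_spread(too_regular))
print(variance_spread(messier))
```

The first set, where every group has the same spacing, gives a spread of zero; the second gives a spread well above zero. A low value is only a prompt to look closer, not proof of fabrication.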

mhhf1ve
5/2/2017 2:27:46 PM
User Rank
Platinum
Re: Speech or voice recognition has been around for decades...
> "Data Journalism is growing in popularity all the time."

Let's hope that the data can be maintained to be (mostly) true and verifiable by bots? 

Michelle
5/2/2017 2:18:48 PM
User Rank
Platinum
Re: Speech or voice recognition has been around for decades...
@mhh I knew about the site, but not who founded it. Very interesting! Data Journalism is growing in popularity all the time.

mhhf1ve
5/1/2017 4:20:30 PM
User Rank
Platinum
Re: Speech or voice recognition has been around for decades...
It's interesting how data is becoming a journalistic... and even a philanthropic pursuit. Steve Ballmer funded USAFacts!

https://www.nytimes.com/2017/04/17/business/dealbook/steve-ballmer-serves-up-a-fascinating-data-trove.html?_r=0

mhhf1ve
5/1/2017 4:18:47 PM
User Rank
Platinum
Re: Speech or voice recognition has been around for decades...
> "I've already seen the phrase "Tableau journalist" to describe freelancers who mine publicly available data for reportable stories."

Hmm. Those stories are only as good as the data... I think maybe that's part of how "fake news" gets into the mainstream -- bad data and lies and statistics (and lies).

 

freehe
4/30/2017 6:27:20 PM
User Rank
Platinum
Re: Speech or voice recognition has been around for decades...
@Michelle, I agree, but it will take them years, plus money and time, to sift through all the data.

Copyright © 2024 Light Reading, part of Informa Tech,
a division of Informa PLC. All rights reserved. Privacy Policy | Cookie Policy | Terms of Use