Archive for the ‘Big Data’ Category

The Next New New Thing

In 1999 Michael Lewis told the story of “the new new thing” in terms of a single individual, Jim Clark, a “new-capitalist adventurer” in the words of the NY Times reviewer.  It was an exciting story but as we approach 2015, it seems dated, even quaint – dated because the new new things were individual companies – Silicon Graphics, Netscape, myCFO and Healtheon.

Today new new things are explosions of companies that seem to come in waves – waves such as cloud computing, Big Data and now what Shivon Zilis, of Bloomberg Beta, calls machine intelligence.  One wave often drives another, or at least enables it.  Machine Intelligence, perhaps the newest new thing, depends on massive data sets, so Big Data had to come first.

Shivon has done us all a service by scouring the startup world for artificial intelligence, machine learning and data-related technologies and creating a landscape that puts them all in context.  Her diagram of the Machine Intelligence Landscape – she’s using “machine intelligence” as a unifying term for machine learning and AI – has five categories, each with multiple subcategories that suggest some of the areas where they will transform the way we work and multiple companies already implementing them:

  • Core Technologies

o   Artificial Intelligence – Deep Learning – Machine Learning – NLP Platforms – Predictive APIs – Image Recognition – Speech Recognition

  • Rethinking Enterprise

o   Sales – Security/Authentication – Fraud Detection – HR/Recruiting – Marketing – Personal Assistant – Intelligence Tools

  • Rethinking Industries

o   AdTech – Agriculture – Education – Finance – Legal – Manufacturing – Medical – Oil and Gas – Media/Content – Consumer Finance – Philanthropies – Automotive – Diagnostics – Retail

  • Rethinking Humans/HCI (human-computer interaction)

o   Augmented Reality, Gestural Computing, Robotics, Emotional Recognition

  • Supporting Technologies

o   Hardware – Data Prep – Data Collection

Shivon recommends we focus on her core technology category for innovations at the heart of machine intelligence and suggests using the landscape to package some of the technologies into a new new industry application for those of us looking to build a company.  So spot the market opportunities, and you have an amazing map for innovation!  Even Harry Potter didn’t have one of these!

Making Sense of Change

We all live in perpetual information overload and a swirl of new technologies.  Continuous learning is no longer optional.  Learn or be lost.  Keeping track of it all and fitting the pieces together is a challenge that seems to grow harder all the time.  Now Brian Solis of Altimeter has given us a structure to help us sort through the emerging digital universe.  Thank you, Brian!

Cloud-based social, mobile and real-time technologies are the hub of the Brian Solis Wheel of Disruption.

In the first circle around the three core themes are the following seven emergent technologies and sectors:

  • Big Data
  • Apps
  • Ephemeral (content that disappears in a short time)
  • Geo-location
  • Messaging
  • Gamification
  • 2nd Screen

The second circle contains seven more:

  • Wearables
  • Makers
  • Beacons
  • Internet of Things
  • Sharing
  • Virtual AI – AR (Artificial Intelligence & Augmented Reality)
  • Payments

Alongside the wheel are six themes implemented by these technologies:

  • Platforms
  • Alternative Currencies
  • Mass Personalization
  • Crowd Funding/Lending (and I would add, Sourcing)
  • Anonymous/Private web
  • Instant Gratification

Here’s Brian’s marvelous infographic:

My head already feels clearer!  I hope yours will as well!

Banking without Banks

Banks said it couldn’t be done.  But innovative entrepreneurs are capitalizing on social media, Big Data and machine learning technology to make capital available to people who couldn’t get it before or couldn’t get it at affordable rates.

One company enables middle-class consumers in emerging markets to gain access to short-term loans by using social media to prove their creditworthiness.  Another has, in just 7 years, made $1 billion in business loans to small businesses with poor credit.  A third manages a credit marketplace of borrowers and investors in order to facilitate personal and business loans at lower rates than borrowers can get from banks – $7 billion in loans in 7 years.  The fourth provides micro-loans without collateral to low-income Hispanic families who lack a credit history.

The companies are Lenddo, OnDeck, Lending Club and Progreso Financiero, and their founders told their stories at the Data Driven NYC Meetup in May.

The Situation

According to James Gutierrez, Founder & CEO, Insikt, Inc., a financial data analytics company, and formerly CEO of Progreso Financiero, which he founded:

  • New regulations for banks have changed lending:

o   Credit Card Act

o   Dodd Frank

o   Basel II

o   Basel II/II.5.

  • The availability of revolving credit is down – affects small business lending by banks – non-prime consumers are hit hardest
  • Technology is changing lending across the value chain, driving the price down

o   Applications drive higher volume and lower costs

o   Underwriting – big data makes more sources accessible and results in lower risk and increased capital

o   Servicing – electronic payments – ACH means lower costs and lower risks

o   Mobile payments by lower income borrowers – lenders use SMS for collection to lower cost and lower risk.

  • Banks can’t keep up with nonbank alternative lenders, which are transforming all loan products.

Four Nonbank Alternative Lenders Speak

Jeff Stewart, Founder & CEO, Lenddo

  • Launched early 2011
  • Serves middle class consumers in emerging markets
  • Goal was to involve the crowd in lending to the middle class using micro finance techniques and social media data to establish creditworthiness where none exists
  • Social data add value by making it possible to map good vs. bad borrowers in terms of affiliation because “birds of a feather gather together” – “even two degrees out tells us how you will perform”
  • Integrates social networks with mobile & the cloud to use data sources
  • Works with the community on both demand generation and collections/repayments
  • Storage was a major technology issue – chose MongoDB at the outset with Amazon Web Services – database grew explosively – all opt in
  • User data = social data – grew exponentially – expensive even when only 20K members.
  • Realized “It’s big data, not big database” so they moved data to simple storage, created a MongoDB cache for queries and cut costs 70% – they think about data use cases.
  • No database frees you to solve problems
  • Looking closely at bitcoin block chains to add value for transactions.
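
The storage change Jeff described (bulk data in simple storage, with a small cache in front for queries) can be sketched roughly as follows.  This is an illustration only, not Lenddo’s code: the `BulkStore` and `QueryCache` classes are hypothetical, the sample member records are invented, and plain Python dictionaries stand in for both the bulk storage and the MongoDB cache.

```python
from collections import OrderedDict


class BulkStore:
    """Stand-in for cheap bulk storage (hypothetical; a dict plays the role)."""

    def __init__(self):
        self._blobs = {}
        self.reads = 0  # counts how often the slow store is hit

    def put(self, key, value):
        self._blobs[key] = value

    def get(self, key):
        self.reads += 1
        return self._blobs[key]


class QueryCache:
    """Small read-through LRU cache in front of the bulk store."""

    def __init__(self, store, capacity=2):
        self._store = store
        self._capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        value = self._store.get(key)  # cache miss: fetch from bulk storage
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict least recently used
        return value


# Invented sample data for the sketch.
store = BulkStore()
store.put("member:1", {"friends": 120})
store.put("member:2", {"friends": 45})

cache = QueryCache(store)
cache.get("member:1")
cache.get("member:1")  # second call is served from the cache
print(store.reads)
```

The point of the design is that only the records people actually query sit in the more expensive layer; everything else stays in cheap storage until it is needed.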

Noah Breslow, CEO, OnDeck

  • Has made $1 billion in business loans to small businesses with poor credit over 7 years
  • Loan size is small – $100K-$300K; banks need larger loans or they lose money – they need $1 million and up
  • SMEs represent a large and underserved addressable market
  • Built a platform to connect Main St. to Wall St., with OnDeck playing all four essential roles:  originator, servicer, credit bureau for collecting and aggregating data, and credit scoring (FICO)
  • The database tracks small businesses from birth to death.
  • The digital footprint of different stores is totally different and depends on different data sources
  • Co. adds private performance data to public data and does a lot of fraud management and triangulation
  • Developed a different kind of credit score – it’s a business credit score, not a personal one with the focus on debt service calculations:  cash flow, trade credit, business attributes.  Social data is noisy – need to look at patterns.
  • A large number of small transactions – restaurants, retailers, doctors & dentists, small manufacturers, etc.
  • Gather data for scoring a business from many sources.  Building a data aggregation and learning platform that includes Mechanical Turk and common sense.
  • They price to risk.

Renaud Laplanche, Founder & CEO, Lending Club

  • Has created a credit marketplace of borrowers and investors, where Lending Club facilitates loans but is not involved on the credit side.
  • As a result, LC can operate at a lower cost than banks – banks have 5%-7% of amounts lent in operating expense vs. LC, which is under 2% and declining.
  • LC incurs none of some bank costs, such as the cost of branch offices, reserve requirements, and has lower costs or more advanced technology for customer acquisition, underwriting, origination and servicing.
  • Bank’s intermediation cost for credit cards is 16.99%.  LC’s range is 7.9%-127% with average intermediation cost of 4.83%.
  • A lower lending rate means a higher return to investors.
  • Have consistently controlled LC’s growth.  Now have 550 employees; hire 100 people every 6 mos.
  • Use data for marketing, credit, fraud and collections – receive about 9,000 loan applications a day.  Fewer than 10 are fraudulent, so that fraud becomes a needle in a haystack.
  • Fraud predictors:  time of day, frequency, etc.  New data sources:  device, online footprints, application use.  Look at consistency of the information provided, online behavior and footprint, machine/device and location signals.
  • LC uses machine learning to assess risk and predict fraud based on more than a thousand attributes
  • Fraud attempts have declined from 5% to 2%.
  • Just formed a partnership with Union Bank in San Francisco, which overcomes the challenge of complying with 49 sets of state regulations for Lending Club and opens the way for a traditional bank to offer products it could not otherwise offer.  (Probably an indicator of a future trend.)

James Gutierrez, Founder & former CEO, Progreso Financiero

  • James was a 2005 MBA from Stanford
  • Micro finance gave him the idea for Progreso Financiero – unsecured micro loans and debit cards – delivered from a table in the supermarket – to help immigrants with no FICO scores
  • Typical loan size was $1,000 for 12 months.  Had to make 10,000 loans to get $1,000 back.
  • Immigrants with no FICO scores are a challenging population to underwrite
  • What he did:

o   Took an eHarmony approach with a robust application with extremely detailed personal data.  Detailed data turned out to be valuable.

o   Booked some bad loans – a learning experience – the most valuable data is performance data – having a huge amount of data helps build a model – data science is no help

o   Aggregated and analyzed alternative data:

      • 300 attributes on the application
      • Separate borrowers into nodes
      • Later 2,500 attributes from multiple sources
      • Over 120 segments
      • Data helped simplify the process & determine the score
      • Merged alternate data with bureau data
  • Fair equal Opportunity Act – the jury is out on what data you can use to deny credit, the actual underwriting decision
  • Made more than 500,000 loans with single-digit losses
  • Partnered with Prosper, the first peer-to-peer lending marketplace, with more than 2 million members and over $1 billion in funded loans.
    •  Risk model design
    • Loan valuation framework
    • Valuation stress testing.

Today James’s focus is on Insikt Inc., a financial data analytics company that uses data for risk models to apply to consumer markets.  He and his team are working on how to originate loans in the subprime market and how to create a more curated market for securitization – concerned with both bond performance & loan performance.

Panel Discussion, Moderated by Matt Turck, Organizer

  • Banks are encumbered by regulation
  • Alternative lending is only 5% of consumer finance

o   In 5 years we’ll see a lot of new entrants

o   More partnering with banks

  • Transforming the bank system to be more transparent and customer friendly
  • There are four different segments with lending opportunities

o   SMB

o   Consumers

o   US

o   Emerging markets

  • Top ten global institutions will be big players in financial services
  • Rates for consumers can be as low as 6.5%, average 12.5%; for business, 5.9%
  • Rate risk is managed better through the use of data, and marketplace dynamics drive interest rates down
  • A virtuous circle continues to make credit more affordable and drives interest rates down
  • Most credit cards are priced at prime plus
  • Alternative lenders deliver a higher return to investors
  • Rates rise in a better performing economy and defaults come down so we expect our proceeds to be stable.

Deriving Big Value from Big Data

What is Big Data and what does it do to how we do business?  Ask the people doing it.  That’s what Matt Turck, of FirstMark Capital, did at the 28th Data Driven Meetup, which he organizes.  He gave four Big Data stars the mike – two entrepreneurs, a data scientist and a VC who used to be an entrepreneur.  Robbie Allen of Automated Insights, and Joe Hellerstein, of Trifacta, were the entrepreneurs; Rachel Schutt, of Newscorp, was the data scientist, and Chris Lynch, of Atlas Venture, was the VC.

So what are these companies doing?

  • Delivering automated narrative reports of quantifiable data in real time that are designed for individual user groups.
  • Enabling users to easily transform raw, complex data into clean and structured formats for analysis so that analysts can have direct access to Big Data and both analysts and data scientists can be significantly more productive in delivering business decisions.
  • Building a corporate data culture led by a cross-functional team headed by the CTO that combines data science, IT and product management in order to help journalists tell stories and develop a sustainable content/publishing/media company business model.
  • Using lessons learned from running a Big Data pioneer to help new Big Data companies understand the importance of simple messaging that dummies can understand, ease of use, security and designing the business to connect directly to user value in order to optimize monetization, even using someone else’s Big Data platform to achieve this.

Robbie Allen, CEO & Founder, Automated Insights

His theme:  Let Your Data Tell Its Story.  His company has developed a patented platform, called Wordsmith, that writes insightful, personalized reports from client data – reports comparable to an expert talking in plain English to each user.  Its cloud software turns raw data into compelling content customized for specific users and groups of users.  It delivers automated insights as narrative content at scale, in real time, on any device.

Visualizations require mental gymnastics to translate.  They also suffer from the baseline effect – small changes are imperceptible, which renders most dashboards useless.  But pictures don’t tell stories.  Words tell stories.  Data density can obscure meaning.  A single word can sometimes do the job best.

Media companies are the target customers for the Wordsmith platform, which creates reports from quantifiable data.

  • The data can come from anywhere – external databases, real-time data, proprietary data, and historical data.
  • The platform creates algorithms that create facts and lists of facts.
  • It then describes those facts as narrative.
  • Creates tweets and other messages


  • Yahoo fantasy football grades users on their drafts – using any tone desired, including snarky.
  • InvestCorp – portfolio recap.  (Ultimately data scientists will be replaced by automated processes.)
  • Samsung – fitness update – “quantified self.”
  • Honda – sales reporting
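
The data-to-facts-to-narrative flow described above can be sketched in a few lines.  This is a toy illustration, not how Wordsmith works internally: the function names (`derive_facts`, `narrate`, `to_tweet`), the sentence template and the team scores are all invented for the example.

```python
def derive_facts(scores):
    """Turn raw numbers into simple facts (highest, lowest, spread)."""
    best = max(scores, key=scores.get)
    worst = min(scores, key=scores.get)
    return {
        "best": best,
        "best_score": scores[best],
        "worst": worst,
        "worst_score": scores[worst],
        "spread": scores[best] - scores[worst],
    }


def narrate(facts):
    """Describe the facts as a short plain-English narrative."""
    return (
        f"{facts['best']} led the week with {facts['best_score']} points, "
        f"{facts['spread']} ahead of {facts['worst']}, "
        f"who finished last with {facts['worst_score']}."
    )


def to_tweet(text, limit=140):
    """Trim the narrative to fit a short message."""
    return text if len(text) <= limit else text[: limit - 3] + "..."


# Hypothetical data, invented for this sketch.
weekly_scores = {"Team A": 112, "Team B": 97, "Team C": 85}
facts = derive_facts(weekly_scores)
story = narrate(facts)
print(story)
print(to_tweet(story))
```

Swapping the template in `narrate` is one simple way to change the tone of the output without touching the facts underneath.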

Wordsmith Marketing can generate fully automated personalized websites and performance reports that replace Google Analytics.

Rachel Schutt, Chief Data Scientist, Newscorp

Newscorp owns multiple major news media.  The parent company has begun leveraging all its companies instead of acting as a holding company in order to create new business models based on their content.

Data is at the heart of its future.

Schutt, who has a PhD in statistics, reports to the CTO.  Her two peers are people who head the platform and the product. The goal is to build a data culture, investing in people rather than tools.  They build cross-functional teams, and everyone codes.

Examples of data science in action:  churn models.  Propensity analysis.  User behavior modeling.

The plan is to make data-based decisions to help journalists tell a story and to make sound business decisions that can build a longer term sustainable media business model.

Chris Lynch, Partner, Atlas Venture

Vertica was the first new database in 3 years.  Chris, previously a sales and marketing business entrepreneur, was CEO.  The engineers couldn’t communicate except to data scientists.  He had to simplify the message so it scaled.  Vertica was a real-time analytics database that was faster than others and delivered actionable insights in time to make decisions.  It was sold to HP for several hundred million dollars.  But Zynga was sold for billions.  Zynga was an analytics company that masqueraded as a gaming company.  It used Vertica’s real-time database to analyze user behavior and target sales of virtual products.  Vertica was disintermediated.  You need to be close to the customer for monetization – to connect directly to customer value.

Chris has been a VC for two years.  His thesis is that big data can be transformational if democratized so that it talks to dummies, not the 1 percent.  Scale, security and simplicity are issues.  Individuals will own their own digital footprints.  Simplicity means ease of use so the magic is behind the curtain and users have an intuitive interface.

Think about monetizing someone else’s platform.  Disintermediate those guys.  Leverage the apps and platforms you can put under a problem.  What’s the problem you’re solving?  Moving up the stack creates more value for the customer.

What he looks for in a company is people with courage, character and conviction – people with a soul.   You need people to build stuff.

The venture model is broken.  Too much money in the market (Lazy VCs).  Too few good ideas.  Chris takes pride in being a company builder.

Joe Hellerstein, Founder & CEO, Trifacta

Trifacta has developed a platform designed to “transform the way the world works with data.”  It is designed to make it easy for analysts to have direct access to Big Data and to increase productivity for both analysts and data scientists.  They asked analysts what they did and how long it took and found that 80 percent of the work on data is cleaning data.  So, Trifacta takes raw logs and transforms them for immediate analysis.  Joe demonstrated this, transforming a typical log of restaurant violations in minutes into a straightforward list of where not to eat in San Francisco!
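
To give a feel for the kind of cleanup involved, here is a minimal sketch of turning messy log lines into structured records and then into a “where not to eat” list.  It is not Trifacta’s product or method: the log format, the restaurant names, the `parse_log` and `places_to_avoid` helpers and the score threshold are all assumptions made up for the example.

```python
import re

# Hypothetical raw log lines, invented for this sketch; real inspection data
# would have its own format.
RAW_LOG = """\
2014-03-02 | Cafe Uno   | score=92 | violations: none
2014-03-05 |  Taco Dos | score=61 | violations: rodents; no hot water
bad line with no structure
2014-03-09 | Pho Tres | score=74 | violations: improper food storage
"""

LINE_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s*\|\s*(?P<name>.+?)\s*\|\s*"
    r"score=(?P<score>\d+)\s*\|\s*violations:\s*(?P<violations>.*)$"
)


def parse_log(text):
    """Turn raw lines into structured records, skipping unparseable lines."""
    records = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if not match:
            continue  # drop lines that do not fit the expected shape
        raw_violations = match.group("violations").strip()
        violations = (
            []
            if raw_violations.lower() == "none"
            else [v.strip() for v in raw_violations.split(";")]
        )
        records.append(
            {
                "date": match.group("date"),
                "name": match.group("name"),
                "score": int(match.group("score")),
                "violations": violations,
            }
        )
    return records


def places_to_avoid(records, threshold=75):
    """List restaurants scoring below the (arbitrary) threshold, worst first."""
    low = [r for r in records if r["score"] < threshold]
    return [r["name"] for r in sorted(low, key=lambda r: r["score"])]


records = parse_log(RAW_LOG)
print(places_to_avoid(records))
```

Even in this tiny case most of the code is cleanup (trimming whitespace, dropping bad lines, splitting fields), which is consistent with the 80 percent figure the analysts reported.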