RSS

API Definitions News

These are the news items I've curated in my monitoring of the API space that have some relevance to the API definition conversation and I wanted to include in my research. I'm using all of these links to better understand how the space is defining not just their APIs, but their schema, and other moving parts of their API operations.

Bringing Discovery Within Data API Marketplaces Out Into The Open

I spend time reviewing each wave of data API marketplaces as they emerge on the landscape every couple of years. There are a number of reasons why these data marketplaces exist, ranging from supporting government agencies, NGOs, or for commercial purposes. One of the most common elements of API-driven data marketplaces that frustrates me is when they don’t do the hard work to expose the meta data around the databases, datasets, spreadsheets, and the raw data they are providing access to–making it very difficult to actually discover anything of interest.

You can see a couple examples of this with mLab, World Health Organization, Data.World, and others. While these platforms provide (sometimes) impressive ability to manage data stores, but they don’t always do a good job exposing the meta data of their catalogs as part of the available APIs. Dynamically generating API endpoints, documentation, and other resources based upon the data that is being published to their platforms. Leaving developers to do the digging, and making the investment to understand what is available on a platform.

Some of the platforms I encounter obfuscate their data metadata on purpose, requiring developers to qualified before they get access to valuable resources. Most I think, just do not put themselves in the position of an API consumer who lands on their developer page, and doesn’t know anything about an API. They understand the database, and the API, so it all makes sense to them, and they don’t have any empathy for anyone else who isn’t in the know. Which is a common trait of database centered people who speak in acronyms, and schema that they assume other people know, and do not spend much time thinking outside of that bubble.

I could make a career out of deploying APIs on top of other data marketplace APIs. Autogenerating a more accessible, indexable, intuitive layer on top of what they’ve already deployed. I regularly find a wealth of data that is accessible through an API interface, but will most likely never be found by anyone. Before most developers will ever make the investment to onboard with an API, they need to understand what valuable resources are available. I can imagine many developers stumble across these data marketplaces, spend about 15 minutes looking around, maybe sign up for a key, but then give up because of the overhead involved with actually understanding what data is actually available.


Making Sure My API Dependencies Include Data Provenance

I am publishing a new API for locations. I am tired of needing some of the same location based resources across projects, and not having a simple, standardized API I can depend on. So I got to work finding the most accurate and complete data set I could find of cities, regions, and countries. I settled on using the complete, and easy to use countries-regions-cities project by David Graham–providing a straightforward SQL script I can use as the seed for my locations API database.

After crafting an API for this database using AWS API Gateway and Lambda, and working my way down my API checklist, it occurred to me that I wanted to include David Graham’s work as one of the project dependencies. Giving him attribution, while honestly acknowledging my project’s dependency on the data he provided. I’m working hard to include all dependencies within each of the microservices that I’m publishing, being mindful of every data, code, and human dependency that exists behind each service I deliver. Even if I don’t rely on regular updates from them, I still want to acknowledge their contribution, and consider attribution as one layer of my API dependency discussion.

Having a dependency section of my API checklist has helped me evolve how I think about defining the dependencies my services have. I initially began tracking all other services that my microservices were dependent on, but then I quickly began adding details about the other software, data, and people the service depends on as well. I’m also pulling together a machine readable definition for tracking on my microservice dependencies. It will be something I include in the API discovery (APIs.json) document for each service, alongside the OpenAPI, and other specifications. Allowing me to track on the dependencies (and attribution) for all of my APIs, and API related artifacts that I am producing on a regular basis. Providing data provenance for each of my services, documenting the origins of all the data I’m using across my services, and making accessible via an API.

For me, having the data provenance behind each service provides me with a nice clean inventory of all my suppliers. Understanding the data, services, open source code, and people I depend on to deliver a service is important to helping me make sense of my operations. For the people behind the data, services, and open source code I depend on it helps provide attribution, and showcase their valuable contribution to the services I offer. For partner and 3rd party consumers of my services, being observable about the dependencies that exist behind a service they are depending on, helps them make much more educated decisions around which services they put to work, and bake into their applications and systems. In the end, everyone is better off if I invest in data provenance as part of my wider API dependency efforts.


Monetizing Your Device Location Data With LotaData

There are a lot of people making money off of the acquisition, organization, and providing access to data in our digital world. While I quietly tune into what the data monetization trends are, I am also actively looking for interesting approaches to generating revenue from data, but specifically with an eye on revenue sharing opportunities for the owners or stewards of that data. You know as opposed to just the exploitation of people’s data, and generating of revenue without them knowing, or including them in the conversation. To help counteract this negative aspect of the data economy, I’m always looking to highlight (potentially) more positive outcomes when it comes to making money from data.

I was recently profiling the API of the people intelligence platform LotaData, and I came across their data monetization program, which provides an interesting look at how platforms can help data stewards generate revenue from data, but in a way that makes it accessible to individuals looking to monetize their own data as well. “LotaData’s AI platform transforms raw location signals into ‘People Intelligence’ for monetization, usually based upon the follow key attributes: latitude, longitude, timestamp, deviceID, and accuracy.”

Representing activity at a location and/or point in time, allowing LotaData’s to understand what is happening at specific places at scale, and develop meaningful insights and behavioral segments that other companies and government agencies will want to buy into. Some of the examples they provide are:

  • Commuting daily on CalTrain from Palo Alto to San Francisco
  • Mid-week date night at Nopa on the way back from work
  • Sweating it out at Soul Cycle on Saturday mornings
  • Taking the dog out for a walk on Sunday afternoons
  • Season ticket holder for Warriors games at the Oakland Arena

LotaData’s location-based insights and segments are entirely inferred from raw location signals, emphasizing that they do not access or collect any personally identifiable information (PII) from mobile phones–stating that they “do not and never will collect PII such as name, email, phone number, date of birth, national identifier, credit cards, or other sensitive information”. Essentially walking on the light side of the whole data acquisition and monetization game, and playing the honest card when it comes to the data economy.

When it comes to the monetization of data, LotaData enables marketers, brands, city governments and enterprise businesses to purchase location-based insights–providing an extensive network of data buyers who are ready to purchase the insights generated from this type of data. Then the revenue generated from the sale of an insight is split proportionately and shared with the app developers who contributed their app data. With the SDK agreement with LotaData governing the payment terms, conditions and schedule for sharing revenue. However, if you are unable to integrated LotaData’s SDK in a mobile app for any reason, they can offer you alternative ways to share and monetize your location data:<p></p>

  • Geo-Context API - The Geo-Context API is a simple script that you can embed in your mobile web sites and web apps. The script collects location data with explicit notice and permission obtained from end users.
  • Bulk Data Transfer - Customers that are proficient in collecting location signals from their mobile apps, websites or other services, can easily upload their historical location archives to LotaData’s cloud for analyzing, inferring and monetizing mobile user segments. The data can be transferred to LotaData by configuring the appropriate access policies for AWS S3 buckets.
  • Integration - LotaData can integrate with CRM and in-house data warehouse systems to ingest custom datasets or usage logs for deep analysis, enrichment and segmentation. Questions?

Providing a pretty compelling model for data providers to monetize their location based data. It is something I’ll be exploring more regarding how individuals can aggregate their own personal or professional data, as well as take advantage of the geo context API, bulk data transfer, or other integration opportunities. I have no idea how much money an individual or company could make from publishing data to LotaData, but the model provides an interesting approach that I think is worth exploring. It would be interesting to run a 30 to 90 test of tracking all of my location data, uploading it to LotaData, and then sharing the revenue details about what I can make with through a single provider like LotaData, as well as explore other potential providers so that you could sell your location data multiple times.

In a world where our data is the new oil, I’m interested in any way that I can help level the playing field, and seeing how we can put more control back into the device owners hands. Allowing mobile phone, wearable, drone, automobile, and other connected device owners to aggregate and monetize their own data in a personal or professional capacity. Helping us all better understand the value of our own bits, and potentially generating some extra cash from its existence. I don’t think any of us are going to get rich doing this, but if we can put a little cash back in our own pockets, and limit the exploitation of our bits by other companies and device manufacturers, it might change the game to be a little more in our favor.


Synthetic Healthcare Records For Your API Using Synthea

I have been working on several fronts to help with API efforts at the Department of Veterans Affairs (VA) this year, and one of them is helping quantify the deployment of a lab API environment for the platform. The VA doesn’t want it called it a sandbox, so they are calling it a lab, but the idea is to provide an environment where developers can work with APIs, see data just like they would in a live environment, but not actually have access to live patient data before they can prove their applications are reviewed and meet requirements.

One of the projects being used to help deliver data within this environment is called Synthea. Providing the virtualized data will be made available through VA labs API–here is the description of what they do from their website:

Synthea is an open-source, synthetic patient generator that models the medical history of synthetic patients. Our mission is to provide high-quality, synthetic, realistic but not real, patient data and associated health records covering every aspect of healthcare. The resulting data is free from cost, privacy, and security restrictions, enabling research with Health IT data that is otherwise legally or practically unavailable.

Synthea data contains a complete medical history, including medications, allergies, medical encounters, and social determinants of health, providing data can be used without concern for legal or privacy restrictions by developers to support a variety of data standards, including HL7 FHIR, C-CDA and CSV. Perfect work loading up into sandbox and lab API environments, allowing to developers to safely play around with building healthcare applications, without actually touching production patient data.

I’ve been looking for solutions like this for other industries. Synthea even has a patient data generator available on Github, which is something I’d love to see for every industry. Sandbox and labs environment should be default for any API, especially APIs operating within heavily regulated industries. I think Synthea provides a pretty compelling model for the virtualization of API data, and I will be referencing it as part of my work in hopes of incentivizing someone to fork it and use to provide something we can use as part of any API implementation.


General Data Protection Regulation (GDPR) Forcing Us To Ask Questions About Our Data

I’ve been learning more about the EU General Data Protection Regulation (GDPR) recently, and have been having conversation about compliance with companies in the EU, as well as the US. In short, GDPR requires anyone working with personal data to be up front about the data they collect, making sure what they do with that data is observable to end-users, and takes a privacy and security by design approach when it comes to working with all personal data. While the regulations seems heavy handed and unrealistic to many, it really reflects a healthy view of what personal data is, and what a sustainable digital future will look like.

The biggest challenge with becoming GDPR compliant is the data mess most companies operate in. Most companies collect huge amounts of data, believing it is essential to the value they bring to the table, without no real understanding of everything that is being collected, and any logical reasons behind why it is gathered, stored, and kept around. A “gather it all”, big data mentality has dominated the last decade of doing business online. Database groups within organizations hold a lot of power and control because of the data they possess. There is a lot of money to be made when it comes to data access, aggregation, and brokering. It won’t be easy to unwind and change the data-driven culture that has emerged and flourished in the Internet age.

I regularly work with companies who do not have coherent maps of all the data they possess. If you asked them for details on what they track about any given customer, very few will be able to give you a consistent answer. Doing web APIs has forced many organizations to think more deeply about what data they posses, and how they can make it more discoverable, accessible, and usable across systems, web, mobile, and device applications. Even with this opportunity, most large organizations are still struggling with what data they have, where it is stored, and how to access it in a consistent, and meaningful way. Database culture within most organizations is just a mess, which contributes to why so many are freaking out about GDPR.

I’m guessing many companies are worried about complying with GDPR, as well as being able to even respond to any sort of regulatory policing event that may occur. This fear is going to force data stewards to begin thinking about the data the have on hand. I’ve already had conversations with some banks who are working on PSD2 compliant APIs, who are working in tandem on GDPR compliance efforts. Both are making them think deeply about what data they collect, where it is stored, and whether or not it has any value. Something I’m hoping will force some companies to stop collecting some of the data all together, because it just won’t be worth justifying its existence in the current cyber(in)secure, and increasingly accountable regulatory environment.

Doing APIs and becoming GDPR compliant go hand in hand. To do APIs you need to map out the data landscape across your organization, something that will contribute to GDPR. To respond to GDPR events, you will need APIs that provide access to end-users data, and leverage API authentication protocols like OAuth to ensure partnerships, and 3rd party access to end-users data are accountable. I’m optimistic that GDPR will continue to push forward healthy, transparent, and observable conversations around our personal data. One that focuses on, and includes the end-users who’s data we are collecting, storing, and often time selling. I’m hopeful that the stakes become higher, regarding the penalty for breaches, and shady brokering of personal data, and that GDPR becomes the normal mode of doing business online in the EU, and beyond.


Facebook, Cambridge Analytica, And Knowing What API Consumers Are Doing With Our Data

I’m processing the recent announcement by Facebook to shut off the access of Cambridge Analytica to it’s valuable social data. The story emphasizes the importance of real time awareness and response to API consumers at the API management level, as well as the difficulty in ensuring that API consumers are doing what they should be with the data and content being made available via APIs. Access to platforms using APIs is more art than science, but there are some proven ways to help mitigate serious abuses, and identify the bad actors early on, and prevent their operation within the community.

While I applaud Facebook’s response, I’m guessing they could have taken more action earlier on. Their response is more about damage control to their reputation, after the fact, than it is about preventing the problem from happening. Facebook most likely had plenty of warning signs regarding what Aleksandr Kogan, Strategic Communication Laboratories (SCL), including their political data analytics firm, Cambridge Analytica, were up to. If they weren’t than that is a problem in itself, and Facebook should be investing in more policing of their API consumers activity, as they claim they are doing in their release.

If Aleksandr Kogan has that many OAuth tokens for Facebook users, then Facebook should be up in his business, better understanding what he is doing, where his money comes from, and who is partners are. I’m guessing Facebook probably had more knowledge, but because it drove traffic, generated ad revenue, and was in alignment with their business model, it wasn’t a problem. They were willing to look the other way with the data sharing that was occurring, until it became a wider problem for the election, our democracy, and in the press. Facebook should have more awareness, oversight, and enforcement at the API management layer of their platform.

This situation I think highlights another problem of doing APIs, and ensuring API consumers are behaving appropriately with the data, content, and algorithms they are accessing. It can be tough to police what a developer does with data once they’ve pulled from an API. Where they store it, and who they share it with. You just can’t trust that all developers will have the platform, and it’s end-users best interest in mind. Once the data has left the nest, you really don’t have much control over what happens with it. There are ways you can identify unhealthy patterns of consumption via the API management layer, but Aleksandr Kogan’s quizzes probably would appear as a normal application pattern, with no clear signs of the relationships, and data sharing going on behind the scenes.

While I sympathize with Facebook’s struggle to police what people do with their data, I also know they haven’t invested in API management as much as they should have, and they are more than willing to overlook bad behavior when it supports their bottom line. The culture of the tech space supports and incentivizes this type of bad behavior from platforms, as well as consumers like Cambridge Analytica. This is something that regulations like GDPR out of the EU is looking to correct, but the culture in the United States is all about exploitation at this level, that is until it becomes front page news, then of course you act concerned, and begin acting accordingly. The app, big data, and API economy runs on the generating, consuming, buying, and selling of people’s data, and this type of practice isn’t going to go away anytime soon.

As Facebook states, they are taking measures to reign in bad actors in their developer community by being more strict in their application review process. I agree, a healthy application review process is an important aspect of API management. However, this does not address the regular review of applications usage at the API management level, assessing their consumption as they accumulate access tokens, to more user’s data, and go viral. I’d like to have more visibility into how Facebook will be regularly reviewing, assessing, and auditing applications. I’d even go so far as requiring more observability into ALL applications that are using the Facebook API, providing a community directory that will encourage transparency around what people are building. I know that sounds crazy from a platform perspective, but it isn’t, and would actually force Facebook to know their customers.

If platforms truly want to address this problem they will embrace more observability around what is happening in their API communities. They would allow certified and verified researchers and auditors to get at application level consumption data available at the API management layer. I’m sorry y’all, self-regulation isn’t going to cut it here. We need independent 3rd party access at the platform API level to better understand what is happening, otherwise we’ll only see platform action after problems occur, and when major news stories are published. This is the beauty / ugliness of APIs. The cats out of the bag, and platforms need them to innovate, deliver resources to web, mobile, and device applications, as well as remain competitive. APIs also provide the opportunity to peek behind the curtain, and better understand what is happening, and profile the good and the bad actors within each ecosystem–let’s take advantage of the good here, to help regulate the bad.


Facebook Cambridge Analytica And Knowing What Api Consumers Are Doing With Our Data

404: Not Found


Google Releases a Protocol Buffer Implementation of the Fast Healthcare Interoperability Resources (FHIR) Standard

Google is renewing its interest in the healthcare space by releasing a protocol buffer implementation of the fast healthcare interoperability resources (FHIR) standard. Protocol buffers are “Google’s language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler”. Its the core of the next generation of APIs at Google, often using HTTP/2 as a transport, while also living side by side with RESTful APIs, which use OpenAPI as the definition, in parallel to what protocol buffers deliver.

It’s a smart move by Google. Providing a gateway for healthcare data to find its way to their data platform products like Google Cloud BigQuery, and their machine learning solutions built on Tensorflow. They want to empower healthcare providers with powerful solutions that help onboard their data, and be able to connect the dots, and make sense of it at scale. However, I wouldn’t stop with protocol buffers. I would also make sure they also invest in API infrastructure on the RESTful side of the equation, developing OpenAPI specs alongside the protocol buffers, and providing translation between, and tooling for both realms.

While I am a big supporter of gRPC, and protocol buffers, I’m skeptical of the complexity it brings, in exchange for higher performance. Part of making sense of health care data will require not just technical folks being able to make sense of what is going on, but also business folks, and protocol buffers, and gRPC solutions will be out of reach of these users. Web APIs, combined with YAML OpenAPI has begun to make the API contracts involved in all of this much more accessible to business users, putting everything within their reach. In our technological push forward, let’s not forget the simplicity of web APIs, and exclude business healthcare users as IT has done historically.

I’m happy to see more FHIR-compliant APIs emerging on the landscape. PSD2 for banking, and FHIR for healthcare are the two best examples we have of industry specific API standards. So it is important that the definitions proliferate, and the services and tooling emerge and evolve. I’m hoping we see even more movement on this front in 2018, but I have to say I’m skeptical of Google’s role, as they’ve come and gone within this arena before, and are exceptional at making sure all roads lead to their solutions, without many roads leading back to enrich the data owners, and stewards. If we can keep API definitions, simple, accessible, and usable by everyone, not just developers and IT folks, we can help empower healthcare data practitioners, and not limit, or restrict them, when it is most important.


Your Obsessive Focus On The API Resource Is Hindering Meaningful Events From Happening

I’ve been profiling a number of market data APIs as part of my work with Streamdata.io to identify valuable sources of data that could be streamed using their service. A significant portion of the APIs I come across are making it difficult for me to get at the data they have because of their views around the value of the data, intellectual property, and maintaining control over it in an API-driven world. These APIs don’t end up on the list of APIs I’m including in the profiling work, the gallery / directory, and don’t get included in any of the stories I’m telling, as a result of this tight control.

The side effect of this is I end up getting repeated sales emails and phone calls asking if I am still interested in their data. If there was just one or two of these, I’d jump on phones and explain, but because I’m dealing with 50+ of them, I just don’t have the bandwidth, and I have to move on. The thing is, I’m personally not interested in their data. I’m interested in other people being interested in their data, and being an enabler to helping them to get at it. However, since I can’t actually profile the APIs, create OpenAPI definitions for the request and response structure for inclusion in the API gallery / directory I’m building, I really don’t need their APIs in my work.

I know these platforms are protective of their data because it is valuable. They should be. However modern API management allows for them to open up the sampling of everything they have to offer, without giving away the farm. This allows enablers, analysts, and storytellers like me to test drive things, profile what they have to offer, and include within our applications. Then my users find what they want, head over to the source of the data, sign up for API keys, talk to their sales staff about what data sets they are interested in. I’m just a middle man, a broker, someone who is looking to enable engagements with their data. I’m only interested in the data, because I understand it is valuable to others, not because I am personally interested in doing anything with it.

This reality is common amongst data brokers who live in the pre-API era. They don’t understand API management, and they don’t understand how innovation using APIs work. They still rely on a pretty closed, tight-gripped approach to selling data. API enablers like me don’t have the time to mess around in these worlds. There are too many APIs out there to waste our time. I’m looking to profile the best quality market data source, that are frictionless to get up and running with. I’m not looking for free data, I understand it costs to get at. I just want to send leads their way, but I need to be able to profile what you have to offer in detailed, in a machine readable way. In the end, it doesn’t matter, because these providers won’t be around for long. Other, more API-savvy data providers will emerge, and run them out of business–it is the circle of API life.


Five APIs to Guide You on Your Way to the Data Dark Side

I was integrating with the Clearbit API, doing some enrichment of the API providers I track on, and I found their API stack pretty interesting. I’m just using the enrichment API, which allows me to pass it a URL, and it gives me back a bunch of intelligence on the organization behind. I’ve added a bookmarklet to my browser, which allows me to push it, and the enriched data goes directly into my CRM system. Delivering what it the title says it does–enrichment.

Next up, I’m going to be using the Clearbit Discovery API to find some potentially new companies who are doing APIs in specific industries. As I head over the to the docs for the API, I notice the other three APIs, and I feel like they reflect the five stages of transition to the data intelligence dark side.

  • Enrichment API - The Enrichment API lets you look up person and company data based on an email or domain. For example, you could retrieve a person’s name, location and social handles from an email. Or you could lookup a company’s location, headcount or logo based on their domain name.
  • Discovery API - The Discovery API lets you search for companies via specific criteria. For example, you could search for all companies with a specific funding, that use a certain technology, or that are similar to your existing customers.
  • Prospector API - The Prospector API lets you fetch contacts and emails associated with a company, employment role, seniority, and job title.
  • Risk API - The Risk API takes an email and IP and calculates an associated risk score. This is especially useful for figuring out whether incoming signups to your service are spam or legitimate, or whether a payment has a high chargeback risk.
  • Reveal API - Reveal API takes an IP address, and returns the company associated with that IP. This is especially useful for de-anonymizing traffic on your website, analytics, and customizing landing pages for specific company verticals.

Your journey to the dark side begins innocently enough. You just want to know more about a handful of companies, and the data provided is a real time saver! Then you begin discovering new things, finding some amazing new companies, products, services, and insights. You are addicted. You begin prospecting full time, and actively working to find your latest fix. Then you begin to get paranoid, worried you can’t trust anyone. I mean, if everyone is behaving like you, then you have to be on your guard. That visitor to your website might be your competitor, or worse! Who is it? I need to know everyone who comes to my site. Then in the darkest depths of your binges you are using the reveal API and surveilling all your users. You’ve crossed to the dark side. Your journey is complete.

Remember kids, this is all a very slippery slope. With great power comes great responsibility. One day you are a scrappy little startup, and the next your the fucking NSA. In all seriousness. I think their data intelligence stack is interesting. I do use the enrichment API, and will be using the discovery API. However, we do have to ask ourselves, do we want to be surveilling all our users and visitors. Do we want to be surveilled on every site we visit, and on every application we use? At some point we have to make sure and check how far towards the dark side we’ve gone, and ask ourselves, is this all really worth it?

P.S. This story reminds me I totally flaked on delivering a white paper to Clearbit on the topic of risk. Last year was difficult for me, and I got swamped….sorry guys. Maybe I’ll pick up the topic and send something your way. It is an interesting one, and I hope to have time at some point.


How Do We Keep Teams From Being Defensive With Resources When Doing APIs?

I was talking with the IRS about their internal API strategy before Christmas, reviewing the teams current proposal before they pitched in across teams. One of the topics that came up, which I thought was interesting, was about how to prevent some teams from taking up a defensive stances around their resources when you are trying to level the playing field across groups using APIs and microservices. They had expressed concern that some groups just didn’t see APIs as a benefit, and in some cases perceived them as a threat to their current position within the agency.

This is something I see at almost EVERY SINGLE organization I work with. Most technical groups who have established control over some valuable data, content, or other digital resource, have entrenched themselves, and become resistant to change. Often times these teams have a financial incentive to remain entrenched, and see API efforts as a threat to their budget and long term viability. This type of politics within large companies, organizations, institutions, and government agencies is the biggest threat to change than technology ever is.

So, what can you do about it. Well, the most obvious thing is you can get leadership on your team, and get them to mandate change. Often times this will involve personnel change, and can get pretty ugly in the end. Alternately, I recommend trying to build bridges, by understanding the team in question, and find ways you can do API things that might benefit them. Maybe more revenue and budget opportunities. Reuse of code through open source, or reusable code and applications that might benefit their operations. I recommend mapping out the groups structure and needs, and put together a robust plan regarding how you can make inroads, build relationships, and potentially change behavior, instead of taking an adversarial tone.

Another way forward is to ignore them. Focus on other teams. Find success. Demonstrate what APIs can do, and make the more entrenched team come to you. Of course, this depends on the type of resources they have. Depending on the situation, you may or may not be able to ignore them completely. Leading by example is the best way to take down entrenched groups. Get them to come out of their entrenched positions, and lower their walls a little bit, rather than trying to breach them. You are better off focusing doing APIs and investing in moving forward, rather than battling with groups who don’t see the benefits. I guarantee they can last longer than you probably think, and have developed some pretty crafty ways of staying in control over the years.

Anytime I encounter entrenched team stories within organizations I get sad for anyone who has to deal with these situations. I’ve had some pretty big battles over my career, which ended up in me leaving good jobs, so I don’t take them lightly. However, it makes me smile a little to hear one out of the IRS, especially internally. I know plenty of human beings who work at the IRS, but with their hard-ass reputation they have from the outside, you can’t help but smile just a bit thinking them facing the same challenges that the rest of us do. ;-) I think this is one of the most important lessons of microservices, and APIs, is that we can’t let teams ever get this big and entrenched again. Once we decoupled, let’s keep things in small enough teams, that this type of power can’t aggregate and be to big to evolve again.


The Data Behind The Washington Post Story On Police Shootings in 2017

I was getting ready to write my usual, “wish there was actual data behind this story about a database” story, while reading the Fatal Force story in the Washington Post, and then I saw the link! Fatal Force, 987 people have been shot and killed by police in 2017. Read about our methodology. Download the data. I am so very happy to see this. An actual prominent link to a machine readable version of the data, published on Github–this should be the default for ALL data journalism in this era.

I see story after story reference the data behind, without providing any links to the data. As a database professional this practice drives me insane. Every single story that provides data driven visualizations, statistics, analysis, tables, or any other derivative from data journalism, should provide a link to the Github repository which contains at least CSV representations of the data, if not JSON. This is the minimum for ALL data journalism going forward. If you do not meet this bar, your work should be in question. Other analysts, researchers, and journalists should be able to come in behind your work and audit, verify, validate, and even build upon and augment your work, for it to be considered relevant in this time period.

Github is free. Google Sheets is free. There is no excuse for you not to be publishing the data behind your work in a machine readable format. It makes me happy to see the Washington Post using Github like this, especially when they do not have an active API or developer program. I’m going to spend some time looking through the other repositories in their Github organization, and also begin tracking on which news agencies are actively using Github. Hopefully, in the near future, I can stop ranting about reputable news outlets not sharing their data behind stories in machine readable formats, because the rest of the industry will help police this, and only the real data-driven journalists will be left. #ShowYourWork


Breaking Down The Value Of Real Time Apis

404: Not Found


Streaming Data From The Google Sheet JSON API And Streamdata.io

I am playing with Streamdata.io as I learn how to use my new partner’s service. Streamdata.io proxies any API, and uses Server-Sent Event (SSE) to push updates using JSON Patch. I am playing with making a variety of APIs real time using their service, and in my style, I wanted to share the story of what I’m working on, here on the blog. I was making updates to some data in a Google Sheet that I use to drive some data across a couple of my websites, and thought…can I make this spreadsheet streaming using Streamdata.io? Yes. Yes, I can.

To test out my theory I went and created a basic Google Sheet with two columns, one for product name, and one for price. Simulating a potential product pricing list that maybe I’d want to stream across multiple website, or possibly within client and partner portals. Then I published the Google Sheet to the web, making the data publicly available, so I didn’t have to deal with any sort of authentication–something you will only want to do with publicly available data. I’ll play around with an authenticated edition at some point in the future, showing more secure examples.

Once I made the sheet public I grabbed the unique key for the sheet, which you can find in the URL, and placed into this URL: https://spreadsheets.google.com/feeds/list/[sheet key]/od6/public/basic?alt=json. The Google Sheet key takes a little bit to identify in the URL, but it is the long GUID in the URL, which is the longest part of the URL when editing the sheet. Once you put the key in the URL, you can take the URL and paste in the browser–giving you a JSON representation of your sheet, instead of HTML, basically giving you a public API for your Google Sheet. The JSON for Google Sheets is a little verbose and complicated, but once you study a bit it doesn’t take long for it to come into focus, showing eaching of the columns and rows.

Next, I created a Streamdata.io account, verified my email, logged in and created a new app. Something that took me about 2 minutes. I take the new URL for my Google Sheet and publish as the target URL in my Streamdata.io account. The UI then generates a curl statement for calling the API through the Streamdata.io proxy. Before it will work, you will have to replace the second question mark with an ampersand (&), as Streamdata.io assumes you do not have any parameters in the URL. Once replaced, you can open up your command line, paste in the command and run. Using Server-Sent Event (SSE) you’ll see the script running, checking for changes. When you make any changes to your Google Sheet, you will see a JSON Patch response returned with any changes in real time. Providing a real-time stream of your Google Sheet which can be displayed in any application.

Next, I’m going to make a simple JavaScript web page that will take the results and render to the page, showing how to navigate the Google Sheets API response structure, as well as the JSON Patch using the Streamdata.io JavaScript SDK. All together this took me about 5 minutes to make happen, from creating the Google Sheet, to firing up a new Streamdata.io account, and executing the curl command. Sure, you’d still have to make it display anywhere, but it was quicker than I expected to make a Google Sheet real-time. I’ll spend a little more time thinking about the possibilities for using Google Sheets in this way, and publishing some UI examples to Github, providing a forkable use case that anyone can follow when making it all work for them.

Disclosure: Streamdata.io is an API Evangelist partner, and sponsors this site.


Being Able To See Your Database In XML, JSON, and CSV

This is a sponsored post by my friends over at SlashDB. The topic is chosen by me, but the work is funded by SlasDB, making sure I keep doing what I do here at API Evangelist. Thank you SlashDB for your support, and helping me educate my readers about what is going on in the API space.

I remember making the migration from XML to JSON. It was hard for me to understand that difference between the formats, and that you accomplish pretty much the same things in JSON that you could in XML. I’ve been seeing similarities in my migration to YAML from JSON. The parallels in each of these formats isn’t 100%, but this story is more about our perception of data formats, than it is about the technical details. CSV has long been a tool in my toolbox, but it was until this recent migration from JSON to YAML that I really started seeing the importance of CSV when it comes to helping onboard business users with the API possibilities.

In my experience API design plays a significant role in helping us understand our data. Half of this equation is understanding our schema, amd what the dimensions, field names, and data tpes of the data we are moving around using APIs. As I was working through some stories on how my friends over at SlashDB are turning databases into APIs, I saw that they were translating database, tables, and field names into API design, and that they also help you handle content negotiation between JSON, XML, CSV. Which I interpret as an excellent opportunity for learning more about the data we have in our databases, and getting to know the design aspects of the data schema.

In an earlier post about what SlashDB does I mentioned that many API designers cringe at translating database directly into a web API. While I agree that people should be investing into API design to get to know their data resources, the more time I spend with SlashDB’s approach to deploying APIs from a variety of databases, the more I see the potential for teaching API design skills along the way. I know many API developers who understand API design, but do not understand content negotiation between XML, JSON, and CSV. I see an opportunity for helping publish web APIs from a database, while having a conversation about what the API design should be, and also getting to know the underlying schema, then being able to actively negotiate between the different formats–all using an existing service.

While I want everyone to be as advanced as they possibly can with their API implementations, I also understand the reality on the ground at many organizations. I’m looking for any possible way to just get people doing APIs, and begin their journey, and I am not going to be to heavy handed when it comes to people being up to speed on modern API design concepts. The API journey is the perfect way to learn, and going from database to API, and kicking of the journey is more important than expecting everyone to be skilled from day one. This is why I’m partnering with companies like SlashDB, to help highlight tools that can help organizations take their existing legacy databases and translate them into web APIs, even if those APIs are just auto-translations of their database schema.

Being able to see your database as XML, JSON, and CSV is an important API literacy exercise for companies, organizations, institutions, and government agencies who are looking to make their data resources available to partners using the web. It is another important step in understanding what we have, and the naming and dimensions of what we are making available. I think the XML to JSON holds one particular set of lessons, but then CSV possesses a set of lessons all its own, helping keep the bar low for the average business user when it comes to making data available over the web. I’m feeling like there are a number of important lessons for companies looking to make their databases available via web APIs over at SlashDB, with automated XML, JSON, and CSV translation being just a noteable one.


Being Able To See Your Database In XML, JSON, and CSV

This is a sponsored post by my friends over at SlashDB. The topic is chosen by me, but the work is funded by SlasDB, making sure I keep doing what I do here at API Evangelist. Thank you SlashDB for your support, and helping me educate my readers about what is going on in the API space.

I remember making the migration from XML to JSON. It was hard for me to understand that difference between the formats, and that you accomplish pretty much the same things in JSON that you could in XML. I’ve been seeing similarities in my migration to YAML from JSON. The parallels in each of these formats isn’t 100%, but this story is more about our perception of data formats, than it is about the technical details. CSV has long been a tool in my toolbox, but it was until this recent migration from JSON to YAML that I really started seeing the importance of CSV when it comes to helping onboard business users with the API possibilities.

In my experience API design plays a significant role in helping us understand our data. Half of this equation is understanding our schema, and what the dimensions, field names, and data types of the data we are moving around using APIs. As I was working through some stories on how my friends over at SlashDB are turning databases into APIs, I saw that they were translating database, tables, and field names into API design, and that they also help you handle content negotiation between JSON, XML, CSV. Which I interpret as an excellent opportunity for learning more about the data we have in our databases, and getting to know the design aspects of the data schema.

In an earlier post about what SlashDB does I mentioned that many API designers cringe at translating database directly into a web API. While I agree that people should be investing into API design to get to know their data resources, the more time I spend with SlashDB’s approach to deploying APIs from a variety of databases, the more I see the potential for teaching API design skills along the way. I know many API developers who understand API design, but do not understand content negotiation between XML, JSON, and CSV. I see an opportunity for helping publish web APIs from a database, while having a conversation about what the API design should be, and also getting to know the underlying schema, then being able to actively negotiate between the different formats–all using an existing service.

While I want everyone to be as advanced as they possibly can with their API implementations, I also understand the reality on the ground at many organizations. I’m looking for any possible way to just get people doing APIs, and begin their journey, and I am not going to be to heavy handed when it comes to people being up to speed on modern API design concepts. The API journey is the perfect way to learn, and going from database to API, and kicking of the journey is more important than expecting everyone to be skilled from day one. This is why I’m partnering with companies like SlashDB, to help highlight tools that can help organizations take their existing legacy databases and translate them into web APIs, even if those APIs are just auto-translations of their database schema.

Being able to see your database as XML, JSON, and CSV is an important API literacy exercise for companies, organizations, institutions, and government agencies who are looking to make their data resources available to partners using the web. It is another important step in understanding what we have, and the naming and dimensions of what we are making available. I think the XML to JSON holds one particular set of lessons, but then CSV possesses a set of lessons all its own, helping keep the bar low for the average business user when it comes to making data available over the web. I’m feeling like there are a number of important lessons for companies looking to make their databases available via web APIs over at SlashDB, with automated XML, JSON, and CSV translation being just a notable one.


How Do You Ask Questions Of Data Using APIs?

I’m preparing to publish a bunch of transit related data as APIs, for us across a number of applications from visualizations to conversation interfaces like bots and voice-enablement. As I’m learning about the data, publishing it as unsophisticated CRUD APIs, I’m thinking deeply about how I would enable others to ask questions of this data using web APIs. I’m thinking about the hard work of deriving visual meaning from specific questions, all the way to how would you respond to an Alexa query regarding transit data in less than a second. Going well beyond what CRUD gives us when we publish our APIs and taking things to the next level.

Knowing the technology sector, the first response I’ll get is machine learning! You take all your data, and you train up some machine learning models, put some natural language process to work, and voila, you have your answer to how you provide answers. I think this is a sensible approach to many data sets, and for organizations who have the machine learning skills and resources at their disposal. There are also a growing number of SaaS solutions for helping put machine learning work to answer complex questions that might be asked of large databases. Machine learning is definitely part of the equation for me, but I’m not convinced it is the answer in all situations, and it might not always yield the correct answers we are always looking for.

After machine learning, and first on my list of solutions to this challenge is API design. How can I enable a domain expert to pull out the meaningful questions that will be asked of data, and expose as simple API paths, allowing consumers to easily get at the answers to questions. I’m a big fan of this approach because I feel like the chance we will get right answers to questions will be greater, and the APIs will help consumers understand what questions they might want to be asking, even when they are not domain experts. This approach might be more labor intensive than the magic of machine learning, but I feel like it will produce much higher quality results, and better serve the objectives I have for making data available for querying. Plus, this is a lower impact solution, allowing more people to implement, who might not have the machine learning skills or resources at their disposal. API design using low-cost web technology, makes for very accessible solutions.

Whether you go the machine learning or artisanal domain expert API design route, there has to be a feedback loop in place to help improve the questions being asked, as well as the answers being given. If there is no feedback loop, the process will never be improved. This is what APIs excel at when you do them properly. The savvy API platform providers have established feedback loops for API consumers, and their users to correct answers when they are wrong, learn how to ask new types of questions, and improve upon the entire question and answer life cycle. I don’t care whether you are going the machine learning route, or the API design route, you have to have a feedback loop in place to make this work as expected. Otherwise it is a closed loop system, and unlikely to give the answers people are looking for.

For now, I’m leaning heavily on the API design route to allow for my consumers to ask questions of the data I’m publishing as APIs. I’m convinced of my ability to ask some sensible questions of the data, and expose as simple URLs that anyone can query, and then evolve forward and improve upon as time passes. I just don’t have the time and resources to invest in the machine learning route at this point. As the leading machine learning platforms evolve, or as I generate more revenue to be able to invest in these solutions I may change my tune. However, for now I’ll just keep publishing data as simple web APIs, and crafting meaningful paths that allow people to ask questions of some of the data I’m coming across locked up in zip files, spreadsheets, and databases.


Generating Operational Revenue From Public Data Access Using API Management

This is part of some research I'm doing with Streamdata.io. We share a common interest around the accessibility of public data, so we thought it would be a good way for us to partner, and Streamdata.io to underwrite some of my work, while also getting the occasional lead from you, my reader. Thanks for supporting my work Streamdata.io, and thanks for support them readers!

A concept I have been championing over the years involves helping government agencies and other non-profit organizations generate revenue from public data. It is a quickly charged topic whenever brought up, as many open data and internet activists feel public data should remain freely accessible. Something I don’t entirely disagree with, but this is a conversation, that when approached right can actually help achieve the vision of open data, while also generating much needed revenue to ensure the data remains available, and even has the opportunity to improve in quality and impact over time.

Leveraging API Management I’d like to argue that APIs, and specifically API management has been well established in the private sector, and increasingly in the public sector, for making valuable data and content available online in a secure and measurable way. Companies like Amazon, Google, and even Twitter are using APIs to make data freely available, but through API management are limiting how much any single consumer can access, and even charging per API call to generate revenue from 3rd party developers and partners. This proven technique for making data and content accessible online using low-cost web technology, requiring all consumers to sign up for a unique set of keys, then rate limiting access, and establishing different levels of access tiers to identify and organize different types of consumers, can and should be applied in government agencies and non-profit organizations to make data accessible, while also asserting more control over how it is used.

Commercial Use of Public Data While this concept can apply to almost any type of data, for the purposes of this example, I am going to focus on 211 data, or the organizations, locations, and services offered by municipalities and non-profit organizations to hep increase access and awareness of health and human services. With 211 data it is obvious that you want this information to be freely available, and accessible by those who need it. However, there are plenty of commercial interests who are interested in this same data, and are using it to sell advertising against, or enrich other datasets, and products or services. There is not reason why cash strapped cities, and non-profit organizations carry the load to maintain, and serve up data for free, when the consumers are using it for commercial purposes. We do not freely give away physical public resources to commercial interests (well, ok, sometimes), without expecting something in return, why would we behave differently with our virtual public resources?

It Costs Money To Serve Public Data Providing access to public data online costs money. It takes money to run the database, servers, bandwidth, and websites and applicatiosn being used to serve up data. It takes money to clean the data, validate phone numbers, email addresses, and ensure the data is of a certain quality and brings value to end-users. Yes this data should be made freely available to those who need it. However, the non-profit organizations and government agencies who are stewards of the data shouldn’t be carrying the financial burden of this data remaining freely available to commercial entities who are looking to enrich their products and services, or simply generate advertising revenue from public data. As modern API providers have learned there are always a variety of API consumers, and I’m recommending that public data stewards begin leverage APIs, and API management to better understand who is accessing their data, and begin to put them into separate buckets, and understand who should be sharing the financial burden of providing public data.

Public Data Should Be Free To The Public If it is public data, it should be freely available to the public. One the web, and through the API. The average citizen should be able to come use human service websites to find services, as well as us the API to help them in their efforts to help others find services. As soon as any application of the public data moves into the commercial realm, and the storage, server, and bandwidth costs increase, they shouldn’t be able to offload the risk and costs to the platform, and be forced to help carry load when it comes to covering platform costs. API management is a great way to measure each application consumption, and then meter and quantify their role and impact, and either allow them to remain freely accessing information, or be forced to pay a fee for API access and consumption.

Ensuring Commercial Usage Helps Carry The Load Commercial API usage will have a distinctly different usage fingerprint than the average citizen, or smaller non-commercial application. API consumers can be asked to declare they application upon signing up for API access, as well as be identified throughout their consumption and traffic patterns. API management excels at metering and analyzing API traffic to understand where it is being applied, either on the web or in mobile, as well as in system to system, and other machine learning or big data analysis scenarios. Public data stewards should be in the business of requiring ALL API consumers sign up for a key which they include with each call, allowing the platform to identify and measure consumption in real-time, and on recurring basis.

API Plans & Access Tiers For Public Data Modern approaches to API management lean on the concept of plans or access tiers to segment out consumers of valuable resources. You see this present in software as a service (SaaS) offerings who often have starter, professional, and enterprise levels of access. Lower levels of the access plan might be free, or low cost, but as you ascend up the ladder, and engage with platforms at different levels, you pay different monthly, as well as usage costs. While also enjoying different levels of access, and loosened rate limits, depending on the plan you operate within. API plans allows platforms to target different types of consumers with different types of resources, and revenue levels. Something that should be adopted by public data stewards, helping establish common access levels that reflect their objectives, as well as is in alignment with a variety of API consumers.

Quantifying, Invoicing, And Understanding Consumption The private sector focuses on API management as a revenue generator. Each API call is identified and measured, grouping each API consumers usage by plan, and attaching a value to their access. It is common to charge API consumers for each API call they make, but there are a number of other ways to meter and charge for consumption. There is also the possibility of paying for usage on some APIs, where specific behavior is being encouraged. API calls, both reading and writing, can be operated like a credit system, accumulating credits, as well as the spending of credits, or translation of credits into currency, and back again. API management allows for the value generated, and extracted from public data resources is measured, quantified, and invoiced for even if money is never actually transacted. API management is often used to show the exchange of value between internal groups, partners, as well as with 3rd party public developers as we see commonly across the Internet today.

Sponsoring, Grants, And Continued Investment in Public Data Turning the open data conversation around using APIs, will open up direct revenue opportunities for agencies and organizations from charging for volume and commercial levels of access. It will also open up the discussion around other types of investment that can be made. Revenue generated from commercial use can go back into the platform itself, as well as funding different applications of the data–further benefitting the overall ecosystem. Platform partners can also be leveraged to join at specific sponsorship tiers where they aren’t necessarily metered for usage, but putting money on the table to fund access, research, and innovative uses of public data–going well beyond just “making money from public data”, as many open data advocates point out.

Alternative Types of API Consumers Discovering new applications, data sources, and partners is increasingly why companies, organizations, institutions, and government agencies are doing APIs in 2017. API portals are becoming external R&D labs for research, innovation, and development on top of digital resources being made available via APIs. Think of social science research that occurs on Twitter or Facebook, or entrepreneurs developing new machine learning tools for healthcare, or finance. Once data is available, identified as quality source of data, it will often be picked up by commercial interests building interesting things, but also university researchers, other government agencies, and potentially data journalists and scientists. This type of consumption can contribute directly to new revenue opportunities for organization around their valuable public data, but it can also provide more insight, tooling, and other contributions to a cities or organizations overall operations.

Helping Public Data Stewards Do What They Do Best I’m not proposing that all public data should be generating revenue using API management. I’m proposing that there is a lot of value in these public data assets being available, and a lot of this value is being extracted by commercial entities who might not be as invested in public data stewards long term viability. In an age where many businesses of all shapes and sizes are realizing the value of data, we should be helping our government agencies, and the not for profit organizations that serve the public good realize this as well. We should be helping them properly manage their digital data assets using APIs, and develop an awareness of who is consuming these resources, then develop partnerships, and new revenue opportunities along the way. I’m not proposing this happens behind closed doors, and I’m interested in things following an open API approach to providing observable, transparent access to public resources.

I want to see public data stewards be successful in what they do. The availability, quality, and access of public data across many business sectors is important to how the economy and our society works (or doesn’t). I’m suggesting that we leverage APIs, and API management to work better for everyone involved, not just generate more money. I’m looking to help government agencies, and non-profit organizations who work with public data understand the potential of APIs when it comes to access to public data. I’m also looking to help them understand modern API management practices so they can get better at identifying public data consumers, understanding how they are putting their valuable data to work, and develop ways in which they can partner, and invest together in the road map of public data resources. This isn’t a new concept, it is just one that the public sector needs to become more aware of, and begin to establish more models for how this can work across government and the public sector.


Generating Operational Revenue From Public Data Access Using Api Management

404: Not Found


The Tractor Beam Of The Database In An API World

<img src="https://s3.amazonaws.com/kinlane-productions/algo-rotoscope/stories-new/dragon-shadows-black-white-outline.jpg" align=="right" width="40%" style="padding: 15px;" />

I’m an old database person. I’ve been working with databases since my first job in 1987. Cobol. FoxPro. SQL Server. MySQL. I have had a production database in my charge accessible via the web since 1998. I understand how databases are the center of gravity when it comes to data. Something that hasn’t changed in an API driven world. This is something that will make microservices in a containerized landscape much harder than some developers will want to admit. The tractor beam of the database will not give up control to data so easily, either because of technical limitations, business constraints, or political gravity.

Databases are all about the storage and access to data. APIs are about access to data. Storage, and the control that surrounds it is what creates the tractor beam. Most of the reasons for control over the storage of data are not looking to do harm. Security. Privacy. Value. Quality. Availability. There are many reasons stewards of data want to control who can access data, and what they can do with it. However, once control over data is established, I find it often morphs and evolves in many ways, that can eventually become harmful to meaningful and beneficial access to data. Which is usually the goal behind doing APIs, but is often seen as a threat to the mission of data stewards, and results in a tractor beam that API related projects will find themselves caught up in, and difficult to ever break free of.

The most obvious representation of this tractor beam is that all data retrieved via an API usually comes from a central database. Also, all data generated or posted via an API, also ends up within a database. The central database always has an appetite for more data, whether scaled horizontally or vertically. Next, it is always difficult to break off subsets of data into separate API-driven project, or prevent newly established ones from being pulled in, and made part of existing database operations. Whether due to technical, business, or political reasons, many projects born outside this tractor beam will eventually be pulled into the orbit of legacy data operations. Keeping projects decoupled will always be difficult when your central databases has so much pull when it comes to how data is stored and accessed. This isn’t just a technical decoupling, this is a cultural one, that will be much more difficult to break from.

Honestly, if your database is over 2-3 years old, and enjoys any amount of complexity, budget scope, and dependency across your organization, I doubt you’ll ever be able to decouple it. I see folks creating these new data lakes, which act as reservoirs for any and all types of data gathered and generated across operations. These lakes provide valuable opportunities for API innovators to potentially develop new and interesting ways of putting data to work, if they possess an API layer. However, I still think the massive data warehouse and database will look to consume and integrated anything structured and meaningful that evolves on the shores. Industrial grade data operations will just industrialize any smaller utilities that emerge along the fringes of large organizations. Power structures have long developed around central data stores, and no amount of decoupling, decentralizing, or blockchaining will change this any time soon. You can see this with the cloud, which was meant to disrupt this, when it just moved it from your data center to the someone else’s, and allowed it to grow at a faster rate.

I feel like us API folks have been granted ODBC and JDBC leases for our API plantations, but rarely will we ever decouple ourselves from the mother ship. No matter what the technology whispers in our ears about what is possible, the business value, and political control over established databases will always dictate what is possible and what is not possible. I feel like this is one reason all the big database platforms have waited so long to provide native API features, and why next generation data streaming solutions rarely have simple, intuitive API layers. I think we will continue to see the tractor beam of database culture continue to be aggressive, as well as passive aggressive to anything API, trumping access possibilities brought to the table by APIs, with outdated power and control beliefs rooted in how we store and control our data. These folks rarely understand they can be just as controlling and greedy with APIs, but they seem to be unable to get over the promises of access APIs afford, and refuse to play along at all, when it comes to turning down the volume on the tractor beam so anything can flourish.


Using Apis To Enrich The Data You Have In Spreadsheets

404: Not Found


Provide An Open Source Threat Information Database And API Then Sell Premium Data Subscriptions

I was doing some API security research and stumbled across vFeed, a “Correlated Vulnerability and Threat Intelligence Database Wrapper”, providing a JSON API of vulnerabilities from the vFeed database. The approach is a Python API, and not a web API, but I think provides an interesting blueprint for open source APIs. What I found interesting (somewhat) from the vFeed approach was the fact they provide an open source API, and database, but if you want a production version of the database with all the threat intelligence you have to pay for it.

I would say their technical and business approach needs a significant amount of work, but I think there is a workable version of it in there. First, I would create a Python, PHP, Node.js, Java, Go, Ruby version of the API, making sure it is a web API. Next, remove the production restriction on the database, allowing anyone to deploy a working edition, just minus all the threat data. There is a lot of value in there being an open source set of threat intelligence sharing databases and API. Then after that, get smarter about having a variety different free and paid data subscriptions, not just a single database–leverage the API presence.

You could also get smarter about how the database and API enables companies to share their threat data, plugging it into a larger network, making some of it free, and some of it paid–with revenue share all around. There should be a suite of open source threat information sharing databases and APIs, and a federated network of API implementations. Complete with a wealth of open data for folks to tap into and learn from, but also with some revenue generating opportunities throughout the long tail, helping companies fund aspects of their API security operations. Budget shortfalls are a big contributor to security incidents, and some revenue generating activity would be positive.

So, not a perfect model, but enough food for thought to warrant a half-assed blog post like this. Smells like an opportunity for someone out there. Threat information sharing is just one dimension of my API security research where I’m looking to evolve the narrative around how APIs can contribute to security in general. However, there is also an opportunity for enabling the sharing of API related security information, using APIs. Maybe also generating of revenue along the way, helping feed the development of tooling like this, maybe funding individual implementations and threat information nodes, or possibly even fund more storytelling around the concept of API security as well. ;-)


Explore, Download, API, And Share Data

I’m regularly looking through API providers, service providers, and open data platforms looking for interesting ways in which folks are exposing APIs. I have written about Kentik exposing the API call behind each dashboard visualization for their networking solution, as well as CloudFlare providing an API link for each DNS tool available via their platform. All demonstrating healthy way we can show how APIs are right behind everything we do, and today’s example of how to provide API access is out of New York Open Data, providing access to 311 service requests made available via the Socrata platform.

The page I’m showcasing provides access 311 service requests from 2010 to present, with all the columns and meta data for the dataset, complete with a handy navigation toolbar that lets you view data in Carto or Plot.ly, download the full dataset, access via API, or simply share via Twitter, Facebook, or email. It is a pretty simple example of offering up multiple paths for data consumers to get what they want from a dataset. Not everyone is going to want the API. Depending on who you are you might go straight for the download, or opt to access via one of the visualization and charting tools. Depending on who you are targeting with your data, the list of tools might vary, but the NYC OpenData example via Socrata provides a nice example to build upon. With the most important message being do not provide only the options you would choose–get to know your consumers, and deliver solutions they will also need.

It provides a different approach to making APIs behind available to users than the Kentik or CloudFlare approaches do, but it adds to the number of examples I have to show people how APIs and API enabled integration can be exposed through the UI, helping educate the massess about what is possible. I could see standardized buttons, drop downs, and other embeddable tooling emerge for helping deliver solutions like this for providers. Something like we are seeing with the serverless webhooks out Auth0 Extensions. Some sort of API-enabled goodness that triggers something, and can be easily embedded directly into any existing web or mobile application, or possibly a browser toolbar–opening up API enabled solutions to the average user.

One of the reasons I keep showcasing examples like this is that I want to keep pushing back on the notion that APIs are just for developers. Simple, useful, and relevant APIs are not beyond what the average web application user can grasp. They should be present behind every action, visualization, and dataset made available online. When you provide useful integration and interoperability examples that make sense to the average user, and give them easy to engage buttons, drop downs, and workflows for implementing, more folks will experience the API potential in their world. The reasons us developers and IT folk keep things complex, and outside the realm of the normal folk is more about us, our power plays, as well as our inability to simplify things so that they are accessible beyond those in the club.


Big Data Is Not About Access Using Web APIs

I’m neck deep in research around data and APIs right now, and after looking at 37 of the Apache data projects it is pretty clear that web APIs are not a priority in this world. There are some of the projects that have web APIs, and there a couple projects that look to bridge several of the projects with an aggregate or gateway API, but you can tell that the engineers behind the majority of these open source projects are not concerned with access at this level. Many engineers will counter this point by saying that web APIs can’t handle the volume, and it shows that the concept isn’t applicable in all scenarios. I’m not saying web APIs should be used for the core functionality at scale, I’m saying that web APIs should be present to provide access to the result state of the core features for each of these platform, whatever that is, which something that web APIs excel at.

From my vantage point the lack of web APIs isn’t a technical one, it is a business and political motivation. When it comes to big data the objectives are always about access, and it definitely isn’t about the wide audience access that comes when you use HTTP, and the web for API access. The objective is to aggregate, move around, and work with as much data as you possibly can amongst a core group of knowledgable developers. Then you distribute awareness, access, and usage to designated parties via distilled analysis, visualizations, or in some cases to other systems where the result can be accessed and put to use. Wide access to this data is not the primary objective, paying forward much of the power and control we currently see around database to API efforts. Big data isn’t about democratization. Big Data is about aggregating as much as you can and selling the distilled down wisdom from analysis, or derived as part of machine learning efforts.

I am not saying there is some grand conspiracy here. It just isn’t the objective of big data folks. They have their marching orders, and the technology they develop reflect these marching orders. It reflects the influence money and investment has on the technology. The ideology that drives how the tech is engineered, and the algorithms handle specific inputs, and provide intended outputs. Big data is often sold as data liberation, democratization, and access to your data, building on much of what APIs have done in recent years. However, in the last couple of years the investment model has shifted, the clients who are purchasing and implementing big data have evolved, and they aren’t your API access type of people. They don’t see wide access to data as a priority. You are either in the club, and know how to use the Apache X technology, or you are sanctioned one of the dashboard analysis visualization machine learning wisdom drips from the big data. Reaching a wide audience is not necessary.

For me, this isn’t some amazing revelation. It is just watching power do what power does in the technology space. Us engineers like to think we have control over where technology goes, yet we are just cogs in the larger business wheel. We program the technology to do exactly what we are paid to do. We don’t craft liberating technology, or the best performing technology. We assume engineer roles, with paychecks, and bosses who tell us what we should be building. This is how web APIs will fail. This is how web APIs will be rendered yesterdays technology. Not because they fail technically, it is because the ideology of the hedge funds, enterprise groups, and surveillance capitalism organizations that are selling to law enforcement and the government will stop funding data systems that require wide access. The engineers will go along with it because it will be real time, evented, complex, and satisfying to engineer in our isolated development environments (IDE). I’ve been doing data since the 1980s, and in my experience this is how data works. Data is widely seen as power, and all the technical elements, and many of the human elements involved often magically align themselves in service of this power, whether they realize they are doing it or not.


APIs Used To Give Us Access To Resources That Were Out Of Our Reach

I remember when almost all the APIs out there gave us developers access to things we couldn’t ever possibly get on our own. Some of it was about the network effect with the early Amazon and eBay marketplaces, or Flickr and Delicious, and then Twitter and Facebook. Then what really brought it home was going beyond the network effect, and delivering resources that were completely out of our reach like maps of the world around us, (seemingly) infinitely scalable compute and storage, SMS, and credit card payments. In the early days it really seemed like APIs were all about giving us access to something that was out of our reach as startups, or individuals.

While this still does exist, it seems like many APIs have flipped the table and it is all about giving them access to our personal and business data in ways that used to be out of their reach. Machine learning APIs are using parlour tricks to get access to our internal systems and databases. Voice enablement, entertainment, and cameras are gaining access to our homes, what we watch and listen to, and are able to look into the dark corners of our personal lives. Tinder, Facebook, and other platforms know our deep dark secrets, our personal thoughts, and have access to our email and intimate conversations. The API promise seems to have changed along the way, and stopped being about giving us access, and is now about giving them access.

I know it has always been about money, but the early vision of APIs seemed more honest. It seemed more about selling a product or service that people needed, and was more straight up. Now it just seems like APIs are invasive. Being used to infiltrate our professional and business worlds through our mobile phones. It feels like people just want access to us, purely so they can mine us and make more money. You just don’t see many Flickrs, Google Maps, or Amazon EC2s anymore. The new features in mobile devices we carry around, and the ones we install in our home don’t really benefit us in new and amazing ways. They seem to offer just enough to get us to adopt them, and install in our life, so they can get access to yet another data point. Maybe it is just because everything has been done, or maybe it is because it has all been taken over by the money people, looking for the next big thing (for them).

Oh no! Kin is ranting again. No, I’m not. I’m actually feeling pretty grounded in my writing lately, I’m just finding it takes a lot more work to find interesting APIs. I have to sift through many more emails from folks telling me about their exploitative API, before I come across something interesting. I go through 30 vulnerabilities posts in my feeds, before I come across one creative story about something platform is doing. There are 55 posts about ICOs, before I find an interesting investment in a startup doing something that matters. I’m willing to admit that I’m a grumpy API Evangelist most of the time, but I feel really happy, content, and enjoying my research overall. I just feel like the space has lost its way with this big data thing, and are using APIs to become more about infiltrating and extraction, that it is about delivering something that actually gives developers access to something meaningful. I just think we can do better. Something has to give, or this won’t continue to be sustainable much longer.


If you think there is a link I should have listed here feel free to tweet it at me, or submit as a Github issue. Even though I do this full time, I'm still a one person show, and I miss quite a bit, and depend on my network to help me know what is going on.