OVER CENTRALIZING DATA SCIENCE

“Over Centralizing” Data Science: A Clear and Present Danger to U.S. Defense Dominance

October 16, 2020

Artificial Intelligence (AI) and Machine Learning (ML) defense applications are touted as “the future” of our national defense. The harsh reality is that operational data science and its AI and ML applications have been here for a decade and are demonstrably critical to present-day national defense. Harsher yet, the United States is behind. Russia and China are outpacing the U.S. in the development and deployment of these defense applications, and the gap continues to widen exponentially.

Why? Do our near-peer adversaries have more resources and a greater ability to develop and deploy these applications? No.

The United States maintains advantages in basic research and educational institutions, [1] in small business innovation and “startups,” [2] and in capital markets that scale and grow successful companies. [3] These advantages have not been exploited to maximal effect because the core DoD business model continues to be dominated by large, centralized efforts. The commercial sector has recognized that centralized efforts are a largely obsolete business model for AI companies, [4] and that, unlike traditional software-as-a-service companies, an effective AI/ML company operates in a tightly coupled, customer-centric model that blends material solutions (products) with custom development and integration (services). Our near-peer adversaries have taken this cue from U.S. commercial innovators and are applying weighted decentralization throughout their military complex.

To grow U.S. national security AI dominance, Congress should encourage DoD policies and oversight that drive resources towards the democratization of AI development and deployment, ensure that the Federal Acquisition Regulation (FAR) protects the government’s access to data, and further reform the security clearance process so that the DoD can better leverage non-traditional vendors.

Some within the DoD have adopted this decentralized operational data science model [5] and have seen staggering success. One example is the Tactical Data Teams (TDTs), initiated in 2018 by U.S. Army Special Operations Command. In just the last year, these TDTs delivered 20 discrete AI/ML solutions, deployed and used by warfighters, at a total cost of less than $3 million. Multiple TDT solutions have been handed to other DoD organizations, including the Joint Artificial Intelligence Center (JAIC), for scaling and transition. In short, the democratization of AI/ML has delivered more capability to the battlefield in a fraction of the time and at a fraction of the cost of large top-down programs. These teams have integrated best-in-class talent from the pool of startups, government civilians, and the uniformed military. The centralized model seems to reward success metrics that end at “we developed and deployed the product,” whereas decentralized success metrics include the actual “use, scaled adoption, and lasting evolution” of a solution by those closest to it. To scale and grow this model, three things are needed:

Resources for decentralized AI/ML development and deployment. As an applied craft, operational data science is honed by repetition and proximity. Repetition means not just a volume of work, but also the process of taking smaller projects and products and iteratively integrating them into larger and more complex workflows. Proximity matters because data science is less effective when practiced in a vacuum: the operational context is critical for relevant solutions and can only be captured close to the problem set. The TDT “business model” leverages repetition and proximity to deliver maximum impact with minimal time and expense. To ensure the innovation critical to AI dominance, centralized programs should focus more on oversight and scaling of decentralized development, deployment, and innovation, and less on the actual development. [6]

Congress should direct that AI development and operational resources be pushed down to the Division echelon. For roughly $100 million, the DoD could “push down” $3 million to each of ~30 different O-6 commands, to be spent on operational AI against each unit commander’s priorities.

Data regulations in the FAR. The lifeblood of AI/ML applications is data. [7] It is a strategic asset like fuel or ammunition: critical, but ultimately worthless if it cannot be accessed at the time and place of need. [8] Government data should be government-owned, not vendor-controlled. The service branches are pursuing parallel initiatives to make existing data more accessible. [9] However, contracting procedures around data access are not standardized. Consequently, examples proliferate of government data being “held captive” by vendors.

Looking forward, the FAR should require that contract awards that generate or ingest data carry formalized access requirements (e.g., via API), with audits and penalties for vendor non-compliance. Those penalties should include compelling vendors to provide streamlined, unlimited government access to data and/or contract termination under FAR 12.403.

Reform to the security clearance process can catalyze innovation. Despite bold changes and improvement, the security clearance process remains a barrier to America’s non-traditional vendor and startup commercial base. While commendable gains have been made in individual security clearance adjudication, the problem of Facility Clearance (FCL) and contract eligibility for holding clearances persists. There is a fundamental “chicken-or-the-egg” problem. A company cannot apply for an FCL or hold clearances without a contract award requiring cleared work. However, holding an FCL and having actively cleared personnel are typically prerequisites for the contract award. This ecosystem does not benefit our nation.

Congress should direct reform to this process. Companies should be invited to apply for and maintain FCLs without a current procurement requirement, provided they bear the cost of the process. This will give appropriately vetted firms and personnel a formalized and controlled path to compliance. Further, this will improve the security around our nation’s critical secrets by removing the incentive for “favor-trading” and the other ad hoc processes that are rife today. Finally, Other Transaction Authorities (OTAs) and other “innovation-forward” acquisition processes should include more clearance billets to give awarded non-traditional innovators timely proximity to critical problem sets.

China plans to be an AI world leader by 2030. [10] Its increasingly aggressive actions imply that an imbalance between Chinese and U.S. AI capability could be cataclysmic. And yet, despite comparable AI investment and a substantial technological lead, the U.S. is falling behind. If we ignore trends in the commercial AI industry and the successes of decentralized DoD AI development, we risk losing not only our dominant technology position but also the fragile defense capability parity we maintain with near-peer adversaries.

Anthony Saffier is VP of Product at Striveworks, a leading-edge data science company based in Austin, TX. Previously, Saffier applied his AI expertise across a number of commercial sectors. Saffier spent over a decade in Special Operations Forces for the U.S. military and is a leading expert on how operational data science can be a force-multiplying tool for the warfighter. He holds an MBA from the Massachusetts Institute of Technology and a bachelor’s degree in Finance from Tulane University.

Charles Segars is a management consultant advising companies in the national security, technology, media, and telecom sectors. He is a board advisor to several defense start-ups and is a founder of Segars Media, a content, branding, and communications firm based in Los Angeles.
