<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	
	xmlns:georss="http://www.georss.org/georss"
	xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
	>

<channel>
	<title>Stefano Maffulli &#8211; Open Source Initiative</title>
	<atom:link href="https://opensource.org/blog/author/opensource_kvv3zd/feed" rel="self" type="application/rss+xml" />
	<link>https://opensource.org</link>
	<description>The steward of the Open Source Definition, setting the foundation for the Open Source Software ecosystem.</description>
	<lastBuildDate>Thu, 25 Jul 2024 11:41:18 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	

<image>
	<url>https://i0.wp.com/opensource.org/wp-content/uploads/2023/01/cropped-cropped-OSI_Horizontal_Logo_0-e1674081292667.png?fit=32%2C32&#038;ssl=1</url>
	<title>Stefano Maffulli &#8211; Open Source Initiative</title>
	<link>https://opensource.org</link>
	<width>32</width>
	<height>32</height>
</image> 
<atom:link rel="hub" href="https://pubsubhubbub.appspot.com"/><atom:link rel="hub" href="https://pubsubhubbub.superfeedr.com"/><atom:link rel="hub" href="https://websubhub.com/hub"/><site xmlns="com-wordpress:feed-additions:1">210318891</site>	<item>
		<title>OSI at the United Nations OSPOs for Good</title>
		<link>https://opensource.org/blog/osi-at-the-united-nations-ospos-for-good</link>
		
		<dc:creator><![CDATA[Stefano Maffulli]]></dc:creator>
		<pubDate>Wed, 24 Jul 2024 22:57:06 +0000</pubDate>
				<category><![CDATA[Events]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[ai]]></category>
		<guid isPermaLink="false">https://opensource.org/?p=64994</guid>

					<description><![CDATA[Earlier this month the Open Source Initiative participated in the “OSPOs for Good” event promoted by the United Nations in NYC.]]></description>
										<content:encoded><![CDATA[
<p>Earlier this month the Open Source Initiative participated in the “<a href="https://www.un.org/techenvoy/fr/content/ospos-good-2024">OSPOs for Good</a>” event promoted by the United Nations in NYC. Stefano Maffulli, the Executive Director of the OSI, participated in a panel moderated by Mehdi Snene about Open Source AI alongside distinguished speakers Ashley Kramer, Craig Ramlal, Sasha Luccioni, and Sergio Gago. Please find below a transcript of Stefano’s presentation.</p>



<p><strong>Mehdi Snene </strong>&nbsp;</p>



<p>What is Open Source in AI? What does it mean? What are the foundational pieces? How far along is the data? There is mention of weights, and data skills. How can we truly understand what Open Source in AI is? Today, joining us, we&#8217;ll have someone who can help us understand what Open Source in AI means and where we are heading. Stefano, can you offer your insights?</p>



<p><strong>Stefano Maffulli&nbsp;&nbsp;</strong></p>



<p>Thanks. We have some thoughts on this. We&#8217;ve been pondering these questions since they first emerged when GPT started to appear. We asked ourselves: How do we transfer the principles of permissionless innovation and the immense value created by the Open Source ecosystem into the AI space?</p>



<p>After a little over two years of research and global conversations with multiple stakeholders, we identified three key elements. Firstly, permissionless innovation needs to be ported to AI, but this is complex and must be broken down into smaller components.</p>



<p>We realized that, as developers, users, and deployers of AI systems, we need to understand how these systems are built. This involves studying all components carefully, being able to run them for any purpose without asking for permission (a basic tenet of Open Source), and modifying them to change outputs based on the same inputs. These basic principles include being able to share these modifications with others.</p>



<p>To achieve this, you need data, the code used for training and cleaning the data (e.g., removing duplicates), the parameters, the weights, and a way to run inference on those weights. It&#8217;s fairly straightforward. However, the challenge lies in the legal framework.</p>



<p>Now, the complicated piece is how Open Source software has had a very wonderful run, based on the fact that the legal framework that governs Open Source is fairly simple and globally accepted. It&#8217;s built on copyright, a system that has worked wonderfully in both ways. It gives exclusive rights to the content creators, but also the same mechanism can be used to grant rights to anyone who receives the creation.</p>



<p>With data, we don&#8217;t have that mechanism. That is a very simple and dramatic realization. When we talk about data, we should pay attention to what kind of data we&#8217;re discussing. There is data as content created, and there is data as facts, like fires, speed limits, or traces of a road. Those are facts, and they are treated differently. There is also private data, personal information, and various other kinds of data, each with different rules and regulations around the world.</p>



<p>Governments&#8217; major role in the future will be to facilitate permissionless innovation in data by harmonizing these rules. This will level the playing field, where currently larger corporations have significantly more power than Open Source developers or those wishing to create large language models. Governments should help create datasets, remove barriers, and facilitate access for academia, smaller developers, and the global south.</p>



<p><strong>Mehdi Snene&nbsp;&nbsp;</strong></p>



<p>We already have open data and Open Source. Now, we need to create open AI and open models. Are we bringing these two domains together, keeping them separate, or creating something new from scratch when we talk about open AI?</p>



<p><strong>Stefano Maffulli&nbsp;</strong></p>



<p>This is a very interesting and powerful question. I believe that open data as a movement has been around for quite a while. However, it’s only recently that data scientists have truly realized the value they hold in their hands. Data is fungible and can be used to build new things that are completely different from their original domains.</p>



<p>We need to talk more about this and establish platforms for better interaction. One striking example is a popular dataset of images used for training many image generation AI tools, which contained child sexual abuse images for many years. A research paper highlighted this huge problem, but no one filed a bug report, and there was no easy way for the maintainers of this dataset to notice and remove those images.</p>



<p>There are things that the software world understands very well, and things that data scientists understand very well. We are starting to see the need for more space for interactions and learning from each other.</p>



<p>The conversation is extremely complicated. Alex and I have had long discussions about this. I don&#8217;t want to focus entirely on this, but I do want to say that Open Source has never been about pleasing companies or specific stakeholders. We need to think of it as an ecosystem where the balances of power are maintained.</p>



<p>While Open Source software and Open Source AI are still evolving, the necessary ingredients—data, code, and other components—are there. However, the data piece still needs to be debated and finalized. Pushing for radical openness with data has clear drawbacks and issues. It&#8217;s going to be a balance of intentions, aiming for the best outcome for the general public and the whole ecosystem.</p>



<p><strong>Mehdi Snene&nbsp;&nbsp;</strong></p>



<p>Thank you so much. My next question is about the future. What are your thoughts on the next big technology?</p>



<p><strong>Stefano Maffulli&nbsp;</strong></p>



<p>From the perspective of open innovation, it&#8217;s about what&#8217;s going to give society control over technology. The focus of Open Source has always been to enable developers and end-users to have sovereignty over the technology they use. Whether it&#8217;s quantum computers, AI, or future technologies, maintaining that control is crucial.</p>



<p>Governments need to play a role in enabling innovation and ensuring that no single power becomes too dominant. The balance between the private sector, public sector, nonprofit sector, and the often-overlooked fourth sector—which includes developers and creators who work for the public good rather than for profit—must be maintained. This balance is essential for fostering an ecosystem where all stakeholders have equal interests and influence.</p>



<p><br><em>If you would like to listen to the panel discussion in its entirety, you can do so <a href="https://webtv.un.org/en/asset/k1m/k1ma4k9rff">here</a></em> (the Open Source AI panel starts at 1:00:00 approximately).</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">64994</post-id>	</item>
		<item>
		<title>Explaining the concept of Data information</title>
		<link>https://opensource.org/blog/explaining-the-concept-of-data-information</link>
					<comments>https://opensource.org/blog/explaining-the-concept-of-data-information#comments</comments>
		
		<dc:creator><![CDATA[Stefano Maffulli]]></dc:creator>
		<pubDate>Fri, 14 Jun 2024 13:53:28 +0000</pubDate>
				<category><![CDATA[Opinions]]></category>
		<category><![CDATA[ai]]></category>
		<category><![CDATA[Deep Dive: AI]]></category>
		<guid isPermaLink="false">https://opensource.org/?p=44634</guid>

					<description><![CDATA[This post clarifies how the draft Open Source AI Definition arrived at its current state, the design principles behind the Data information concept and the constraints (legal and technical) it operates under.]]></description>
										<content:encoded><![CDATA[
<p>There seems to be some confusion caused by the concept of Data information included in the draft v0.0.8 of the Open Source AI Definition. Some readers may have seen the original dataset included in the list of optional components and quickly jumped to the wrong conclusions. This post clarifies how the draft arrived at its current state, the design principles behind the <em>Data information</em> concept and the constraints (legal and technical) it operates under.</p>



<h2 class="wp-block-heading">The objective of the Open Source AI Definition</h2>



<p>The objective of the Open Source AI Definition is to replicate in the context of artificial intelligence (AI) the principles of autonomy, transparency, frictionless reuse, and collaborative improvement for end users and developers of AI systems. These are described in the <a href="https://hackmd.io/@opensourceinitiative/osaid-0-0-8#Why-we-need-Open-Source-Artificial-Intelligence-AI">preamble</a>.</p>



<p>Following the preamble is the definition of Open Source AI, an adaptation of the definition of <a href="https://www.gnu.org/philosophy/free-sw.html">Free Software</a> (also known as “the four freedoms”) to AI nomenclature. The preamble and the four freedoms have been co-designed over several meetings and public discussions, online and in-person, and have not recently received significant comments.&nbsp;</p>



<p>The Free Software definition specifies that a precondition to the freedom to study and modify a program is to have access to the source code. Source code is defined as “the preferred form of the program for making changes in.” Draft v0.0.8 contains a description of what’s necessary to enjoy the freedoms to study and modify an AI system. This new section titled <a href="https://hackmd.io/@opensourceinitiative/osaid-0-0-8#Preferred-form-to-make-modifications-to-machine-learning-systems">Preferred form to make modifications to machine-learning systems</a> has generated a heated debate.&nbsp;</p>



<h1 class="wp-block-heading">What is the preferred form to make modifications</h1>



<p>The concept of “preferred form to make modifications” focuses on machine learning systems because these systems require data and training to produce a working system. Other AI systems are more easily classifiable as software and don’t require a special definition.&nbsp;</p>



<p>The <a href="https://discuss.opensource.org/t/report-on-working-group-recommendations/247">system analysis phase of the co-design process revealed</a> that studying and modifying machine learning systems requires data, code for training and inference, and model parameters. For the parameters, there’s no ambiguity: an Open Source AI must make them available under terms that respect the Open Source principles (no field-of-use restrictions, no discrimination against people, etc.). For the data and code requirements, the text in the “preferred form to make modifications” section is longer and harder to parse, generating some confusion.&nbsp;</p>



<p>The intent of the code and data requirements is to ensure that end users, deployers and developers of an Open Source AI system have all the tools and instructions to recreate that AI system from scratch, to satisfy the freedoms to study and modify the system. At a high level, it makes sense to suggest that training datasets should be mandatorily released with permissive licenses for a system to be Open Source AI.</p>



<p>However, on close examination, it became clear that sharing the original datasets is full of traps. It actually puts Open Source at a disadvantage compared to opaque and proprietary AI systems.</p>



<h2 class="wp-block-heading">The issue with data</h2>



<p>Data is not software: The legal landscape for data is much wider than copyright. Aggregating large datasets and distributing them internationally is an endless nightmare that includes privacy laws, copyright, sui-generis rights, patents, secrets and more. Without diving deeper into legal issues, let’s focus on practical examples to clarify why the distribution of the training dataset is not spelled out as a requirement in the concept of <em>Data information</em>.</p>



<ul class="wp-block-list">
<li><a href="https://pile.eleuther.ai/">The Pile</a>, the open dataset used to train the very open Pythia models, was taken down after an alleged copyright infringement, currently being litigated in the United States. However, the Pile appears to be legal to share in Japan. It’s also unclear whether it can be legally shared in the European Union.&nbsp;</li>



<li><a href="https://huggingface.co/datasets/allenai/dolma">DOLMA</a>, the open dataset used to train the very open OLMo models, was initially released with a restrictive license. It later <a href="https://discuss.opensource.org/t/open-source-ai-needs-to-require-data-to-be-viable/351/32?u=stefano">switched</a> to a permissive one. On further inspection, DOLMA appears to suffer from the <a href="https://discuss.opensource.org/t/open-source-ai-needs-to-require-data-to-be-viable/351/34?u=stefano">same legal uncertainties</a> as the Pile; however, the Allen Institute has not been sued yet.</li>



<li>Training techniques that preserve privacy like federated learning don’t create datasets.&nbsp;</li>
</ul>



<p>All these cases show that requiring the original datasets creates vagueness and uncertainty in applying the Open Source AI Definition:</p>



<ul class="wp-block-list">
<li>If a dataset is only legal in Japan, is that AI Open Source only in Japan?</li>



<li>If a dataset is initially legally available but later retracted, does the AI go from being Open Source to not?
<ul class="wp-block-list">
<li>If so, what happens to the applications that use such AI?</li>
</ul>
</li>



<li>If no dataset is created, then will any AI trained with such techniques ever be Open Source?</li>
</ul>



<p>Additionally, there are reasons to believe that proprietary systems from OpenAI, Anthropic and others have been trained on the same questionable data inside The Pile and DOLMA. Proving that’s the case is much harder and more expensive, though. This is clearly a disincentive to be open and transparent on the data sources, adding a burden to the organizations that try to do the right thing.</p>



<p>As the solution to these questions, draft v0.0.8 contains the concept of <em>Data information</em>, coupled with code requirements, to obtain the expected result: end users, developers and deployers of AI systems are able to reproduce an Open Source AI.</p>



<h1 class="wp-block-heading">Understanding the concept of Data Information</h1>



<p><em>Data information</em>, in the draft Open Source AI Definition, is defined as:&nbsp;</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>Sufficiently detailed information about the data used to train the system, <strong>so that</strong> a skilled person can recreate a substantially equivalent system using the same or similar data.</p>
</blockquote>



<p>Read that from the end: The intention of <em>Data information</em> is to allow developers to <strong>recreate</strong> a substantially <strong>equivalent system</strong> using <strong>the same or similar data</strong>. That means that an Open Source AI must disclose all the ingredients, where they’ve been bought and all the instructions to prepare the dish.&nbsp;&nbsp;</p>



<p>This is a solution that <a href="https://discuss.opensource.org/t/report-on-working-group-recommendations/247">came out of the co-design process</a>, where reviewers didn’t rank the training datasets as high as they ranked the training code and data transparency requirements.&nbsp;</p>



<p><em>Data information</em> and the code requirements also address all of the questions around the legality of distributing data and datasets, or their absence.</p>



<p>If a dataset is only legal in Japan or becomes illegal later, one should still be able to <strong>recreate</strong> a dataset suitable to train an <strong>equivalent system</strong> replacing the illegal or unavailable pieces with similar ones.</p>



<p>AI systems trained with federated learning (where a dataset isn&#8217;t created) can still be Open Source AI if all instructions and code are released so that a new training with different data can generate an <strong>equivalent system</strong>.</p>



<p>The <em>Data information</em> concept also solves an example (raised on the forum) of an AI system trained on data licensed directly from Reddit. In this case, if the original developers released enough information to allow another AI developer to <strong>recreate</strong> a substantially <strong>equivalent system</strong> with Reddit data taken from an existing dataset, like CommonCrawl, it would be considered Open Source AI.</p>



<h2 class="wp-block-heading">The proposed alternatives</h2>



<p>While generally well received, draft v0.0.8 has been criticized by a few people on the forum for putting the training dataset in the “optional requirements”. Some suggestions and pushback we’ve received:</p>



<ul class="wp-block-list">
<li>Require the use of <a href="https://discuss.opensource.org/t/open-source-ai-needs-to-require-data-to-be-viable/351">synthetic data</a> when the training dataset cannot be legally shared: This technique may work in some corner cases, if the technology evolves to be reliable enough. It’s expensive and untested at scale.</li>



<li>Classify as Open Source AI systems where <a href="https://discuss.opensource.org/t/open-source-ai-needs-to-require-data-to-be-viable/351/17#levels-of-openness-2">all their components are</a> “open source”: This approach is <strong>not</strong> rooted in the longstanding practice of the GNU project to accept system library exceptions and other compromises in exchange for more Open Source tools.</li>



<li>Datasets built by crawling the internet are the equivalent of theft; they shouldn’t be allowed at all, let alone allowed in Open Source AI: This pushback ignores the reality that large data aggregators have already legally acquired the rights to accumulate that same data (through scraping and terms of use) and are trading it, exclusively capturing the economic value of what should be in the commons. Read <a href="https://openfuture.eu/wp-content/uploads/2024/04/240404Towards_a_Books_Data_Commons_for_AI_Training.pdf">Towards a Books Data Commons for AI Training</a> for more details. There is no general agreement that text and data mining is equivalent to theft.</li>
</ul>



<p>These demands and suggestions are hard to accept. We need an Open Source AI Definition that can effectively guide users and developers to make the right choice. We need one that doesn’t put developers of Open Source AI at a disadvantage compared to proprietary ones. We need a Definition that contains positive examples from the start so we can practically demonstrate positive qualities to policymakers.&nbsp;</p>



<p>The discussion about data, how to generate incentives to create datasets that can be distributed internationally, safely, preserving privacy, is extremely complex. It can be addressed separately from the Open Source AI Definition. In collaboration with Open Future Foundation and others, OSI is designing a series of conferences to tackle the data governance issue. We’ll make an announcement soon.</p>



<h2 class="wp-block-heading">Have your say now</h2>



<p>The concept of Data information and code requirements is hard to grasp at first. But the <a href="https://discuss.opensource.org/t/initial-report-on-definition-validation/368/9">preliminary results of the validation phase</a> confirm that the draft v0.0.8 works as expected: Pythia and OLMo both would be Open Source AI, while Falcon, Grok, Llama, and Mistral would not (even if they used OSD-compatible licenses) because they don’t share <em>Data information</em>. BLOOM and StarCoder would fail because of field-of-use restrictions in their models.</p>



<p><em>Data information</em> can be improved but it’s better than other solutions proposed so far. As we get closer to the release of the stable version of the Open Source AI Definition, we need to hear from you: If you support this concept, please comment on the forum today. If you don’t support it, please try to propose an alternative that at least covers the practical examples of the Pile, DOLMA and federated learning above. Help the community move the conversation forward.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://opensource.org/blog/explaining-the-concept-of-data-information/feed</wfw:commentRss>
			<slash:comments>22</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">44634</post-id>	</item>
		<item>
		<title>Contributions of Open Source to AI: a panel discussion at CPDP-ai conference</title>
		<link>https://opensource.org/blog/contributions-of-open-source-to-ai-a-panel-discussion-at-cpdp-ai-conference</link>
					<comments>https://opensource.org/blog/contributions-of-open-source-to-ai-a-panel-discussion-at-cpdp-ai-conference#comments</comments>
		
		<dc:creator><![CDATA[Stefano Maffulli]]></dc:creator>
		<pubDate>Tue, 04 Jun 2024 09:00:00 +0000</pubDate>
				<category><![CDATA[Opinions]]></category>
		<category><![CDATA[ai]]></category>
		<category><![CDATA[europe]]></category>
		<category><![CDATA[policy]]></category>
		<guid isPermaLink="false">https://opensource.org/?p=26305</guid>

					<description><![CDATA[Discussing the challenges of Open Source AI when it comes to data, hardware, big tech companies and government regulations as a panelist at the CPDP-ai conference in Brussels.]]></description>
										<content:encoded><![CDATA[
<p>I participated as a panelist at the <a href="https://www.cpdpconferences.org/" target="_blank" rel="noreferrer noopener">CPDP-ai 2024 conference</a> in Brussels last week where we discussed the significant contributions of Open Source to AI and highlighted the specific properties that differentiate Open Source AI from proprietary solutions. Representing the Open Source Initiative (OSI), the globally recognized non-profit that defines the term Open Source, I emphasized the longstanding principle of granting users full agency and control over technology, which has been proven to deliver extensive social benefits.</p>



<p>Below is a glimpse at the questions and answers posed to me and my fellow panelists:</p>



<p><em>Question: Stefano, please explain what the contribution to AI from Open Source is, and if there are specific properties of Open Source AI that make a difference for the users and for the people who are confronted with its results.</em></p>



<p>Response: The Definition of Open Source Software has existed for over 25 years, but it doesn’t apply to AI. The Open Source Definition for software provides a stable north star for all participants in the digital ecosystem, from small and large companies to citizens and governments.</p>



<p>The basic principle of the Open Source Definition is to grant to the users of any technology full agency and control over the technology itself. This means that users of Open Source technologies have self-sovereignty of the technical solutions.</p>



<p>The Open Source Definition has demonstrated that massive social benefits accrue when you remove the barriers to learning, using, sharing and improving software systems. There is ample evidence that giving users agency, control and self-sovereignty of their technical choices produces a viable ecosystem based on permissionless innovation. Multiple studies by the EU Commission and Harvard researchers have assigned significant economic value to Open Source Software, all based on that single, clear, understood and approved Definition from 26 years ago.</p>



<p>For AI, and especially the most recent machine learning solutions, it’s less clear how society can maintain self-sovereignty of the technology and how to achieve permissionless innovation. Despite the fact that many people talk about Open Source AI, including the AI Act, there is no shared understanding of what that means, yet!</p>



<p>The Open Source Initiative is concluding a global, multi-stakeholder co-design process to find an unequivocal definition of Open Source AI, and we’re heading towards the conclusion of this process with a vastly increased knowledge of the AI machine learning space. <a href="https://go.opensource.org/osaid-latest" data-type="link" data-id="https://go.opensource.org/osaid-latest" target="_blank" rel="noreferrer noopener">The current draft of the Open Source AI Definition</a> recognizes that in order to study, use, share and modify AI, one needs to refer to an AI system, not a single individual component. The global process has identified the components required for society to maintain control of the technology and these are: </p>



<ul class="wp-block-list">
<li>Detailed information about the dataset used to train the system and the code so that a skilled person can train a system with similar capabilities</li>



<li>All the libraries and tools used to run training and inference</li>



<li>The model architecture and the parameters, like weights and biases</li>
</ul>



<p>Having unrestricted access to all these elements is what makes an AI an Open Source AI.</p>



<p>We’re in the final stretch of the process, starting to gather support for <a href="https://go.opensource.org/osaid-latest" data-type="link" data-id="https://go.opensource.org/osaid-latest" target="_blank" rel="noreferrer noopener">the current draft of the definition</a>.</p>



<p>The most controversial part of the discussion is the role of data in the training. To answer your question about the power of big foreign tech companies, putting aside the hardware requirements, the data is where the fight is. There seem to be two views of the world on data when it comes to AI: One thinks that text and data mining is basically strip mining humanity and all accumulation of data without consent of the rights holders must be made illegal. Another view of the world is that text and data mining for the purpose of training Open Source AI is probably the only antidote to the superpowers of large corporations. These camps haven’t found a common position yet. Japan seems to have made up its mind already, legalizing unrestricted text and data mining. We’ll see where the lawsuits in the US will go, if they ever get to a decision in court or, as I suspect, they will be settled out of court.&nbsp;</p>



<p>In any case, data, competence and to some extent hardware, are the levers to control the development of AI.&nbsp;</p>



<p>Open Source has been leveling the playing field of technologies. We know from past experience with Open Source software that giving people unrestricted access to the means of digital production enables tremendous economic value. This worked in Europe as well as in China. We think that Open Source AI can have the same effect of generating value while leaving control of the technology in the hands of society.</p>



<p><em>Question: Big tech companies are important for the development of AI. Apart from the purely technological impacts, there is also economic importance. The European Commission has been very concerned about the Digital Single Market recently, and has initiated legislation such as DSA and DMA to improve competition and market access. Will these instruments be sufficient in view of AI roll-out, thinking also of the recently adopted AI Act? Or will additional attention need to be paid?</em></p>



<p>Response: Open is the best antidote to the concentration of power. That said, I see these legislations as the sticks, very necessary. I’d love us to think also about carrots. We don’t want to repeat the mistakes of the past with the early years of the internet. Open Source software was equally available in the US and Europe but despite that, the few European champions of Open Source haven’t grown big enough to have a global impact. And some of the biggest EU companies aren’t exactly friendly with Open Source either.&nbsp;</p>



<p>Chinese companies have taken a different approach. But in Europe we have talents, and we have an attractive quality of life so we can get even more talents. Finding money is never an issue. We need to remove the disincentives to grow our companies bigger, widen the access to the internal EU market and support their international expansion, too.</p>



<p>For example, we need to review European Regulation 1025 on standardization to accommodate Open Source. Regulation 1025 was written at a time when Open Source was considered a “business model” and information and communication technology standards were about voltages in a wire. Today, Open Source is between 80% and 90% of all software and “digital elements” comprise some part of every modern product. Even hardware solutions are dominated by “digital elements.” As such, the approach taken by 1025 is out of date and most likely needs a root-and-branch rethink to properly apply to the world today and the world we anticipate tomorrow.</p>



<p>We need to make sure that the standardization rules required by the Cyber Resilience Act are written together with Open Source champions so the rules don’t exclusively favor the cartel of European patent holders who seek rent instead of innovating. Europe has all the means to be at the center of AI innovation; it embodies the right values of diversity and collaboration.&nbsp;</p>



<p>Closing remarks: We think that Open Source is the best antidote to fight market concentration in AI. Data is where the concentration of power is happening now and it’s in the hands of massive corporations: not only Google, Meta, Amazon, Reddit but also Sony, Warner, Netflix, Getty Images, Adobe … All these companies have already gained access to massive amounts of data, legally. These companies basically own our data, legally: Our pictures, the graph of our circles of friends, all the books and movies… </p>



<p>If we don’t write policies that allow text and data mining in exchange for a real Open Source AI (one that society can fully control), we risk leaving the most powerful AI systems in the hands of the oligopoly that can afford to trade money for access to data.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://opensource.org/blog/contributions-of-open-source-to-ai-a-panel-discussion-at-cpdp-ai-conference/feed</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">26305</post-id>	</item>
		<item>
		<title>Exploring openness in AI: Insights from the Columbia Convening</title>
		<link>https://opensource.org/blog/exploring-openness-in-ai-insights-from-the-columbia-convening</link>
					<comments>https://opensource.org/blog/exploring-openness-in-ai-insights-from-the-columbia-convening#respond</comments>
		
		<dc:creator><![CDATA[Stefano Maffulli]]></dc:creator>
		<pubDate>Thu, 23 May 2024 12:00:00 +0000</pubDate>
				<category><![CDATA[News]]></category>
		<category><![CDATA[ai]]></category>
		<guid isPermaLink="false">https://opensource.org/?p=25933</guid>

					<description><![CDATA[A framework to discuss openness and AI published by Columbia Institute of Global Politics and Mozilla, in collaboration with OSI and leading AI scholars and practitioners.]]></description>
										<content:encoded><![CDATA[
<p>Over the past year, a robust debate has emerged regarding the benefits and risks of open sourcing foundation models in AI. This discussion has often been characterized by high-level generalities or narrow focuses on specific technical attributes. One of the key challenges—one that the OSI community is <a href="https://opensource.org/blog/open-source-ai-definition-weekly-update-may-13">addressing head on</a>—is defining Open Source within the context of foundation models. </p>



<figure class="wp-block-image size-large"><img fetchpriority="high" decoding="async" width="640" height="361" src="https://i0.wp.com/opensource.org/wp-content/uploads/2024/05/image.png?resize=640%2C361&#038;ssl=1" alt="" class="wp-image-25935" srcset="https://i0.wp.com/opensource.org/wp-content/uploads/2024/05/image.png?resize=1024%2C577&amp;ssl=1 1024w, https://i0.wp.com/opensource.org/wp-content/uploads/2024/05/image.png?resize=300%2C169&amp;ssl=1 300w, https://i0.wp.com/opensource.org/wp-content/uploads/2024/05/image.png?resize=768%2C432&amp;ssl=1 768w, https://i0.wp.com/opensource.org/wp-content/uploads/2024/05/image.png?resize=1536%2C865&amp;ssl=1 1536w, https://i0.wp.com/opensource.org/wp-content/uploads/2024/05/image.png?w=2048&amp;ssl=1 2048w, https://i0.wp.com/opensource.org/wp-content/uploads/2024/05/image.png?w=1280&amp;ssl=1 1280w, https://i0.wp.com/opensource.org/wp-content/uploads/2024/05/image.png?w=1920&amp;ssl=1 1920w" sizes="(max-width: 640px) 100vw, 640px" data-recalc-dims="1" /></figure>



<p>A new framework is proposed to help inform practical and nuanced decisions about the openness of AI systems, including foundation models. The recent <a href="https://blog.mozilla.org/en/mozilla/ai/new-framework-for-ai-openness-and-innovation/">proceedings from the Columbia Convening on Openness in Artificial Intelligence</a>, made available for the first time this week, are a welcome addition to the process.</p>



<p>The Columbia Convening brought together experts and stakeholders to discuss the complexities and nuances of openness in AI. The goal was not to define <a href="https://opensource.org/deepdive">Open Source AI</a> but to illuminate the multifaceted nature of the issue. The proceedings reflect the February conversations and are based on the backgrounder text developed collaboratively with the working group.</p>



<p>One of the significant contributions of these proceedings is the framework for understanding openness across the AI stack. The framework summarizes previous work on the topic, analyzes the various reasons for pursuing openness, and outlines how openness varies in different parts of the AI stack, both at the model and system levels. This approach provides a common descriptive framework to support a more nuanced and rigorous understanding of openness in AI. It also aims to enable further work around definitions of openness and safety in AI.</p>



<p>The proceedings emphasize the importance of recognizing safety safeguards, licenses, and documents as attributes rather than components of the AI stack. This evolution from a model stack to a system stack underscores the dynamic nature of the AI field and the need for adaptable frameworks.</p>



<p>These proceedings are set to be released in time for the upcoming <a href="https://www.reuters.com/technology/south-korea-host-second-ai-safety-summit-may-21-22-2024-04-12/">AI Safety Summit in South Korea</a>. This timely release will help maintain momentum ahead of further discussions on openness at the French summit in 2025.</p>



<p>We’re happy to see collaboration of like-minded individuals in discussing and solving the varied problems associated with openness in AI.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://opensource.org/blog/exploring-openness-in-ai-insights-from-the-columbia-convening/feed</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">25933</post-id>	</item>
		<item>
		<title>Why datasets built on public domain might not be enough for AI</title>
		<link>https://opensource.org/blog/why-datasets-built-on-public-domain-might-not-be-enough-for-ai</link>
					<comments>https://opensource.org/blog/why-datasets-built-on-public-domain-might-not-be-enough-for-ai#respond</comments>
		
		<dc:creator><![CDATA[Stefano Maffulli]]></dc:creator>
		<pubDate>Tue, 07 May 2024 10:00:00 +0000</pubDate>
				<category><![CDATA[Opinions]]></category>
		<category><![CDATA[ai]]></category>
		<category><![CDATA[copyright]]></category>
		<category><![CDATA[data]]></category>
		<guid isPermaLink="false">https://opensource.org/?p=24855</guid>

					<description><![CDATA[Common Corpus is a public domain dataset for training large language models (LLMs).  Boasting 500 billion words in multiple languages, drawn from various cultural initiatives, it offers researchers a powerful tool to develop smaller and more efficient LLMs. It should not be abused as a tool to promote public policies that expand the reach of copyright law.]]></description>
										<content:encoded><![CDATA[
<p>There is tension between copyright laws and large datasets suitable to train large language models. Common Corpus is a dataset that only uses text from copyright-expired sources to bypass the legal issues. It’s a useful achievement, paving the path to research without immediate risk of lawsuits. I also fear that this approach may lead to bad policies, reinforcing the power of copyright holders; not the small creators but large corporations.&nbsp;</p>



<h2 class="wp-block-heading">A dataset built on public domain sources</h2>



<p>In March 2024 <a href="https://huggingface.co/blog/Pclanglais/common-corpus">Common Corpus</a> was released as an open access dataset for training large language models (LLMs). Announcing the release, the lead developer Pierre-Carl Langlais says “Common Corpus shows it is possible to train fully open LLMs on sources without copyright concerns.” The dataset contains 500 billion words in multiple European languages and different cultural heritages. It is a project coordinated by the French startup <a href="https://pleias.fr">Pleias</a> and supported by organizations committed to open science such as <a href="https://occiglot.eu/">Occiglot</a>, <a href="https://www.eleuther.ai/">Eleuther AI</a> and <a href="https://www.nomic.ai/">Nomic AI</a> as well as being partly funded by the French government. The stated intention of Common Corpus is to democratize access to large quality datasets. It has many other positive characteristics, highlighted also by Open Future’s <a href="https://openfuture.eu/blog/common-corpus-building-ai-as-commons/">summary of a talk given by Langlais</a>. </p>



<h2 class="wp-block-heading">The commons needs more data</h2>



<p>The debates sparked by the <a href="http://opensource.org/deepdive">Deep Dive: AI</a> process on the <a href="https://discuss.opensource.org/t/training-data-access/152">role of training data</a> highlighted that AI practitioners encounter many obstacles assembling datasets. At the same time, we discovered that tech giants have an incredible advantage over researchers and startups. They’ve been <a href="https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,_Inc.">slurping data for decades</a>, have the financial means to <a href="https://bots.law/little-cases/ai-cases-bot/">go to court</a> and can enter into <a href="https://techcrunch.com/2024/02/22/reddit-says-its-made-203m-so-far-licensing-its-data/?guccounter=1">bilateral agreements</a> to license data. These strategies are inaccessible to small competitors and academics. Accepting that the only path to creating open large datasets suitable to train <a href="https://opensource.org/deepdive">Open Source AI</a> systems is to use sources in the public domain risks cementing the dominant positions of existing large corporations.</p>



<p>The open landscape already faces issues with big tech and their ability to influence legislation. The big corporations have lobbied to <a href="https://en.wikipedia.org/wiki/Copyright_Term_Extension_Act">extend the duration of copyright</a>, introduced the <a href="https://www.eff.org/issues/dmca">DMCA</a>, are opposing the <a href="https://www.eff.org/issues/right-to-repair">right to repair</a>, and have the resources to continue lobbying and sue any new entrant who they deem to get too close. There are <a href="https://www.eff.org/work">plenty of examples</a> showing an unequal advantage in protecting what they think is theirs. The non-profit Fairly Trained certifies companies “<a href="https://www.wired.com/story/proof-you-can-train-ai-without-slurping-copyrighted-content/">willing to prove that they’ve trained their AI models on data that they own, have licensed, or that is in the public domain</a>,” respecting copyright law: who’s going to benefit from this approach?</p>



<h2 class="wp-block-heading">Unsuitable for public policies</h2>



<p>Initiatives like Common Corpus and <a href="https://www.bigcode-project.org/docs/about/the-stack/">The Stack</a> (used to train <a href="https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/">Starcoder2</a>) are important achievements as they allow researchers to develop new AI systems while mitigating the risk of being sued. They also push the technical boundaries of what can be achieved with smaller datasets that don’t require a nuclear power plant to train new models. But I think they mask the underlying issue: AI needs data and limiting open datasets to only public domain sources will never give them a chance to match the size of the proprietary ones. The lobby for copyright maximalists is always looking for ways to expand scope and extend terms for copyright laws, and when they succeed it is a one-way ratchet. It would be a tragedy for society if legislators listened to their sophistry and made new laws doing this based on the apparent consensus that creators need protection from AI.<br>The role of data for training machine learning systems is a divisive topic and a complex one. Having datasets like Common Corpus is a very useful way for the science of AI to progress with better sources. For policies, we’d be better off pushing for something like the proposal advanced by Open Future and Creative Commons in their paper <a href="https://openfuture.eu/publication/towards-a-books-data-commons-for-ai-training/">Towards a Books Data Commons for AI Training</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://opensource.org/blog/why-datasets-built-on-public-domain-might-not-be-enough-for-ai/feed</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">24855</post-id>	</item>
		<item>
		<title>OSI participates in Columbia Convening on openness and AI; first readouts available</title>
		<link>https://opensource.org/blog/osi-participates-in-columbia-convening-on-openness-and-ai-first-readouts-available</link>
		
		<dc:creator><![CDATA[Stefano Maffulli]]></dc:creator>
		<pubDate>Thu, 04 Apr 2024 13:47:00 +0000</pubDate>
				<category><![CDATA[News]]></category>
		<category><![CDATA[ai]]></category>
		<category><![CDATA[policy]]></category>
		<guid isPermaLink="false">https://opensource.org/?p=23170</guid>

					<description><![CDATA[Stefano Maffulli participates in collaboration on building a framework for openness in AI with Mozilla and the Columbia Institute of Global Politics with technical and policy memorandums being released.]]></description>
										<content:encoded><![CDATA[
<p>I was invited to join Mozilla and the Columbia Institute of Global Politics in an effort that explores what “open” should mean in the AI era. A cohort of 40 leading scholars and practitioners from Open Source AI startups and companies, non-profit AI labs, and civil society organizations came together on February 29 at the <a href="https://blog.mozilla.org/en/mozilla/ai/introducing-columbia-convening-openness-and-ai/">Columbia Convening</a> to collaborate on ways to strengthen and leverage openness for the good of all. We believe openness can and must play a key role in the future of AI. The Columbia Convening took an important step toward developing a framework for openness in AI with the hope that open approaches can have a significant impact on AI, just as Open Source software did in the early days of the internet and World Wide Web.&nbsp;</p>



<p>This effort is aligned with, and contributes valuable knowledge to, the ongoing process to find <a href="https://opensource.org/deepdive/drafts">the Open Source AI Definition</a>.&nbsp;</p>



<p>As a result of this <a href="https://blog.mozilla.org/en/mozilla/ai/readouts-columbia-convening/">first meeting of the Columbia Convening</a>, two readouts have been published: a <a href="https://foundation.mozilla.org/en/research/library/technical-readout-columbia-convening-on-openness-and-ai/">technical memorandum</a> for technical leaders and practitioners who are shaping the future of AI, and a <a href="https://foundation.mozilla.org/en/research/library/policy-readout-columbia-convening-on-openness-and-ai/">policy memorandum</a> for policymakers with a focus on openness in AI.</p>



<h2 class="wp-block-heading">Technical readout</h2>



<p>The <a href="https://assets.mofoprod.net/network/documents/Technical_Readout_-_Columbia_Convening_on_Openness_and_AI.pdf">Columbia Convening on Openness and AI Technical Readout</a> was edited by Nik Marda with review contributions from myself, Deval Pandya, Irene Solaiman, and Victor Storchan.</p>



<p>The technical readout highlighted the challenges of understanding openness in AI. Approaches to openness fall into three categories: gradient/spectrum, criteria scoring, and binary. The OSI is championing a binary approach to openness, where AI systems are either “open” or “closed” based on whether they meet a certain set of criteria.</p>



<p>The technical readout also provided a diagram that shows how the AI stack may be described by the different dimensions (AI artifacts, documentation, and distribution) of its various components and subcomponents.</p>



<figure class="wp-block-image size-full has-lightbox"><a href="https://assets.mofoprod.net/network/documents/Technical_Readout_-_Columbia_Convening_on_Openness_and_AI.pdf"><img decoding="async" width="259" height="344" src="https://i0.wp.com/opensource.org/wp-content/uploads/2024/04/technical_readout.png?resize=259%2C344&#038;ssl=1" alt="" class="wp-image-23173" srcset="https://i0.wp.com/opensource.org/wp-content/uploads/2024/04/technical_readout.png?w=259&amp;ssl=1 259w, https://i0.wp.com/opensource.org/wp-content/uploads/2024/04/technical_readout.png?resize=226%2C300&amp;ssl=1 226w" sizes="(max-width: 259px) 100vw, 259px" data-recalc-dims="1" /></a></figure>






<h2 class="wp-block-heading">Policy readout</h2>



<p>The <a href="https://assets.mofoprod.net/network/documents/Policy_Readout_-_Columbia_Convening_on_Openness_and_AI_Final.pdf">Columbia Convening on Openness and AI Policy Readout</a> was edited by Udbhav Tiwari with review contributions from Kevin Klyman, Madhulika Srikumar, and myself.</p>



<p>The policy readout highlighted the benefits of openness, including:</p>



<ul class="wp-block-list">
<li>Enhancing reproducible research and promoting innovation</li>



<li>Creating an open ecosystem of developers and makers</li>



<li>Promoting inclusion through open development culture and models</li>



<li>Facilitating accountability and supporting bias research</li>



<li>Fostering security through widespread scrutiny</li>



<li>Reducing costs and avoiding vendor lock-in</li>



<li>Equipping supervisory authorities with necessary tools</li>



<li>Making training and inference more resource-efficient, reducing environmental harm</li>



<li>Ensuring competition and dynamism</li>



<li>Providing recourse in decision-making</li>
</ul>



<p>The policy readout also showcased a table with the potential benefits and drawbacks of each component of the AI stack, including the code, datasets, model weights, documentation, distribution, and guardrails.</p>



<p>Finally, the policy readout provided some policy recommendations:</p>



<ul class="wp-block-list">
<li>Include standardized definitions of openness as part of AI standards</li>



<li>Promote agency, transparency and accountability</li>



<li>Facilitate innovation and mitigate monopolistic practices</li>



<li>Expand access to computational resources</li>



<li>Mandate risk assessment and management for certain AI applications</li>



<li>Hold independent audits and red teaming</li>



<li>Update privacy legislation to specifically address AI challenges</li>



<li>Update the legal framework to distinguish the responsibilities of different actors</li>



<li>Nurture AI research and development grounded in openness</li>



<li>Invest in education and specialized training programs</li>



<li>Adapt IP laws to support open licensing models</li>



<li>Engage the general public and stakeholders</li>
</ul>



<figure class="wp-block-image size-full has-lightbox"><a href="https://foundation.mozilla.org/en/research/library/policy-readout-columbia-convening-on-openness-and-ai/"><img decoding="async" width="259" height="344" src="https://i0.wp.com/opensource.org/wp-content/uploads/2024/04/policy_readout.png?resize=259%2C344&#038;ssl=1" alt="" class="wp-image-23175" srcset="https://i0.wp.com/opensource.org/wp-content/uploads/2024/04/policy_readout.png?w=259&amp;ssl=1 259w, https://i0.wp.com/opensource.org/wp-content/uploads/2024/04/policy_readout.png?resize=226%2C300&amp;ssl=1 226w" sizes="(max-width: 259px) 100vw, 259px" data-recalc-dims="1" /></a></figure>






<p>You can follow along with the work of the Columbia Convening at <a href="http://mozilla.org/research/cc">mozilla.org/research/cc</a> and the work from the Open Source Initiative on the definition of Open Source AI at <a href="http://opensource.org/deepdive">opensource.org/deepdive</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">23170</post-id>	</item>
		<item>
		<title>Letter to U.S. Commerce Secretary Raimondo urging protection of openness and transparency in AI</title>
		<link>https://opensource.org/blog/letter-to-u-s-commerce-secretary-raimondo-urging-protection-of-openness-and-transparency-in-ai</link>
					<comments>https://opensource.org/blog/letter-to-u-s-commerce-secretary-raimondo-urging-protection-of-openness-and-transparency-in-ai#comments</comments>
		
		<dc:creator><![CDATA[Stefano Maffulli]]></dc:creator>
		<pubDate>Mon, 25 Mar 2024 18:18:21 +0000</pubDate>
				<category><![CDATA[News]]></category>
		<category><![CDATA[ai]]></category>
		<category><![CDATA[policy]]></category>
		<guid isPermaLink="false">https://opensource.org/?p=22767</guid>

					<description><![CDATA[The OSI contributed, along with other members of civil society and academia, to a letter drafted by Mozilla and the Center for Democracy &#038; Technology (CDT) asking the White House and Congress to exercise great caution when considering whether and how to regulate the publication of open models.]]></description>
										<content:encoded><![CDATA[
<p>The Open Source Initiative (OSI) contributed, along with other members of civil society and academia, to a letter drafted by <a href="https://www.mozilla.org/">Mozilla</a> and the <a href="https://cdt.org/">Center for Democracy &amp; Technology (CDT)</a> asking the White House and Congress to exercise great caution when considering whether and how to regulate the publication of open models.</p>



<p>The letter demonstrates how openness allows collaborative efforts to build, shape and test AI for the benefit of all, and speaks of the need for policy, technology and advocacy in creating a better future through trustworthiness and accountability in AI innovation. The letter highlighted three broad points of consensus about openness and transparency in AI:</p>



<ul class="wp-block-list">
<li>Open models can provide significant benefits to society, and policy should sustain and expand these benefits.</li>



<li>Policy should be based on clear evidence of marginal risks that open models pose compared to closed models.</li>



<li>Policy should consider a wide range of solutions to address well-defined marginal risks in a tailored fashion.</li>
</ul>



<p>The letter was sent today, March 25, 2024, in advance of the Department of Commerce’s <a href="https://www.ntia.gov/federal-register-notice/2024/dual-use-foundation-artificial-intelligence-models-widely-available#">comment deadline on AI models</a> which closes March 27. You can read the letter below and at <a href="https://cdt.org/insights/cdt-joins-mozilla-civil-society-orgs-and-leading-academics-in-urging-us-secretary-of-commerce-to-protect-ai-openness/">CDT’s website</a>.</p>



<div data-wp-interactive="core/file" class="wp-block-file"><object data-wp-bind--hidden="!state.hasPdfPreview"  class="wp-block-file__embed" data="https://opensource.org/wp-content/uploads/2024/03/Civil-Society-Letter-on-Openness-for-NTIA-Process-March-25-2024.pdf" type="application/pdf" style="width:100%;height:600px" aria-label="Embed of Civil-Society-Letter-on-Openness-for-NTIA-Process-March-25-2024."></object><a id="wp-block-file--media-7a7f3bd7-e247-4049-b59c-671e5d475e02" href="https://opensource.org/wp-content/uploads/2024/03/Civil-Society-Letter-on-Openness-for-NTIA-Process-March-25-2024.pdf">Civil-Society-Letter-on-Openness-for-NTIA-Process-March-25-2024</a><a href="https://opensource.org/wp-content/uploads/2024/03/Civil-Society-Letter-on-Openness-for-NTIA-Process-March-25-2024.pdf" class="wp-block-file__button wp-element-button" download aria-describedby="wp-block-file--media-7a7f3bd7-e247-4049-b59c-671e5d475e02">Download</a></div>
]]></content:encoded>
					
					<wfw:commentRss>https://opensource.org/blog/letter-to-u-s-commerce-secretary-raimondo-urging-protection-of-openness-and-transparency-in-ai/feed</wfw:commentRss>
			<slash:comments>8</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">22767</post-id>	</item>
		<item>
		<title>Results of 2024 elections of OSI board of directors</title>
		<link>https://opensource.org/blog/results-of-2024-elections-of-osi-board-of-directors</link>
					<comments>https://opensource.org/blog/results-of-2024-elections-of-osi-board-of-directors#comments</comments>
		
		<dc:creator><![CDATA[Stefano Maffulli]]></dc:creator>
		<pubDate>Tue, 19 Mar 2024 19:34:23 +0000</pubDate>
				<category><![CDATA[News]]></category>
		<category><![CDATA[board elections]]></category>
		<guid isPermaLink="false">https://opensource.org/?p=22337</guid>

					<description><![CDATA[The polls just closed, the results are in. Congratulations to the returning directors Thierry Carrez and Josh Berkus, and the newly elected director Chris Aniszczyk.]]></description>
										<content:encoded><![CDATA[
<p>The polls just closed, the results are in. Congratulations to the returning directors <a href="https://opensource.org/board-member/thierry-carrez">Thierry Carrez</a> and <a href="https://opensource.org/board-member/josh-berkus" data-type="link" data-id="https://opensource.org/board-member/josh-berkus">Josh Berkus</a>, and the newly elected director <a href="https://opensource.org/board-member/chris-aniszczyk" data-type="link" data-id="https://opensource.org/board-member/chris-aniszczyk">Chris Aniszczyk</a>.</p>



<p>Thierry Carrez has been confirmed and joins as a director elected by the Affiliate organizations. Chris Aniszczyk and Josh Berkus collected the votes of the Individual members.</p>



<p>The OSI thanks all of those who participated in the 2024 board elections by casting a ballot and asking questions of the candidates. We also want to extend our sincerest gratitude to all of those who stood for election. We were once again honored with an incredible slate of candidates who stepped forward from across the Open Source software community to support the OSI’s work and advance the OSI’s mission. The 2024 nominees were, again, remarkable: experts from a variety of fields and technologies with diverse skills and experience gained from working across the Open Source community. We hope the entire Open Source software community will join us in thanking them for their service and their leadership. We’re better off because of their contributions and commitment, and we thank them.</p>



<h2 class="wp-block-heading"><span style="font-weight: bold;">Next steps</span></h2>



<p>The board of directors has formalized the election results in an ad-hoc meeting and invited the newly elected director to the onboarding meeting.</p>



<h2 class="wp-block-heading"><span style="font-weight: bold;">The complete election results</span></h2>



<h3 class="wp-block-heading"><span style="font-weight: bold;">OSI Affiliate directors elections 2024</span></h3>



<p>There were 6 candidates competing for 1 seat. The number of voters was 38 and there were 38 valid votes and 0 empty ballots.</p>



<p>Counting votes using Scottish STV.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="640" height="345" src="https://i0.wp.com/opensource.org/wp-content/uploads/2024/03/elections_affiliates_2024.png?resize=640%2C345&#038;ssl=1" alt="" class="wp-image-22419" srcset="https://i0.wp.com/opensource.org/wp-content/uploads/2024/03/elections_affiliates_2024.png?w=695&amp;ssl=1 695w, https://i0.wp.com/opensource.org/wp-content/uploads/2024/03/elections_affiliates_2024.png?resize=300%2C162&amp;ssl=1 300w" sizes="(max-width: 640px) 100vw, 640px" data-recalc-dims="1" /></figure>



<p>Winner is Thierry Carrez.</p>



<p><a href="https://www.opavote.com/results/5321915978743808" data-type="link" data-id="https://www.opavote.com/results/5321915978743808">Details from affiliates elections</a>.</p>



<h3 class="wp-block-heading"><span style="font-weight: bold;">OSI Individual directors elections 2024</span></h3>



<p>There were 11 candidates competing for 2 seats. The number of voters was 158 and there were 158 valid votes and 0 empty ballots.</p>



<p>Counting votes using Scottish STV.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="640" height="523" src="https://i0.wp.com/opensource.org/wp-content/uploads/2024/03/elections_individuals_2024.png?resize=640%2C523&#038;ssl=1" alt="" class="wp-image-22420" srcset="https://i0.wp.com/opensource.org/wp-content/uploads/2024/03/elections_individuals_2024.png?w=695&amp;ssl=1 695w, https://i0.wp.com/opensource.org/wp-content/uploads/2024/03/elections_individuals_2024.png?resize=300%2C245&amp;ssl=1 300w" sizes="(max-width: 640px) 100vw, 640px" data-recalc-dims="1" /></figure>



<p>Winners are Chris Aniszczyk and Josh Berkus.</p>



<p><a href="https://www.opavote.com/results/4866957634437120">Details from individuals elections</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://opensource.org/blog/results-of-2024-elections-of-osi-board-of-directors/feed</wfw:commentRss>
			<slash:comments>17</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">22337</post-id>	</item>
		<item>
		<title>A candid conversation on The Changelog Podcast about defining Open Source AI, and what is really at stake</title>
		<link>https://opensource.org/blog/a-candid-conversation-on-the-changelog-podcast-about-defining-open-source-ai-and-what-is-really-at-stake</link>
					<comments>https://opensource.org/blog/a-candid-conversation-on-the-changelog-podcast-about-defining-open-source-ai-and-what-is-really-at-stake#comments</comments>
		
		<dc:creator><![CDATA[Stefano Maffulli]]></dc:creator>
		<pubDate>Tue, 05 Mar 2024 06:00:00 +0000</pubDate>
				<category><![CDATA[News]]></category>
		<category><![CDATA[ai]]></category>
		<category><![CDATA[Deep Dive: AI]]></category>
		<category><![CDATA[podcast]]></category>
		<guid isPermaLink="false">https://opensource.org/?p=20959</guid>

					<description><![CDATA[Listen to The Changelog podcast discussing the work of the OSI, especially toward a formal Definition of Open Source AI.]]></description>
										<content:encoded><![CDATA[
<p>I was recently invited to join hosts <a href="https://www.linkedin.com/in/adamstacoviak">Adam Stacoviak</a> and <a href="https://www.linkedin.com/in/jerodsanto">Jerod Santo</a> on <a href="https://changelog.com/podcast">The Changelog podcast</a>. The Changelog features deep technical reviews and conversations about the most recent news in the world of software, and this was the first time anyone from the OSI has appeared on the show.&nbsp;</p>



<p>After introducing the Open Source Initiative, we discussed the challenges of not only defending the Definition itself, but the idea that we need a Definition at all. And I was able to explain the complicated nature of being a global nonprofit organization defending the Open Source Definition for over 25 years.</p>



<p>I outlined the three programs that comprise the work of the OSI—legal and licensing, policy and standards, and advocacy and outreach—at which time we dove right into the project that falls under the last program: the Open Source AI Definition.</p>



<p>Open Source AI is not the same as Open Source software. This reality led to the <a href="https://opensource.org/deepdive" data-type="page" data-id="16917">Deep Dive: AI</a> project, now in year 3, in which OSI is collaborating with some of the largest corporations, researchers, creators, foundations and others.&nbsp;</p>



<p>The Changelog hosts asked a lot of great questions and we had a candid and productive conversation. I hope you’ll follow the link to listen to the full episode: <a href="https://changelog.com/podcast/578">Changelog Interviews: What exactly is Open Source AI?</a></p>



<p>As I shared with Adam and Jerod, I&#8217;m hosting <a href="https://opensource.org/deepdive#townhalls" data-type="link" data-id="https://opensource.org/events/open-source-ai-definition-town-hall-7">bi-weekly discussions</a> on the status of the project and we&#8217;ve put together a forum for public input, so if you are interested in learning more about this or contributing, you are welcome to join us at <a href="http://discuss.opensource.org" data-type="link" data-id="discuss.opensource.org">discuss.opensource.org</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://opensource.org/blog/a-candid-conversation-on-the-changelog-podcast-about-defining-open-source-ai-and-what-is-really-at-stake/feed</wfw:commentRss>
			<slash:comments>29</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">20959</post-id>	</item>
		<item>
		<title>New risk assessment framework offers clarity for open AI models</title>
		<link>https://opensource.org/blog/new-risk-assessment-framework-offers-clarity-for-open-ai-models</link>
					<comments>https://opensource.org/blog/new-risk-assessment-framework-offers-clarity-for-open-ai-models#comments</comments>
		
		<dc:creator><![CDATA[Stefano Maffulli]]></dc:creator>
		<pubDate>Tue, 27 Feb 2024 17:45:30 +0000</pubDate>
				<category><![CDATA[News]]></category>
		<category><![CDATA[ai]]></category>
		<guid isPermaLink="false">https://opensource.org/?p=20690</guid>

					<description><![CDATA[The marginal risk associated with open foundation models has been clarified in a recent position paper, addressing a contentious debate in the AI community.]]></description>
										<content:encoded><![CDATA[
<p>There is a debate within the AI community around the risks of widely releasing foundation models with their weights and the societal impact of that decision. Some argue that the wide availability of Llama2 or Stable Diffusion XL is a net negative for society. A position paper released today shows that there is insufficient evidence to effectively characterize the marginal risk of these models relative to other technologies.&nbsp;</p>



<p>The paper was authored by Sayash Kapoor of Princeton University, Rishi Bommasani of Stanford University, me and others. It is directed at AI developers, researchers investigating the risks of AI, competition regulators, and policymakers who are challenged with how to govern open foundation models.&nbsp;</p>



<p>This paper introduces a risk assessment framework to be used with open models. This resource helps explain why the marginal risk is low in some cases where we already have evidence from past waves of digital technology. It reveals that past work has focused on different subsets of the framework with different assumptions, serving to clarify disagreements about misuse risks. By outlining the necessary components of a complete analysis of the misuse risk of open foundation models, it lays out a path to a more constructive debate moving forward.</p>



<p>I hope this work will support a constructive debate where risks of AI are grounded in science and today’s reality, rather than in hypothetical future scenarios. This paper offers a position that balances the case against open foundation models with substantiated analysis and a useful framework on which to build. Please <a href="https://crfm.stanford.edu/open-fms/">read the paper</a> and leave your comments on Mastodon or LinkedIn.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://opensource.org/blog/new-risk-assessment-framework-offers-clarity-for-open-ai-models/feed</wfw:commentRss>
			<slash:comments>20</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">20690</post-id>	</item>
	</channel>
</rss>
