<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[MLNomads]]></title><description><![CDATA[MLNomads]]></description><link>https://blog.mlnomads.com</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1702825333407/mv7RT18m7.png</url><title>MLNomads</title><link>https://blog.mlnomads.com</link></image><generator>RSS for Node</generator><lastBuildDate>Thu, 16 Apr 2026 10:19:27 GMT</lastBuildDate><atom:link href="https://blog.mlnomads.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[#AISprint Geminotes: Your Gemini AI-Powered web companion toward better web web-notes-taking experience]]></title><description><![CDATA[GitHub Repostory: StoicBug/Geminotes (github.com)
Link to the app: geminotes.mlnomads.com or https://geminotes.netlify.app
In the digital age, information overload is a constant challenge. We often find ourselves drowning in a sea of web pages, artic...]]></description><link>https://blog.mlnomads.com/aisprint-geminotes</link><guid isPermaLink="true">https://blog.mlnomads.com/aisprint-geminotes</guid><category><![CDATA[AI]]></category><category><![CDATA[gemini]]></category><category><![CDATA[note-taking]]></category><category><![CDATA[AISprint ]]></category><dc:creator><![CDATA[El Bachir Outidrarine]]></dc:creator><pubDate>Tue, 01 Oct 2024 02:49:50 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1727746559996/e167b5af-97e0-4020-a3b2-016df2581fea.avif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>GitHub Repostory: <a target="_blank" href="https://github.com/StoicBug/Geminotes"><strong>StoicBug/Geminotes (</strong></a><a target="_blank" href="http://github.com"><strong>github.com</strong></a><a target="_blank" href="https://github.com/StoicBug/Geminotes"><strong>)</strong></a></p>
<p>Link to the app: <a target="_blank" href="http://geminotes.mlnomads.com">geminotes.mlnomads.com</a> or <a target="_blank" href="https://geminotes.netlify.app/">https://geminotes.netlify.app</a></p>
<p>In the digital age, information overload is a constant challenge. We often find ourselves drowning in a sea of web pages, articles, and online resources, struggling to keep track of important information. Enter Geminotes, a groundbreaking tool that reimagines the way we capture, organize, and interact with digital notes. Born from the need for a more intelligent note-taking system, Geminotes combines the power of artificial intelligence with intuitive user interfaces to create a seamless note-taking ecosystem. This innovative project, remarkably developed in just one month, represents a significant leap forward in personal knowledge management. Whether you're a student, researcher, or professional, Geminotes offers a suite of features designed to enhance your digital reading and note-taking experience. In this post, we'll explore the user-friendly aspects of Geminotes and delve into its sophisticated technical architecture, providing valuable insights for both users and engineers alike.</p>
<h2 id="heading-for-users-how-to-use-geminotes">For Users: How to Use Geminotes</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727713085684/1e7f90a4-302e-4f02-a6b8-d6920f6e10ab.png?auto=compress,format&amp;format=webp" alt /></p>
<p>Geminotes is designed with user experience at its core, offering a range of features that make note-taking and knowledge management effortless and intuitive. Let's explore how Geminotes can transform your digital reading and learning process.</p>
<h3 id="heading-key-features">Key Features</h3>
<p>Geminotes boasts a comprehensive set of features that work together to create a powerful note-taking environment:</p>
<ol>
<li><p><strong>Chrome Extension</strong>: At the heart of Geminotes' functionality is its Chrome extension. This lightweight add-on seamlessly integrates with your browsing experience, allowing you to capture important information with just a click. As you navigate the web, the extension remains readily accessible, ensuring that no valuable insight slips through the cracks.</p>
</li>
<li><p><strong>Web Application</strong>: The Geminotes web application serves as your central hub for note management and AI interaction. This user-friendly interface provides a comprehensive overview of your digital knowledge, allowing you to organize, search, and interact with your notes in ways that were previously impossible.</p>
</li>
<li><p><strong>AI Assistant</strong>: Leveraging the cutting-edge capabilities of Google's Gemini model, Geminotes' AI assistant takes your note-taking to the next level. This intelligent feature can analyze your notes, provide insights, answer questions, and even help generate new ideas based on your collected information.</p>
</li>
<li><p><strong>User Accounts</strong>: To ensure a personalized and secure experience, Geminotes implements robust user account functionality. This not only keeps your notes private but also enables real-time updates across devices, ensuring that your knowledge base is always up-to-date and accessible.</p>
</li>
</ol>
<h3 id="heading-step-by-step-guide">Step-by-Step Guide</h3>
<p>To help you get started with Geminotes, here's a detailed guide on how to use its key features:</p>
<ol>
<li><p><strong>Installation</strong>: Begin your Geminotes journey by installing the Chrome extension. Visit the Chrome Web Store and search for Geminotes. Once installed, you'll see the Geminotes icon in your browser's toolbar. Next, head to the Geminotes website to create your account. This account will serve as your gateway to the web application and ensure your notes are securely stored and synced.</p>
</li>
<li><p><strong>Saving Notes</strong>: With Geminotes installed, saving notes becomes a breeze. As you browse the web and encounter information you want to save, simply highlight the relevant text. Once highlighted, click on the Geminotes extension icon in your toolbar. The extension will automatically save the highlighted text along with the URL of the page, ensuring you always have context for your notes.</p>
</li>
<li><p><strong>Managing Notes</strong>: To access and manage your saved notes, log in to the Geminotes web application. Here, you'll find a clean, intuitive interface displaying all your saved notes. Use the powerful search functionality to quickly locate specific information, or browse through your notes chronologically. The application allows you to organize notes into categories, add tags, and even create connections between related pieces of information.</p>
</li>
<li><p><strong>AI Interaction</strong>: One of Geminotes' most powerful features is its AI assistant. To use this feature, select any note in the web application and open the AI chat interface. Here, you can ask questions about your notes, request explanations of complex concepts, get suggestions for improving note clarity, or even generate new ideas based on your existing notes. The AI assistant uses advanced natural language processing to provide contextual, intelligent responses that enhance your understanding and productivity.</p>
</li>
<li><p><strong>Collaboration</strong> (Coming Soon): While currently in development, the collaboration feature will allow you to share notes with team members or colleagues. This functionality will transform Geminotes from a personal tool into a powerful platform for collective knowledge management and collaborative learning.</p>
</li>
</ol>
<h2 id="heading-for-engineers-technical-architecture">For Engineers: Technical Architecture</h2>
<p>Behind Geminotes' user-friendly interface lies a sophisticated technical architecture designed for performance, scalability, and rapid development. This section provides an in-depth look at the technologies and design decisions that power Geminotes, offering valuable insights for engineers and developers.</p>
<h3 id="heading-technology-stack">Technology Stack</h3>
<ol>
<li><p><strong>Frontend</strong>:</p>
<ul>
<li><p>Framework: Angular</p>
</li>
<li><p>Styling: Tailwind CSS</p>
</li>
<li><p>Rationale: Angular provides a robust structure for large-scale applications, while Tailwind CSS enables rapid UI development.</p>
</li>
</ul>
</li>
<li><p><strong>Backend</strong>:</p>
<ul>
<li><p>Platform: Firebase</p>
</li>
<li><p>Services Used:</p>
<ul>
<li><p>Firestore (Database)</p>
</li>
<li><p>Cloud Functions (Serverless backend)</p>
</li>
<li><p>Authentication (User accounts)</p>
</li>
</ul>
</li>
<li><p>Rationale: Firebase offers a comprehensive suite of tools that enable rapid development and easy scaling.</p>
</li>
</ul>
</li>
<li><p><strong>AI Integration</strong>: At the core of Geminotes' intelligent features is Google's Gemini model, integrated via Google Cloud AI APIs. Gemini represents the state-of-the-art in natural language processing, allowing us to implement sophisticated AI-powered features such as question answering, text summarization, and idea generation. The decision to use Gemini was driven by its advanced capabilities and seamless integration with our Google Cloud-based infrastructure.</p>
</li>
<li><p><strong>Chrome Extension</strong>: The Geminotes Chrome extension is built using JavaScript and the Chrome Extension API. This combination allows for deep integration with the Chrome browser, enabling features like text selection and one-click note saving. The extension serves as a crucial bridge between the user's browsing experience and the Geminotes ecosystem. A minimal sketch of this capture flow follows this list.</p>
</li>
</ol>
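<p>To make that capture flow concrete, here is a minimal, illustrative sketch of what a Manifest V3 background script could do when the toolbar icon is clicked: read the current selection from the page and post it, together with the page URL and title, to a backend endpoint. The endpoint and payload shape below are assumptions for illustration, not the actual Geminotes source.</p>
<pre><code class="lang-javascript">// background.js - illustrative sketch only; the endpoint and field names are assumptions
// Assumes the "scripting" and "activeTab" permissions are declared in manifest.json
chrome.action.onClicked.addListener(async (tab) => {
  // Run a small function inside the page to grab the highlighted text and its context
  const [{ result }] = await chrome.scripting.executeScript({
    target: { tabId: tab.id },
    func: () => ({
      text: window.getSelection().toString().trim(),
      url: window.location.href,
      title: document.title,
    }),
  });

  if (result.text) {
    // Hand the captured note off to a hypothetical backend endpoint for storage
    await fetch('https://example.com/api/notes', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(result),
    });
  }
});
</code></pre>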
<h3 id="heading-system-architecture">System Architecture</h3>
<p>Geminotes' system architecture is designed for efficiency, scalability, and real-time responsiveness. Here's a detailed breakdown of how the different components interact:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727747517786/8e7d0498-9fe5-4bcf-8d46-161c0928dc78.png" alt class="image--center mx-auto" /></p>
<p>This architecture ensures a smooth flow of data from the user's browser to our backend systems and AI services, enabling the real-time, intelligent note-taking experience that defines Geminotes.</p>
<h3 id="heading-scalability-and-performance-considerations">Scalability and Performance Considerations</h3>
<p>As Geminotes grows, several factors have been considered to ensure its continued performance and scalability:</p>
<ul>
<li><p>Firestore's NoSQL structure allows for efficient querying and real-time updates, even as the volume of notes increases.</p>
</li>
<li><p>Cloud Functions automatically scale based on demand, ensuring responsive performance during usage spikes.</p>
</li>
<li><p>We're implementing caching mechanisms for frequently accessed data to reduce database reads and improve response times (a minimal sketch follows this list).</p>
</li>
<li><p>API usage is closely monitored, with rate limiting implemented to manage costs and prevent abuse.</p>
</li>
</ul>
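<p>As a rough illustration of the caching idea, the sketch below keeps a small in-memory map inside a Cloud Function instance so repeated reads of the same note can skip Firestore entirely. The function name, collection name, and TTL here are assumptions for illustration, not the production Geminotes code.</p>
<pre><code class="lang-javascript">// Illustrative per-instance cache for a Cloud Function; names and TTL are assumptions
const { onRequest } = require('firebase-functions/v2/https');
const { initializeApp } = require('firebase-admin/app');
const { getFirestore } = require('firebase-admin/firestore');

initializeApp();
const db = getFirestore();

// The cache lives as long as the function instance does, trimming repeat Firestore reads
const cache = new Map();
const TTL_MS = 60 * 1000;

exports.getNote = onRequest(async (req, res) => {
  const id = req.query.id;
  const hit = cache.get(id);

  // Serve from memory when a cached entry exists and has not expired yet
  if (hit) {
    const age = Date.now() - hit.at;
    if (age > TTL_MS) {
      cache.delete(id); // stale entry, fall through to Firestore
    } else {
      return res.json(hit.data);
    }
  }

  const snap = await db.collection('notes').doc(id).get();
  const data = snap.data();
  cache.set(id, { data, at: Date.now() });
  res.json(data);
});
</code></pre>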
<p>These architectural decisions and considerations form the foundation of Geminotes, enabling its powerful features while ensuring scalability and performance as the user base grows.</p>
<h2 id="heading-team">Team</h2>
<p>Behind Geminotes is a dedicated team of professionals passionate about leveraging technology to enhance learning and productivity:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727712352099/b87ffe06-588a-4808-b6ad-d71a0ebdd667.png?auto=compress,format&amp;format=webp" alt /></p>
<p><a target="_blank" href="http://outidrarine.com/"><strong>El Bachir Outidrarine</strong></a> (<a class="user-mention" href="https://hashnode.com/@stoicbug">El Bachir Outidrarine</a> ) Software Engineer and a student specializing in Big Data and Cloud Computing at ENSET Mohammedia. He shows a deep commitment to advancing technology and bettering his community. As he continues his journey in software engineering, he remains driven by his technical expertise and a passion for contributing to society, eager to leave a positive, lasting impact on the tech world.</p>
<p><a target="_blank" href="https://www.tahabouhsine.com/"><strong>Taha BOUHSINE</strong></a> <a target="_blank" href="https://www.tahabouhsine.com/"><strong>(</strong></a><a class="user-mention" href="https://hashnode.com/@tahabsn">@tahabsn</a>)<a target="_blank" href="https://hashnode.com/@tahabsn"><strong>,</strong></a> <strong>an AI/ML</strong> Google Developer Expert and organizer of the MLNomads community, provided his expertise which was instrumental in laying the project's foundation. He provided guidance, sharing insights gleaned from his extensive experience in the field of machine learning and software engineering</p>
<h2 id="heading-acknowledgment">Acknowledgment</h2>
<p>Google AI/ML Developer Programs team supported this work by providing Google Cloud Credit.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Geminotes represents a significant step forward in the realm of digital note-taking and knowledge management. By seamlessly integrating modern web technologies with advanced AI capabilities, we've created a tool that not only simplifies the process of capturing and organizing information but also enhances how users interact with and learn from their notes.</p>
<p>As we continue to develop and refine Geminotes, we're excited about the possibilities that lie ahead. The fusion of AI with note-taking has the potential to revolutionize how we process and retain information in the digital age. We envision Geminotes evolving into an indispensable tool for students, researchers, professionals, and anyone passionate about lifelong learning.</p>
<p>We invite you to join us on this journey. Whether you're a user excited to explore new ways of managing your digital knowledge, or a developer interested in contributing to the project, your involvement and feedback are crucial to the growth and improvement of Geminotes.</p>
<p>Together, we can shape the future of AI-assisted note-taking and push the boundaries of what's possible in personal knowledge management. Welcome to the Geminotes community – let's innovate, learn, and grow together.</p>
]]></content:encoded></item><item><title><![CDATA[Bookmarkai V 1.0]]></title><description><![CDATA[Welcome to BookmarksAI, your friendly Chrome bookmarks searcher and manager!
In this blog, we'll go through the cool features this extension has to offer and how we leveraged Gemini Pro to power up your Chrome experience with AI.
Link to Repo: https:...]]></description><link>https://blog.mlnomads.com/bookmarkai-v-10</link><guid isPermaLink="true">https://blog.mlnomads.com/bookmarkai-v-10</guid><category><![CDATA[AI]]></category><category><![CDATA[#ai-tools]]></category><category><![CDATA[gemini]]></category><category><![CDATA[Bookmark Management]]></category><category><![CDATA[chrome extension]]></category><category><![CDATA[chatbot]]></category><category><![CDATA[#PromptEngineering]]></category><dc:creator><![CDATA[Taha Bouhsine]]></dc:creator><pubDate>Wed, 25 Sep 2024 03:05:39 GMT</pubDate><content:encoded><![CDATA[<p>Welcome to BookmarksAI, your friendly Chrome bookmarks searcher and manager!</p>
<p>In this blog, we'll go through the cool features this extension has to offer and how we leveraged Gemini Pro to power up your Chrome experience with AI.</p>
<p>Link to Repo: <a target="_blank" href="https://github.com/mlnomadpy/bookmarksai">https://github.com/mlnomadpy/bookmarksai</a></p>
<p>Once you download and install the extension, you'll probably see something like this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727231993807/43f12404-28df-494c-9beb-d2d6538584de.png" alt class="image--center mx-auto" /></p>
<p>Simple, right?</p>
<p>You've got the settings button, the prompt text field, the save-prompt button, the search field, and the prompt button. But let's take it one thing at a time:</p>
<h2 id="heading-first-things-first-the-settings-button">First things first: The settings button</h2>
<p>When you download the extension for the first time, you need to:</p>
<ol>
<li><p>Get your API key from AI Studio for Gemini</p>
</li>
<li><p>Click Settings and insert it in the field</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727232061000/82c13e74-fc33-4f1c-963c-7d2889bb43bf.png" alt class="image--center mx-auto" /></p>
</li>
<li><p>Press Save API Key</p>
</li>
<li><p>Click on "Reload all the bookmarks" to add your previously saved browser bookmarks to the extension</p>
</li>
</ol>
<p>Note: This might take a bit of time, especially if you're a bookmark hoarder like me!</p>
<h2 id="heading-now-lets-search">Now, let's search!</h2>
<p>Go ahead and try to find that lost page that you added years ago and forgot its title. Add some keywords - let's say a long-forgotten paper about "byol", but you don't know the title.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727232187492/4a9fcbbd-6dd5-4215-86f3-e70c580a1f90.png" alt class="image--center mx-auto" /></p>
<p>And voila! It will search the content of the different pages you bookmarked, not just the title. Neat, right?</p>
<p>But wait, there's more!</p>
<h2 id="heading-ai-powered-features">AI-powered features</h2>
<ol>
<li><p>You can generate a summary with just a click of a button</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727232242934/a397edd3-cdad-40cb-982e-56ff2bc1544c.png" alt class="image--center mx-auto" /></p>
</li>
</ol>
<h3 id="heading-custom-prompts">Custom prompts</h3>
<p>Now, select the pages that you're interested in. Let's say you want a comparison, or more like a literature review paragraph. Here's what you do:</p>
<ol>
<li><p>Type your request in the prompt text field</p>
</li>
<li><p>Like the prompt? Save it before you forget!</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727232354065/a3c0c818-bf98-4bde-863e-a80e0dd26d47.png" alt class="image--center mx-auto" /></p>
</li>
<li><p>Hit that prompt button and...</p>
</li>
</ol>
<p>The results are in!</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727233317545/bfef7d37-cfe4-4bd6-9900-fac8143f3774.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-wrapping-up">Wrapping up</h2>
<p>So there you have it - BookmarksAI in a nutshell. It's like having a super-smart assistant that remembers all your bookmarks better than you do. Whether you're a student drowning in research papers, a professional trying to keep track of industry news, or just someone who likes to save interesting articles for later, BookmarksAI has got your back.</p>
<p>Give it a try and say goodbye to the days of endless scrolling through your bookmarks. Happy browsing, and may you never lose a webpage again!</p>
<p>Link to Repo: <a target="_blank" href="https://github.com/mlnomadpy/bookmarksai">https://github.com/mlnomadpy/bookmarksai</a></p>
<p>P.S. We're always improving BookmarksAI. Got any cool ideas or feedback? We'd love to hear from you!</p>
]]></content:encoded></item><item><title><![CDATA[#AISprint Darija AI: A Community-Driven Platform for Moroccan Darija Translation Dataset - Minority Languages]]></title><description><![CDATA[GitHub Repository: https://github.com/ElhoubeBrahim/collect-darija
Platform Link: https://darijaai.mlnomads.com
With the recent advancements in human-like interactions achieved by Large Language Models (LLMs) worldwide, one crucial need for building t...]]></description><link>https://blog.mlnomads.com/darijaai-platform</link><guid isPermaLink="true">https://blog.mlnomads.com/darijaai-platform</guid><category><![CDATA[darija]]></category><category><![CDATA[AI]]></category><category><![CDATA[translation]]></category><category><![CDATA[AI Model]]></category><category><![CDATA[AISprint ]]></category><dc:creator><![CDATA[Ahmed Houssam BOUZINE]]></dc:creator><pubDate>Tue, 17 Sep 2024 20:45:40 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1726329380251/98436239-53e5-4dd7-b339-06b65b2cc41b.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>GitHub Repository: <a target="_blank" href="https://github.com/ElhoubeBrahim/collect-darija">https://github.com/ElhoubeBrahim/collect-darija</a></p>
<p>Platform Link: <a target="_blank" href="https://darijaai.mlnomads.com">https://darijaai.mlnomads.com</a></p>
<p>With the recent advancements in human-like interactions achieved by Large Language Models (LLMs) worldwide, one crucial need for building these models in any language is the availability of high-quality datasets. These datasets must represent:</p>
<ol>
<li><p>The way native people use the language</p>
</li>
<li><p>A high level of knowledge expressed in the language</p>
</li>
<li><p>A careful curation to prevent harmful biases</p>
</li>
</ol>
<p>This process is particularly challenging and expensive for minority languages spoken only in certain regions.</p>
<p>Moroccan Darija, a unique dialect spoken by over 91% of Moroccan citizens, exemplifies this challenge. This rich linguistic tapestry blends Arabic with influences from Amazigh, French, and Spanish. The complexity and nuances of Darija make it a formidable task for LLM development. While datasets like DODA and AtlassIA have been created, they often fall short of fully capturing the variety of spoken Darija. This limitation is frequently due to:</p>
<ul>
<li><p>Restrictive rules in data collection</p>
</li>
<li><p>Limited translation options</p>
</li>
<li><p>Insufficient representation of colloquial usage</p>
</li>
<li><p>Data leaks between Arabic and Moroccan Darija</p>
</li>
</ul>
<p>Creating thorough and precise datasets for Darija poses a major challenge and opportunity in natural language processing and machine translation. It is of utmost importance to develop a Moroccan Darija Large Language Model that truly captures the essence of this language.</p>
<p><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXfHJn-8HtGYveQc0jvYMOgS4-GZBzXsbfJ3pNiPdrST5lYLx4sG_Jm3xb2P4sd3eaUZgfnk3_jrwbw_Dkbti7XSf-N2XWQLPB7QksGk9KUkHwdrOg0xNAV-ba8ihDW0hosFZW63drPfMJ6simU1g3cAWy1G?key=0YmNgHmK2HduCrVxNX2BUg" alt="https://arabcenterdc.org/wp-content/uploads/2023/06/Morocco-Djemaa-el-Fna-market-Marrakech-768x432.jpg" class="image--center mx-auto" /></p>
<p>To address these challenges, we have developed "Darija.AI," an innovative crowdsourcing platform designed to build a comprehensive Moroccan Darija-English translation dataset. This platform serves multiple purposes:</p>
<ol>
<li><p><strong>Translation</strong>: Contributors can translate English phrases sourced from the Mozilla Common Voice dataset into Moroccan Darija.</p>
</li>
<li><p><strong>Review and Evaluation</strong>: Users can assess and rate previous translations, ensuring quality and accuracy.</p>
</li>
<li><p><strong>Community Engagement</strong>: By involving native speakers and language enthusiasts, we capture the true essence and diversity of spoken Darija.</p>
</li>
<li><p><strong>Scalability</strong>: The crowdsourcing approach allows for rapid expansion of the dataset, covering a wide range of topics and linguistic nuances.</p>
</li>
</ol>
<p>By leveraging collective knowledge and fostering community participation, Darija AI aims to create a rich, nuanced, and authentic resource for Darija-English translation, paving the way for more accurate and contextually appropriate language models and translation tools.</p>
<h2 id="heading-technical-architecture">Technical Architecture</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1725975856300/0ecb8273-ee02-4152-8897-1ceb2cb4f6ce.png" alt class="image--center mx-auto" /></p>
<p>The platform is built on a robust architecture designed to handle large volumes of data efficiently. We used Firebase to handle authentication, data processing, storage, and business logic, ensuring the system's robustness, security, and scalability.</p>
<p>The architecture consists of several key components within the Firebase ecosystem:</p>
<ol>
<li><p><strong>Firebase Environment</strong><br /> The entire system is hosted within Firebase, providing a serverless infrastructure that allows for easy scaling and maintenance.</p>
</li>
<li><p><strong>Functions</strong><br /> Firebase Functions hosts three key serverless functions:</p>
<ul>
<li><p>GetSentence: Retrieves sentences for translation from the Sentences collection.</p>
</li>
<li><p>TranslateSentence: Handles the translation process and stores results in the Translations collection.</p>
</li>
<li><p>RateTranslation: Allows users to rate translations, recording each rating in the Reviews collection.</p>
</li>
</ul>
</li>
</ol>
<p>    These functions serve as the backend logic for the application, processing requests and interacting with the database.</p>
<ol start="3">
<li><p><strong>Firestore</strong><br /> Firebase's NoSQL database, Firestore, is used to store data in three collections:</p>
<ul>
<li><p>Sentences: Stores original sentences sourced from Mozilla Common Voice.</p>
</li>
<li><p>Translations: Stores translated sentences.</p>
</li>
<li><p>Reviews: Stores the rating of the translated sentences.</p>
</li>
</ul>
</li>
<li><p><strong>Authentication Service</strong><br /> Firebase Authentication is implemented to manage user authentication and authorization, ensuring secure access to the platform's features.</p>
</li>
<li><p><strong>External Integration</strong><br /> The system integrates with Mozilla Common Voice, serving as the source for English sentences to be translated into Moroccan Darija.</p>
</li>
<li><p><strong>Client Application</strong><br /> The frontend is an Angular-based web application that interacts with Firebase through API calls.</p>
</li>
<li><p><strong>Data Flow</strong></p>
<ul>
<li><p>The client app authenticates users and receives an auth token.</p>
</li>
<li><p>It makes API calls to the Firebase Functions for sentence retrieval, translation submission, and rating.</p>
</li>
<li><p>Functions interact with Firestore for data operations.</p>
</li>
<li><p>User authentication is verified before processing requests.</p>
</li>
</ul>
</li>
</ol>
<p>This architecture leverages Firebase's serverless model, allowing for easy scaling and maintenance. The separation of concerns between frontend, backend functions, and data storage provides a modular and flexible system for handling sentence translations and ratings. By utilizing Firebase's integrated services, the platform can efficiently manage user authentication, data processing, and storage, ensuring a smooth and secure experience for contributors engaged in building the Moroccan Darija translation dataset.</p>
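<p>To give a feel for how one of these pieces could be wired together, here is a minimal, illustrative sketch of a TranslateSentence-style callable function: it checks Firebase Authentication, validates the request, and writes the contribution to the Translations collection. The field names and validation rules are assumptions for illustration, not the project's production code.</p>
<pre><code class="lang-javascript">// Illustrative sketch of a TranslateSentence-style callable function; field names are assumptions
const { onCall, HttpsError } = require('firebase-functions/v2/https');
const { initializeApp } = require('firebase-admin/app');
const { getFirestore, FieldValue } = require('firebase-admin/firestore');

initializeApp();
const db = getFirestore();

exports.translateSentence = onCall(async (request) => {
  // User authentication is verified before any write happens
  if (!request.auth) {
    throw new HttpsError('unauthenticated', 'Sign in to submit translations.');
  }

  const { sentenceId, translation } = request.data;
  if (!sentenceId || !translation) {
    throw new HttpsError('invalid-argument', 'A sentence id and a translation are required.');
  }

  // Store the contribution in the Translations collection, linked to the source sentence
  const doc = await db.collection('translations').add({
    sentenceId,
    translation: translation.trim(),
    userId: request.auth.uid,
    createdAt: FieldValue.serverTimestamp(),
  });

  return { translationId: doc.id };
});
</code></pre>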
<h3 id="heading-translations-collection">Translations Collection</h3>
<p>The Darija AI platform employs a diverse range of translation sources to ensure a wide variety of content and contexts. Users are presented with randomly selected English sentences from the Mozilla Common Voice dataset, which they then translate into Moroccan Darija. This approach not only provides a broad spectrum of phrases but also keeps the translation process engaging and dynamic for contributors.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1725975586455/b98fbf7e-0bb6-4456-a2cd-0cf0a7fe6873.png" alt class="image--center mx-auto" /></p>
<p>To capture the rich diversity of Darija, we've designed the translation feature without imposing limitations on users' input. Participants have the freedom to write their translations using either Arabic or Latin scripts, ensuring flexibility and inclusivity. Furthermore, there are no restrictions on regional dialects or nuances, allowing for a comprehensive and authentic representation of the language. This unrestrictive approach helps preserve the unique variations and cultural richness of Darija, making our dataset a true reflection of its diversity.</p>
<p>The authenticity of the translations is ensured by engaging native speakers of Moroccan Darija as contributors. All users participating in the translation process are fluent in the dialect, bringing their innate understanding of the language's subtleties and cultural context to each translation. This native-speaker focus is crucial in capturing the true essence of Darija, including its idiomatic expressions, colloquialisms, and regional variations.</p>
<p>To maintain a continuous flow of translations and encourage ongoing participation, the platform immediately provides users with a new sentence to translate after they submit each translation.</p>
<p>We've also incorporated elements of gamification and recognition into the platform to enhance user engagement and motivation. Users earn points for each submitted translation, and these points are reflected in a global leaderboard. This leaderboard showcases top contributors, fostering a sense of healthy competition and providing recognition for valuable contributions to the project. By highlighting the efforts of active participants, we aim to build a strong, committed community of contributors dedicated to the goal of creating a comprehensive Darija-English dataset.</p>
<h3 id="heading-peer-reviews">Peer Reviews</h3>
<p>To maintain the accuracy and reliability of our dataset, we've implemented a robust peer review and quality assurance process that leverages the expertise of our user community. This innovative approach, which we call Peer-Driven Validation, not only transforms our contributors into active participants in the quality control process but also enables us to harness advanced machine learning techniques, specifically reinforcement learning with human feedback (RLHF).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1725975610080/9d456546-ac5c-424d-90d5-cf3a05398a87.png" alt class="image--center mx-auto" /></p>
<p>The Peer-Driven Validation process is designed to be both comprehensive and user-friendly. Users are regularly presented with existing translations and transcriptions from the dataset and are asked to assess their quality. This assessment involves rating the accuracy, clarity, and cultural appropriateness of the translations, as well as providing detailed feedback when necessary. By engaging users in this way, we tap into the collective knowledge and intuition of native Darija speakers, ensuring that our dataset remains true to the nuances and complexities of the language.</p>
<p>This iterative review process serves multiple purposes. Firstly, it acts as a powerful filter, helping to identify and correct any inaccuracies or inconsistencies in the dataset. Secondly, it provides valuable insights into the translation process itself, highlighting common challenges or areas of ambiguity that may require further attention or clarification in our guidelines.</p>
<p>Moreover, the human feedback gathered through this process is instrumental in implementing reinforcement learning techniques. By utilizing RLHF, we can train our language models to not just learn from the initial translations but to continuously improve based on the preferences and corrections provided by our user community. This approach allows our models to adapt and refine their outputs over time, learning to generate translations that are more natural, contextually appropriate, and aligned with human judgments.</p>
<h2 id="heading-how-to-contribute"><strong>How to contribute?</strong></h2>
<p>The Darija AI project thrives on community engagement and collaboration. We've designed multiple pathways for individuals to contribute, whether they're language enthusiasts or skilled developers. By joining our community, you can play a crucial role in preserving and advancing Moroccan Darija in the digital age.</p>
<h3 id="heading-for-translation-contributors">For Translation Contributors</h3>
<p>Becoming a part of the Darija AI community as a language contributor is a straightforward yet impactful process. To begin, simply sign up on our platform at <a target="_blank" href="https://darijaai.mlnomads.com">https://darijaai.mlnomads.com.</a> Once registered, you'll have immediate access to our translation interface, where you can start transforming English phrases from the Mozilla Common Voice dataset into rich, authentic Moroccan Darija.</p>
<p>Every translation you provide is a valuable contribution to our growing dataset. Your input not only expands the breadth of our linguistic resource but also enhances the accuracy and cultural relevance of Darija-English translations. As you contribute, you'll see your efforts recognized through our gamified system. Our leaderboard tracks your progress, adding an element of friendly competition and motivation to the translation process.</p>
<p>Beyond translation, we encourage all contributors to engage in our peer review process. By evaluating and providing feedback on translations submitted by fellow contributors, you play a pivotal role in our quality assurance mechanism. This peer-driven validation process is crucial for maintaining the high standards of our dataset and ensures that our translations capture the true essence and diversity of Moroccan Darija.</p>
<h3 id="heading-for-developers">For Developers</h3>
<p>For those with technical expertise in language technology or software development, the Darija AI project offers unique opportunities to contribute your skills. Our project's codebase is open-source and available on GitHub, providing a transparent and collaborative environment for development.</p>
<p>Visit the repo: <a target="_blank" href="https://github.com/ElhoubeBrahim/collect-darija">https://github.com/ElhoubeBrahim/collect-darija</a></p>
<p>As a developer, your contributions can significantly enhance various aspects of the Darija AI platform. Whether your expertise lies in frontend design, creating intuitive and engaging user interfaces, or in backend infrastructure, optimizing our data processing pipelines, your input can drive meaningful improvements. We also welcome contributions in areas such as machine learning model optimization, particularly in refining our RLHF implementations.</p>
<p>Some key areas where developer contributions can make a substantial impact include:</p>
<ol>
<li><p>Enhancing the user experience of our translation and validation interfaces</p>
</li>
<li><p>Optimizing our data storage and retrieval systems for improved performance</p>
</li>
<li><p>Developing new features to gamify the contribution process and increase user engagement</p>
</li>
<li><p>Improving our algorithms for matching reviewers with appropriate content for validation</p>
</li>
<li><p>Implementing advanced analytics to derive insights from our growing dataset</p>
</li>
<li><p>Implementing RLHF pipelines to better leverage user feedback in model training</p>
</li>
</ol>
<p>By contributing to the Darija AI project, developers have the opportunity to work on cutting-edge language technology while making a tangible impact on the preservation and advancement of Moroccan Darija. Your work will directly influence the quality of language models and translation tools for this unique dialect, potentially benefiting millions of Darija speakers in Morocco and worldwide.</p>
<h2 id="heading-the-team">The Team</h2>
<p>The Darija AI project emerged from a confluence of academic curiosity, technological expertise, and community support, embodying the spirit of collaborative innovation in addressing linguistic challenges.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1726060325646/eefa0266-90bb-4e99-9a84-a92300010fb7.png" alt class="image--center mx-auto" /></p>
<p>In the heart of Morocco's vibrant tech scene, a team of ambitious engineers recognized the pressing need for advanced language tools for Moroccan Darija.</p>
<p><a target="_blank" href="https://www.linkedin.com/in/ahmed-houssam-bouzine/">Ahmed Houssam BOUZINE</a> a software engineer and a Big Data and Cloud Computing student at ENSET Mohammedia. As part of this project, he was responsible for implementing a review feature that allows users to rate previous translations to ensure translation quality and accuracy. Ahmed developed both the user interface components and the corresponding backend endpoints for this feature.</p>
<p><a target="_blank" href="https://www.linkedin.com/in/elqessouartariq/">Tariq EL QESSOUAR</a> a software engineer and a Big Data and Cloud Computing student at ENSET Mohammedia. In this project, Tariq took charge of developing both the frontend and backend components for the leaderboard and history pages. These features allow users to easily monitor rankings and access their historical data, significantly improving the platform's usability and engagement.</p>
<p><a target="_blank" href="https://www.linkedin.com/in/elhoube-brahim/">Brahim EL HOUBE</a> a software engineer and a student specializing in Big Data and Cloud Computing at ENSET Mohammedia. In the project, he was responsible for developing the backend APIs and various UI components to maintain consistency throughout the app, as well as overseeing the deployment and monitoring of production performance.</p>
<p><a target="_blank" href="https://www.tahabouhsine.com/">Taha BOUHSINE</a> (@tahabsn), a ML/AI Google Developer Expert and organizer of the MLNomads community, provided his expertise which was instrumental in laying the project's foundation. He provided initial guidance, sharing insights gleaned from his extensive experience in the field of machine learning.</p>
<h2 id="heading-acknowledgment">Acknowledgment</h2>
<p>Google AI/ML Developer Programs team supported this work by providing Google Cloud Credit. #AISprint</p>
<p>We would like to extend our heartfelt appreciation to the following contributors, whose invaluable feedback and contributions were instrumental in refining the platform's requirements and enhancing the dataset:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>#</td><td>User</td><td>Score</td><td>Translations</td></tr>
</thead>
<tbody>
<tr>
<td>001</td><td><strong>Ayoub Boulmeghras</strong></td><td>1200</td><td>120</td></tr>
<tr>
<td>002</td><td><strong>Mohamed Ouaicha</strong></td><td>1100</td><td>110</td></tr>
<tr>
<td>003</td><td><strong>Moussa Aoukacha</strong></td><td>1050</td><td>105</td></tr>
<tr>
<td>004</td><td><strong>Yassir Salmi</strong></td><td>1030</td><td>103</td></tr>
<tr>
<td>005</td><td><strong>Anas Aberchih</strong></td><td>1020</td><td>102</td></tr>
<tr>
<td>006</td><td><strong>Mohamed Ait Hassoun</strong></td><td>820</td><td>82</td></tr>
<tr>
<td>007</td><td><strong>Akram Elmouden</strong></td><td>480</td><td>48</td></tr>
<tr>
<td>008</td><td><strong>El-houssaine Ohssine</strong></td><td>400</td><td>40</td></tr>
<tr>
<td>009</td><td><strong>Kawtar Khallouq</strong></td><td>130</td><td>13</td></tr>
<tr>
<td>010</td><td><strong>Ajidah Ski</strong></td><td><strong>70</strong></td><td>7</td></tr>
</tbody>
</table>
</div><h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>By leveraging collective knowledge and fostering community participation, Darija AI aims to create a rich, nuanced, and authentic resource for Darija-English translation. This approach paves the way for more accurate and contextually appropriate language models and translation tools, while simultaneously preserving and celebrating the unique characteristics of Moroccan Darija. Through this innovative crowdsourcing platform, we are not only building a valuable linguistic resource but also engaging the Darija-speaking community in the process of documenting and preserving their language in the digital age.</p>
<p>By implementing this Peer-Driven Validation process, we ensure that our Darija-English dataset is not just extensive, but also accurate, nuanced, and reflective of the true diversity of Moroccan Darija. This collaborative approach to quality assurance supports the continuous improvement of our dataset, making it an increasingly valuable resource for developing sophisticated language models and translation tools for Darija.</p>
<p>We invite all interested contributors, whether language enthusiasts or skilled developers, to join us in this exciting endeavor. Together, we can build a comprehensive, high-quality Darija-English dataset that will serve as a foundation for advanced language technologies, bridging linguistic gaps and preserving the rich cultural heritage embedded in Moroccan Darija.</p>
]]></content:encoded></item><item><title><![CDATA[#AISprint Multimodal-verse:
I - Intro to the Multimodal-Verse]]></title><description><![CDATA[Hey there, AI adventurer!
Ready to step into the wild world of multimodality? Buckle up, because we're about to take your AI knowledge from "meh" to "mind-blowing"!
First things first: What's this multimodal business all about?
Picture this: You're s...]]></description><link>https://blog.mlnomads.com/aisprint-i-intro-to-the-multimodal-verse</link><guid isPermaLink="true">https://blog.mlnomads.com/aisprint-i-intro-to-the-multimodal-verse</guid><category><![CDATA[#multimodalai]]></category><category><![CDATA[Multimodality]]></category><category><![CDATA[Deep Learning]]></category><category><![CDATA[Artificial Intelligence]]></category><dc:creator><![CDATA[Taha Bouhsine]]></dc:creator><pubDate>Mon, 16 Sep 2024 23:37:05 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1724687007596/1462397c-12a1-40a2-8d94-c849a1403255.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hey there, AI adventurer!</p>
<p>Ready to step into the wild world of multimodality? Buckle up, because we're about to take your AI knowledge from "meh" to "mind-blowing"!</p>
<h2 id="heading-first-things-first-whats-this-multimodal-business-all-about">First things first: What's this multimodal business all about?</h2>
<p>Picture this: You're scrolling through your social media feed. You see a meme with a picture and a caption.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724687833475/d531ab3f-6038-41b3-ad1e-5752cf9bda64.jpeg" alt class="image--center mx-auto" /></p>
<p>Your brain instantly processes both the image and the text, combining them to understand the joke. That, my friend, is multimodality in action!</p>
<p>In the AI world, multimodality is all about combining different types of data or "modalities" - like text, images, audio, or video - to create smarter, more human-like AI systems. It's like giving your AI superpowers!</p>
<p>Let's break it down with some cool concepts:</p>
<h3 id="heading-platos-cave-the-og-multimodal-thinker">Plato's Cave: The OG Multimodal Thinker</h3>
<p>Remember Plato's allegory of the cave?</p>
<p>If not, here's the TL;DR: Imagine prisoners chained in a cave, only able to see shadows on a wall. They think those shadows are reality. But when one escapes and sees the real world, they realize how limited their perception was.</p>
<p><img src="https://nofilmschool.com/media-library/the-platonic-definition-plato-s-allegory-of-the-cave.jpg?id=34050097&amp;width=740&amp;quality=90" alt class="image--center mx-auto" /></p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://youtu.be/d2afuTvUzBQ">https://youtu.be/d2afuTvUzBQ</a></div>
<p> </p>
<p>This is like unimodal AI models - they're stuck looking at shadows (one type of data), while multimodal models get to experience the full, glorious reality - seeing, smelling, and even tasting it!</p>
<p>Unimodal AI models are like that friend who's really good at one thing but clueless about everything else. They might be text wizards or image gurus, but they're missing out on the bigger picture.</p>
<h3 id="heading-unique-information-in-different-modalities-the-spice-of-ai-life">Unique Information in Different Modalities: The Spice of (AI) Life</h3>
<p>Let's take a stroll through some different ways we take in information and how we use it - you know, like how our brains make sense of the world around us. It's pretty cool when you think about it!</p>
<p>Starting with good old text, it's our go-to for spelling things out clearly and getting into those big, abstract ideas. Like, imagine trying to explain quantum physics without writing it down - yikes! But text has its limits too. Ever had a friend send you a message that just says "I'M FINE" in all caps? You know they're probably not fine, but you can't hear the exasperation in their voice or see the eye roll.</p>
<p>Now, pictures - they're worth a thousand words, right? Images are awesome for showing us how things look and fit together. Think about trying to assemble IKEA furniture with just written instructions. The diagrams save us from ending up with a chair that looks more like abstract art!</p>
<p>Here's a wild one - the thermal spectrum. It's like having superhero vision, showing us heat patterns we can't normally see. Imagine being able to spot a warm-blooded critter hiding in the bushes at night, or finding where your house is leaking heat in winter. Engineers and doctors use this all the time, like checking for hotspots in electrical systems or looking for inflammation in the body.</p>
<p>Audio is where things get really personal. It's not just about hearing words; it's about feeling them. Remember that example of your friend yelling? In a voice message, you'd hear the frustration, maybe even a bit of a voice crack. That's way different from seeing "I'M ANGRY" typed out. Audio lets us pick up on all those little cues - the excitement in someone's voice when they're talking about their passion project, or the soothing tones of your favorite chill-out playlist.</p>
<p>Video is like the superhero of information - it's got visuals and sound working together. It's perfect for when you need to see how something moves or changes over time. Think about learning a new dance move - reading about it? Tricky. Seeing a picture? Better. But watching a video where you can see and hear the instructor? Now we're talking!</p>
<p>All these different ways of taking in info work together to give us a fuller picture of what's going on. It's like having a toolbox where each tool has its own special job, but when you use them all together, you can build something amazing. Pretty neat how our brains juggle all this stuff, huh?</p>
<p>And you know what's even cooler? By combining these modalities, we can create AI systems that understand the world more like humans do. It's like giving your AI a pair of glasses, a hearing aid, and a really good book all at once! This multi-modal approach brings us one step closer to developing AI that can perceive and interpret the world with the same richness and complexity that we do. Imagine an AI that can not only read a recipe, but also watch a cooking video, listen to the sizzle of the pan, and even detect when something's starting to burn - now that's a kitchen assistant I'd want on my team cooking those Tajines!</p>
<h3 id="heading-glimpse-on-multimodality-in-action-in-the-presentfuture-world">Glimpse on Multimodality in action in the present/future world:</h3>
<p>Multimodal AI isn't just a cool party trick. It's revolutionizing fields in ways that affect our daily lives. Let's break it down.</p>
<p>Healthcare is getting a major upgrade thanks to multimodal AI. Imagine your doctor having a super-smart assistant that can look at your medical records, analyze your X-rays, and even process data from wearable devices - all at once. This AI can spot patterns and make connections that might be missed otherwise. It's like having a whole team of specialists working together to give you the best possible care. The result? More accurate diagnoses, personalized treatment plans, and potentially catching health issues before they become serious problems.</p>
<p>When it comes to autonomous vehicles, multimodal AI is literally driving the future. These smart cars aren't just using one type of sensor - they're combining data from cameras, LiDAR (that's like radar, but with lasers), GPS, and more. It's as if the car has eyes, ears, and an excellent sense of direction all working together. This fusion of data helps the vehicle understand its environment more completely, making split-second decisions to navigate safely. It's not just about getting from A to B; it's about making the journey as safe as possible for everyone on the road.</p>
<p>Everyone saw the virtual assistants announced by Google and OpenAI, cool right? Well, virtual assistants are getting a whole lot smarter too. Gone are the days of simple voice commands. Multimodal AI is helping create assistants that can understand context, pick up on visual cues, and interact more naturally. Imagine talking to your smart home system while cooking, and it can see that your hands are full, hear the sizzle of the pan, and automatically set a timer without you having to ask. It's like having a helpful friend in the room who just gets what you need.</p>
<p>Lastly, content moderation is becoming more effective and nuanced with multimodal AI. In today's digital world, harmful content isn't limited to just text. By analyzing text, images, and videos together, AI can better understand context and nuance. This means it can more accurately identify things like hate speech, misinformation, or inappropriate content across different formats. It's like having a really smart, really fast team of moderators working 24/7 to keep online spaces safer for everyone.</p>
<h2 id="heading-too-much-philosophy-lets-math-just-a-bit">Too Much philosophy, let's Math! just a bit.</h2>
<p>Too much philosophy for you? No problem! Let's speak Math and whip up a delicious info-theory smoothie. It's like the nutritional science of data, but way cooler!</p>
<p>Imagine each modality as a different food group in your AI diet. Here's our menu:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1726516595114/5178be15-ded9-413d-a3fb-f2ed373cddfe.png" alt class="image--center mx-auto" /></p>
<ul>
<li><p>Text: Your proteins, the building blocks of information. It's like chicken breast for your AI - lean, mean, and full of explicit facts.</p>
</li>
<li><p>Images: The carbs of the data world. They give you that quick energy boost of visual information, helping you picture things instantly.</p>
</li>
<li><p>Thermal images: Think of these as your healthy fats. They might seem extra, but they provide that crucial layer of information about heat and energy that you can't get elsewhere.</p>
</li>
<li><p>Depth Images: These are your vitamins and minerals, adding that extra dimension (literally) to your AI's understanding. They're like the spinach of your data diet - packed with spatial goodness!</p>
</li>
</ul>
<p>Now, the task you're trying to solve? That's like your specific dietary need. Sometimes, you can get by on just protein shakes (hello, text-only models!). But for optimal health - or in our case, top-notch AI performance - you often need a balanced diet. That's where our multimodal approach comes in, like a perfectly planned meal prep!</p>
<p>Here's where it gets juicy (pun intended):</p>
<ol>
<li><p><strong>Information Overlap:</strong> Different modalities often contain overlapping information. It's like getting vitamin C from both oranges and bell peppers. Your AI might learn about an object's shape from both an image and a depth map. This redundancy? It's not a waste - it's reinforcement!</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1726516733032/6c557890-22b8-4d6a-9f1e-c7fd9a95bcd1.png" alt class="image--center mx-auto" /></p>
</li>
<li><p><strong>Unique Information:</strong> Each modality brings its own special flavor to the table. Text might give you the name of a dish, images show you how it looks, thermal images reveal how it's cooked, and depth images let you appreciate its texture. It's like how you can only get certain omega-3s from fish - some info is modality-exclusive.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1726516768382/a6676407-c5dd-46fd-a4b3-9bc1b11758e1.png" alt class="image--center mx-auto" /></p>
</li>
<li><p><strong>Synergistic Information:</strong> This is where the magic happens! Combine modalities, and suddenly, 1+1=3. A sarcastic text message + an audio clip of the tone = understanding the true meaning. It's like how calcium and vitamin D work together for stronger bones. In multimodal AI, this synergy can lead to insights greater than the sum of its parts.</p>
</li>
<li><p><strong>Task-Relevant Information:</strong> The secret sauce is extracting what's most relevant to your task. It's like customizing your diet for specific fitness goals. Want to build muscle? Up the protein. Training an AI to recognize emotions? Maybe prioritize facial expressions (images) and tone of voice (audio) over thermal data.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1726516927641/9593dab7-de1b-45e5-879f-3a5608587ed3.png" alt class="image--center mx-auto" /></p>
</li>
</ol>
<p>By mixing multiple modalities, we're essentially increasing the mutual information between our input buffet and the desired output. It's like expanding our menu to make sure we're getting all the nutrients (information) we need for a specific health goal (task).</p>
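<p>For readers who want the formal version of that last claim: one common way to write it down is the partial information decomposition of the mutual information between two modalities X1, X2 and a task label Y into exactly the four ingredients above. Take the equation below as a sketch of the idea (in the spirit of the Williams and Beer decomposition) rather than a derivation.</p>
<pre><code class="lang-latex">% Mutual information between two modalities and the task, split into the four "nutrients"
I(X_1, X_2; Y) =
    \underbrace{R(X_1, X_2; Y)}_{\text{overlap (redundancy)}}
  + \underbrace{U(X_1; Y \mid X_2)}_{\text{unique to } X_1}
  + \underbrace{U(X_2; Y \mid X_1)}_{\text{unique to } X_2}
  + \underbrace{S(X_1, X_2; Y)}_{\text{synergy}}
</code></pre>
<p>Since the left-hand side is never smaller than the mutual information of any single modality with Y, the combined menu always carries at least as much task-relevant information as any single dish on its own.</p>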
<p>So there you have it - multimodal AI, served up with a side of nutritional metaphors and a sprinkle of information theory. Bon appétit, data scientists!</p>
<h2 id="heading-the-bottom-line">The Bottom Line</h2>
<p>Multimodality is the secret sauce that's taking AI to the next level. By breaking free from the limitations of unimodal approaches, we're creating AI systems that can see the world more like we do - in all its complex, multifaceted glory.</p>
<p>So, the next time you effortlessly understand a meme or instantly recognize your friend's sarcastic tone in a voice message, remember: that's multimodal processing in action. And now, we're teaching machines to do the same!</p>
<p>Stay tuned for our next post, where we'll dive deeper into why going multimodal is not just cool, but crucial for the future of AI. Until then, keep your eyes, ears, and mind open to the multimodal world around you, and don't stop sharing those memes.</p>
<h2 id="heading-acknowledgments">Acknowledgments</h2>
<p>Google AI/ML Developer Programs team supported this work by providing Google Cloud Credit.</p>
<h2 id="heading-references">References</h2>
<blockquote>
<p><em>I will try to use the same numbers for citations for the rest of the blogs.</em></p>
</blockquote>
<h3 id="heading-resources">Resources</h3>
<ul>
<li>Plato’s Cave: <a target="_blank" href="https://nofilmschool.com/allegory-of-the-cave-in-movies">https://nofilmschool.com/allegory-of-the-cave-in-movies</a></li>
</ul>
<h3 id="heading-papers-and-theses">Papers and Theses</h3>
<ol>
<li><p>Le-Khac, P. H., Healy, G., &amp; Smeaton, A. F. (2020). Contrastive representation learning: A framework and review. IEEE Access, 8, 193907–193934.</p>
<p> <a target="_blank" href="https://doi.org/10.1109/ACCESS.2020.3031549">https://doi.org/10.1109/ACCESS.2020.3031549</a></p>
</li>
<li><p>Jia, C., Yang, Y., Xia, Y., Chen, Y., Parekh, Z., Pham, H., Le, Q., Sung, Y., Li, Z., &amp; Duerig, T. (2021). Scaling up Visual and Vision-Language representation learning with noisy text supervision. <em>International Conference on Machine Learning</em>, 4904–4916. <a target="_blank" href="http://proceedings.mlr.press/v139/jia21b/jia21b.pdf">http://proceedings.mlr.press/v139/jia21b/jia21b.pdf</a></p>
</li>
<li><p>Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., &amp; Sutskever, I. (2021). Learning transferable visual models from natural language supervision. arXiv. <a target="_blank" href="https://arxiv.org/abs/2103.00020">https://arxiv.org/abs/2103.00020</a></p>
</li>
<li><p>Zhai, X., Mustafa, B., Kolesnikov, A., &amp; Beyer, L. (2023, October). <em>Sigmoid loss for language image pre-training</em>. In <em>2023 IEEE/CVF International Conference on Computer Vision (ICCV)</em> (pp. 11941-11952). IEEE. <a target="_blank" href="https://doi.org/10.1109/ICCV51070.2023.01100">https://doi.org/10.1109/ICCV51070.2023.01100</a></p>
</li>
<li><p>Li, S., Zhang, L., Wang, Z., Wu, D., Wu, L., Liu, Z., Xia, J., Tan, C., Liu, Y., Sun, B., &amp; Stan Z. Li. (n.d.). Masked modeling for self-supervised representation learning on vision and beyond. In IEEE [Journal-article]. <a target="_blank" href="https://arxiv.org/pdf/2401.00897">https://arxiv.org/pdf/2401.00897</a></p>
</li>
<li><p>Jia, C., Yang, Y., Xia, Y., Chen, Y., Parekh, Z., Pham, H., Le, Q., V., Sung, Y., Li, Z., &amp; Duerig, T. (2021, February 11). Scaling up Visual and Vision-Language representation learning with noisy text supervision. <a target="_blank" href="http://arXiv.org">arXiv.org</a>. <a target="_blank" href="https://arxiv.org/abs/2102.05918">https://arxiv.org/abs/2102.05918</a></p>
</li>
<li><p>Bachmann, R., Kar, O. F., Mizrahi, D., Garjani, A., Gao, M., Griffiths, D., Hu, J., Dehghan, A., &amp; Zamir, A. (2024, June 13). 4M-21: An Any-to-Any Vision model for tens of tasks and modalities. <a target="_blank" href="http://arXiv.org">arXiv.org</a>. <a target="_blank" href="https://arxiv.org/abs/2406.09406">https://arxiv.org/abs/2406.09406</a></p>
</li>
<li><p>Bao, H., Dong, L., Piao, S., &amp; Wei, F. (2021, June 15). BEIT: BERT Pre-Training of Image Transformers. <a target="_blank" href="http://arXiv.org">arXiv.org</a>. <a target="_blank" href="https://arxiv.org/abs/2106.08254">https://arxiv.org/abs/2106.08254</a></p>
</li>
<li><p>Balestriero, R., Ibrahim, M., Sobal, V., Morcos, A., Shekhar, S., Goldstein, T., Bordes, F., Bardes, A., Mialon, G., Tian, Y., Schwarzschild, A., Wilson, A. G., Geiping, J., Garrido, Q., Fernandez, P., Bar, A., Pirsiavash, H., LeCun, Y., &amp; Goldblum, M. (2023, April 24). A cookbook of Self-Supervised Learning. <a target="_blank" href="http://arXiv.org">arXiv.org</a>. <a target="_blank" href="https://arxiv.org/abs/2304.12210">https://arxiv.org/abs/2304.12210</a></p>
</li>
<li><p>Zadeh, A., Chen, M., Poria, S., Cambria, E., &amp; Morency, L. (2017, July 23). Tensor Fusion Network for Multimodal Sentiment Analysis. <a target="_blank" href="http://arXiv.org">arXiv.org</a>. <a target="_blank" href="https://arxiv.org/abs/1707.07250">https://arxiv.org/abs/1707.07250</a></p>
</li>
<li><p>Tian, Y., Sun, C., Poole, B., Krishnan, D., Schmid, C., &amp; Isola, P. (2020, May 20). What makes for good views for contrastive learning? <a target="_blank" href="http://arXiv.org">arXiv.org</a>. <a target="_blank" href="https://arxiv.org/abs/2005.10243">https://arxiv.org/abs/2005.10243</a></p>
</li>
<li><p>Huang, Y., Du, C., Xue, Z., Chen, X., Zhao, H., &amp; Huang, L. (2021, June 8). What Makes Multi-modal Learning Better than Single (Provably). <a target="_blank" href="http://arXiv.org">arXiv.org</a>. <a target="_blank" href="https://arxiv.org/abs/2106.04538">https://arxiv.org/abs/2106.04538</a></p>
</li>
<li><p>Nagrani, A., Yang, S., Arnab, A., Jansen, A., Schmid, C., &amp; Sun, C. (2021, June 30). Attention bottlenecks for multimodal fusion. <a target="_blank" href="http://arXiv.org">arXiv.org</a>. <a target="_blank" href="https://arxiv.org/abs/2107.00135">https://arxiv.org/abs/2107.00135</a></p>
</li>
<li><p>Liu, Z., Shen, Y., Lakshminarasimhan, V. B., Liang, P. P., Zadeh, A., &amp; Morency, L. (2018, May 31). <em>Efficient Low-rank Multimodal Fusion with Modality-Specific Factors</em>. <a target="_blank" href="http://arXiv.org">arXiv.org</a>. <a target="_blank" href="https://arxiv.org/abs/1806.00064">https://arxiv.org/abs/1806.00064</a></p>
</li>
<li><p>Wang, X., Chen, G., Qian, G., Gao, P., Wei, X., Wang, Y., Tian, Y., &amp; Gao, W. (2023, February 20). <em>Large-scale Multi-Modal Pre-trained Models: A comprehensive survey</em>. <a target="_blank" href="http://arXiv.org">arXiv.org</a>. <a target="_blank" href="https://arxiv.org/abs/2302.10035">https://arxiv.org/abs/2302.10035</a></p>
</li>
<li><p>Wang, W., Bao, H., Dong, L., Bjorck, J., Peng, Z., Liu, Q., Aggarwal, K., Mohammed, O. K., Singhal, S., Som, S., &amp; Wei, F. (2022, August 22). Image as a Foreign Language: BEIT Pretraining for all Vision and Vision-Language tasks. <a target="_blank" href="http://arXiv.org">arXiv.org</a>. <a target="_blank" href="https://arxiv.org/abs/2208.10442">https://arxiv.org/abs/2208.10442</a></p>
</li>
<li><p>Liang, P. P. (2024, April 29). Foundations of multisensory artificial Intelligence. <a target="_blank" href="http://arXiv.org">arXiv.org</a>. <a target="_blank" href="https://arxiv.org/abs/2404.18976">https://arxiv.org/abs/2404.18976</a></p>
</li>
<li><p>Huang, S., Pareek, A., Seyyedi, S., Banerjee, I., &amp; Lungren, M. P. (2020). Fusion of medical imaging and electronic health records using deep learning: a systematic review and implementation guidelines. Npj Digital Medicine, 3(1). <a target="_blank" href="https://doi.org/10.1038/s41746-020-00341-z">https://doi.org/10.1038/s41746-020-00341-z</a></p>
</li>
</ol>
]]></content:encoded></item><item><title><![CDATA[#AISprint Welcome to the Multimodal-verse: A Beginner's Guide]]></title><description><![CDATA[Hey there, weary traveler! Feeling overwhelmed by the AI revolution? Everywhere you look, it's AI this, AI that. And now you're hearing whispers about "multimodal something something." Don't sweat it, my friend. I've got your back!

Let's dive into t...]]></description><link>https://blog.mlnomads.com/multimodal-verse</link><guid isPermaLink="true">https://blog.mlnomads.com/multimodal-verse</guid><category><![CDATA[AISprint ]]></category><category><![CDATA[Multimodal Deep Learning]]></category><category><![CDATA[Multisensory]]></category><category><![CDATA[Multimodality]]></category><category><![CDATA[Deep Learning]]></category><dc:creator><![CDATA[Taha Bouhsine]]></dc:creator><pubDate>Tue, 27 Aug 2024 19:57:07 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1724686911445/b2d3e12c-0d37-49fb-b61b-b1655b292119.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hey there, weary traveler! Feeling overwhelmed by the AI revolution? Everywhere you look, it's AI this, AI that. And now you're hearing whispers about "multimodal something something." Don't sweat it, my friend. I've got your back!</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724686565064/28960d91-24c1-4743-b8c1-ed874b80cd52.jpeg" alt class="image--center mx-auto" /></p>
<p>Let's dive into the fascinating world of multimodality together. This blog post series is designed to be your friendly guide through the multimodal landscape. We'll keep things simple and beginner-friendly (but you'll need at least some AI 101 under your belt).</p>
<p><a target="_blank" href="https://docs.google.com/presentation/d/10mODeJMw1WPFPy7BaePdbfgpA4dVOAFQKV9akZWmLOg/edit?usp=sharing&amp;resourcekey=0-JsrU7v6W36NnkB6Gh05hKg"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724686760867/31d841f5-d487-4ca1-8528-4a7c1368a52c.jpeg" alt class="image--center mx-auto" /></a></p>
<p>Get ready to expand your human brain with some cool concepts and techniques!</p>
<p>Here's what we'll cover in this series:</p>
<ol>
<li><p><a target="_blank" href="https://blog.mlnomads.com/aisprint-i-intro-to-the-multimodal-verse"><strong>Intro to the Multimodal-Verse</strong></a></p>
<ul>
<li><p>Plato's Cave: A philosophical warm-up</p>
</li>
<li><p>Unimodality and its limitations: Why one just isn't enough</p>
</li>
<li><p>Unique information in different modalities: Spice up your AI life</p>
</li>
</ul>
</li>
<li><p><strong>From Satellite to Earth: A Case Study on Why We Should Go Multimodal</strong></p>
<ul>
<li><p>Satellite bands: More than meets the eye</p>
</li>
<li><p>Night vision with thermal imagery</p>
</li>
</ul>
</li>
<li><p><strong>This is Cool, But Why So Complex?</strong></p>
<ul>
<li><p>Which task are you trying to solve?</p>
</li>
<li><p>The messy world of fusion</p>
</li>
<li><p>Timing is everything: When to fuse?</p>
</li>
<li><p>Fusion techniques: From simple to sophisticated</p>
</li>
</ul>
</li>
<li><p><strong>You've Got My Attention, Where Do I Start?</strong></p>
<ul>
<li><p>Welcome to Flax: Your new best friend</p>
</li>
<li><p>Data loaders, modeling, and training loops</p>
</li>
<li><p>Evaluation, tracking, and model management</p>
</li>
</ul>
</li>
<li><p><strong>Colab 🧡 GitHub Codespaces</strong></p>
</li>
</ol>
<p>Throughout this journey, we'll tackle some common questions and challenges:</p>
<ul>
<li><p>Why is classification so picky about representations?</p>
</li>
<li><p>What's the deal with modality translation and high-quality features?</p>
</li>
<li><p>The great encoder debate: To leak or not to leak?</p>
</li>
<li><p>Fusion timing: Early, mid, or late? (Spoiler: It depends!)</p>
</li>
<li><p>Embedding shenanigans: Size matters, and so does normalization</p>
</li>
<li><p>Fusion techniques: From "concatenate and chill" to "attention fusion and thrill" (a tiny sketch follows this list)</p>
</li>
</ul>
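<p>To make the fusion bullet a bit more concrete before its dedicated post, here is a minimal, illustrative sketch. Everything in it is an assumption made for the example: toy feature shapes, Flax's <code>linen</code> API, and module names I invented; it is not code from any model discussed in this series.</p>
<pre><code class="lang-python">import jax
import jax.numpy as jnp
import flax.linen as nn

class ConcatFusion(nn.Module):
    """'Concatenate and chill': glue two modality embeddings, mix with a Dense layer."""
    dim: int = 64

    @nn.compact
    def __call__(self, img_feat, txt_feat):
        fused = jnp.concatenate([img_feat, txt_feat], axis=-1)
        return nn.Dense(self.dim)(fused)

class AttentionFusion(nn.Module):
    """'Attention fusion and thrill': text tokens attend over image tokens."""
    dim: int = 64

    @nn.compact
    def __call__(self, img_tokens, txt_tokens):
        attended = nn.MultiHeadDotProductAttention(num_heads=4)(txt_tokens, img_tokens)
        # Mean-pool the attended tokens into one fused embedding.
        return nn.Dense(self.dim)(attended.mean(axis=1))

key = jax.random.PRNGKey(0)
img = jax.random.normal(key, (2, 16, 64))  # (batch, image patches, features), toy shapes
txt = jax.random.normal(key, (2, 8, 64))   # (batch, text tokens, features)

concat = ConcatFusion()
vars_c = concat.init(key, img.mean(axis=1), txt.mean(axis=1))
print(concat.apply(vars_c, img.mean(axis=1), txt.mean(axis=1)).shape)  # (2, 64)

attn = AttentionFusion()
vars_a = attn.init(key, img, txt)
print(attn.apply(vars_a, img, txt).shape)  # (2, 64)
</code></pre>
<p>Either way you end up with one joint embedding; whether plain concatenation is enough or attention earns its extra cost is exactly the kind of trade-off we'll dig into later in the series.</p>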
<p>So, buckle up, prepare a whole <strong>berrad</strong> of Moroccan tea, and sip on the knowledge I am about to drop on you.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724685654104/76528831-e30f-4ec2-ba99-157b857530f9.png" alt="a photo by @kaoutharelouraoui" class="image--center mx-auto" /></p>
<h2 id="heading-acknowledgments">Acknowledgments</h2>
<p>The Google AI/ML Developer Programs team supported this work by providing Google Cloud credits.</p>
<h2 id="heading-references">References</h2>
<blockquote>
<p><em>I will try to keep the same citation numbers across the rest of the posts in this series.</em></p>
</blockquote>
<h3 id="heading-resources">Resources</h3>
<ul>
<li><p>ML Ascent 7 Building your first Multimodal Deep Learning Model with Jax/Flax/Tensorflow Part 1 <a target="_blank" href="https://www.youtube.com/watch?v=ih2A5dfM9gc">Youtube Link</a>, <a target="_blank" href="https://docs.google.com/presentation/d/1_H3fOo50dhfkSdwXr1huNgZvOibX0dA0I2OkT2YRw28/edit?usp=sharing">Slide Deck</a></p>
</li>
<li><p>Salama, K. (2021, January 30). <em>Natural language image search with a dual encoder</em>. Retrieved from <a target="_blank" href="https://keras.io/examples/vision/nl_image_search/">https://keras.io/examples/vision/nl_image_search/</a></p>
</li>
<li><p><a target="_blank" href="https://youtu.be/eMlx5fFNoYc">Attention in transformers, visually explained | Chapter 6, Deep Learning</a></p>
</li>
<li><p>Self-supervised learning: The dark matter of intelligence. (n.d.). <a target="_blank" href="https://ai.meta.com/blog/self-supervised-learning-the-dark-matter-of-intelligence/">https://ai.meta.com/blog/self-supervised-learning-the-dark-matter-of-intelligence/</a></p>
</li>
</ul>
<h3 id="heading-papers-and-theses">Papers and Theses</h3>
<ol>
<li><p>Le-Khac, P. H., Healy, G., &amp; Smeaton, A. F. (2020). Contrastive representation learning: A framework and review. IEEE Access, 8, 193907–193934.</p>
<p> <a target="_blank" href="https://doi.org/10.1109/ACCESS.2020.3031549">https://doi.org/10.1109/ACCESS.2020.3031549</a></p>
</li>
<li><p>Jia, C., Yang, Y., Xia, Y., Chen, Y., Parekh, Z., Pham, H., Le, Q., Sung, Y., Li, Z., &amp; Duerig, T. (2021). Scaling up Visual and Vision-Language representation learning with noisy text supervision. <em>International Conference on Machine Learning</em>, 4904–4916. <a target="_blank" href="http://proceedings.mlr.press/v139/jia21b/jia21b.pdf">http://proceedings.mlr.press/v139/jia21b/jia21b.pdf</a></p>
</li>
<li><p>Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., &amp; Sutskever, I. (2021). Learning transferable visual models from natural language supervision. arXiv. <a target="_blank" href="https://arxiv.org/abs/2103.00020">https://arxiv.org/abs/2103.00020</a></p>
</li>
<li><p>Zhai, X., Mustafa, B., Kolesnikov, A., &amp; Beyer, L. (2023, October). <em>Sigmoid loss for language image pre-training</em>. In <em>2023 IEEE/CVF International Conference on Computer Vision (ICCV)</em> (pp. 11941-11952). IEEE. <a target="_blank" href="https://doi.org/10.1109/ICCV51070.2023.01100">https://doi.org/10.1109/ICCV51070.2023.01100</a></p>
</li>
<li><p>Li, S., Zhang, L., Wang, Z., Wu, D., Wu, L., Liu, Z., Xia, J., Tan, C., Liu, Y., Sun, B., &amp; Li, S. Z. (n.d.). Masked modeling for self-supervised representation learning on vision and beyond. arXiv. <a target="_blank" href="https://arxiv.org/pdf/2401.00897">https://arxiv.org/pdf/2401.00897</a></p>
</li>
<li><p>Jia, C., Yang, Y., Xia, Y., Chen, Y., Parekh, Z., Pham, H., Le, Q., V., Sung, Y., Li, Z., &amp; Duerig, T. (2021, February 11). Scaling up Visual and Vision-Language representation learning with noisy text supervision. <a target="_blank" href="http://arXiv.org">arXiv.org</a>. <a target="_blank" href="https://arxiv.org/abs/2102.05918">https://arxiv.org/abs/2102.05918</a></p>
</li>
<li><p>Bachmann, R., Kar, O. F., Mizrahi, D., Garjani, A., Gao, M., Griffiths, D., Hu, J., Dehghan, A., &amp; Zamir, A. (2024, June 13). 4M-21: An Any-to-Any Vision model for tens of tasks and modalities. <a target="_blank" href="http://arXiv.org">arXiv.org</a>. <a target="_blank" href="https://arxiv.org/abs/2406.09406">https://arxiv.org/abs/2406.09406</a></p>
</li>
<li><p>Bao, H., Dong, L., Piao, S., &amp; Wei, F. (2021, June 15). BEIT: BERT Pre-Training of Image Transformers. <a target="_blank" href="http://arXiv.org">arXiv.org</a>. <a target="_blank" href="https://arxiv.org/abs/2106.08254">https://arxiv.org/abs/2106.08254</a></p>
</li>
<li><p>Balestriero, R., Ibrahim, M., Sobal, V., Morcos, A., Shekhar, S., Goldstein, T., Bordes, F., Bardes, A., Mialon, G., Tian, Y., Schwarzschild, A., Wilson, A. G., Geiping, J., Garrido, Q., Fernandez, P., Bar, A., Pirsiavash, H., LeCun, Y., &amp; Goldblum, M. (2023, April 24). A cookbook of Self-Supervised Learning. <a target="_blank" href="http://arXiv.org">arXiv.org</a>. <a target="_blank" href="https://arxiv.org/abs/2304.12210">https://arxiv.org/abs/2304.12210</a></p>
</li>
<li><p>Zadeh, A., Chen, M., Poria, S., Cambria, E., &amp; Morency, L. (2017, July 23). Tensor Fusion Network for Multimodal Sentiment Analysis. <a target="_blank" href="http://arXiv.org">arXiv.org</a>. <a target="_blank" href="https://arxiv.org/abs/1707.07250">https://arxiv.org/abs/1707.07250</a></p>
</li>
<li><p>Tian, Y., Sun, C., Poole, B., Krishnan, D., Schmid, C., &amp; Isola, P. (2020, May 20). What makes for good views for contrastive learning? <a target="_blank" href="http://arXiv.org">arXiv.org</a>. <a target="_blank" href="https://arxiv.org/abs/2005.10243">https://arxiv.org/abs/2005.10243</a></p>
</li>
<li><p>Huang, Y., Du, C., Xue, Z., Chen, X., Zhao, H., &amp; Huang, L. (2021, June 8). What Makes Multi-modal Learning Better than Single (Provably). <a target="_blank" href="http://arXiv.org">arXiv.org</a>. <a target="_blank" href="https://arxiv.org/abs/2106.04538">https://arxiv.org/abs/2106.04538</a></p>
</li>
<li><p>Nagrani, A., Yang, S., Arnab, A., Jansen, A., Schmid, C., &amp; Sun, C. (2021, June 30). Attention bottlenecks for multimodal fusion. <a target="_blank" href="http://arXiv.org">arXiv.org</a>. <a target="_blank" href="https://arxiv.org/abs/2107.00135">https://arxiv.org/abs/2107.00135</a></p>
</li>
<li><p>Liu, Z., Shen, Y., Lakshminarasimhan, V. B., Liang, P. P., Zadeh, A., &amp; Morency, L. (2018, May 31). <em>Efficient Low-rank Multimodal Fusion with Modality-Specific Factors</em>. <a target="_blank" href="http://arXiv.org">arXiv.org</a>. <a target="_blank" href="https://arxiv.org/abs/1806.00064">https://arxiv.org/abs/1806.00064</a></p>
</li>
<li><p>Wang, X., Chen, G., Qian, G., Gao, P., Wei, X., Wang, Y., Tian, Y., &amp; Gao, W. (2023, February 20). <em>Large-scale Multi-Modal Pre-trained Models: A comprehensive survey</em>. <a target="_blank" href="http://arXiv.org">arXiv.org</a>. <a target="_blank" href="https://arxiv.org/abs/2302.10035">https://arxiv.org/abs/2302.10035</a></p>
</li>
<li><p>Wang, W., Bao, H., Dong, L., Bjorck, J., Peng, Z., Liu, Q., Aggarwal, K., Mohammed, O. K., Singhal, S., Som, S., &amp; Wei, F. (2022, August 22). Image as a Foreign Language: BEIT Pretraining for all Vision and Vision-Language tasks. <a target="_blank" href="http://arXiv.org">arXiv.org</a>. <a target="_blank" href="https://arxiv.org/abs/2208.10442">https://arxiv.org/abs/2208.10442</a></p>
</li>
<li><p>Liang, P. P. (2024, April 29). Foundations of multisensory artificial Intelligence. <a target="_blank" href="http://arXiv.org">arXiv.org</a>. <a target="_blank" href="https://arxiv.org/abs/2404.18976">https://arxiv.org/abs/2404.18976</a></p>
</li>
<li><p>Huang, S., Pareek, A., Seyyedi, S., Banerjee, I., &amp; Lungren, M. P. (2020). Fusion of medical imaging and electronic health records using deep learning: a systematic review and implementation guidelines. Npj Digital Medicine, 3(1). <a target="_blank" href="https://doi.org/10.1038/s41746-020-00341-z">https://doi.org/10.1038/s41746-020-00341-z</a></p>
</li>
</ol>
]]></content:encoded></item><item><title><![CDATA[How to take a project from zero to article | A banana farm as an example]]></title><description><![CDATA[💡
MLNomads Mentorship Program | Session 1



ML Descent 1: Engineering in Machine Learning Engineering


Yat s Yat (One by One)
They say understanding the question is half the answer: to pull off a solid project, you need a clear, well-understood problem statement, and you need to have thought through every step and detail ...]]></description><link>https://blog.mlnomads.com/ml-engineering-yat-s-yat</link><guid isPermaLink="true">https://blog.mlnomads.com/ml-engineering-yat-s-yat</guid><category><![CDATA[mlnomads]]></category><category><![CDATA[ML]]></category><category><![CDATA[engineering]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[Deep Learning]]></category><category><![CDATA[Morocco ]]></category><dc:creator><![CDATA[Elhoussaine O.]]></dc:creator><pubDate>Sun, 28 Apr 2024 13:49:50 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1711068282343/54ee5d14-6d5c-4f48-a3d3-e0ede3103b41.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text"><a target="_blank" href="http://www.mlnomads.com"><strong>MLNomads</strong></a><strong> Mentorship Program | Session 1</strong></div>
</div>

<blockquote>
<p>ML Descent 1: Engineering in Machine Learning Engineering</p>
</blockquote>
<hr />
<h1 id="heading-yat-s-yat">يات س يات</h1>
<p>يقال أن فهم السؤال هو نصف الجواب, وباش تطلع بروجي ناضي هو تكون عندك <strong>وضعية مشكلة</strong> واضحة و مفهومة, و تكون مبلاني لكاع الخطوات و التفاصيل باش اصدق ليك البروجي.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1711067212307/3d9e4c3e-9bac-4c65-9d18-3236ff88e273.jpeg" alt class="image--center mx-auto" /></p>
<p>In this post we will try to lay out, in simple terms, the steps to follow to build a <strong><em>machine learning</em></strong> project, one that finds a solution to a problem based on <strong><em>data</em></strong>, starting from the <strong>problem definition</strong> stage all the way to writing a <strong>research paper</strong>.</p>
<h1 id="heading-thdyd-ao-taaryf-almshkl"><strong>Identifying and Defining the Problem</strong></h1>
<h2 id="heading-shno-bghyna">What do we want?</h2>
<p>The first step of our project is to pin down the problem, the problem situation we are facing, since it is on that basis that we will work out the steps to follow to find solutions.<br />The better we understand the problem and nail down its details, the better we can plan ahead for any addition or change our requirements call for, such as bringing in <strong>artificial intelligence or machine learning</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1710453339797/04fcef12-e690-4d4d-ba24-24b0c3a9a7ce.jpeg" alt class="image--center mx-auto" /></p>
<p><em>In our case, harvesting bananas requires a sizeable workforce to monitor the banana trees, spot the ripe fruit, and pick it. This task is costly: financially, especially on a large farm, and also in terms of time and the number of workers needed. And let's not forget fruit quality: the longer it takes us to tell whether the bananas are ripe, the more quality they lose and the more they spoil.</em></p>
<p><strong>To solve this dilemma, we want to build an AI model that makes this task easier and cuts its cost.</strong><br />But how?<br /><em>The solution is simple:</em> instead of manual inspection by workers, we will <strong>keep taking pictures of the bananas and classify them automatically (a system); as soon as it spots ripe ones, it alerts us and we go pick them.</strong></p>
<h2 id="heading-shno-dar-mn-kbl">شنو دار من قبل ؟</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1710451098185/c7e421ba-58f5-4b01-830b-fdb69d68096b.jpeg" alt="كي غاندير ليها؟" class="image--center mx-auto" /></p>
<blockquote>
<p><strong><em>كي غاندير ليييهااا؟؟</em></strong></p>
</blockquote>
<p>باش نعرفو كيفاش نحلو المشكلة ديالنا, و لابد خاصنا نشوفو شنو دار من قبل ف الأوراق البحثية اللي تنشرات سواءا في نفس .الموضوع اولا ف مواضيع لي قريبة ليه<br />.هاكا غادي اكون عندنا اطلاع على شنو توصل ليه البحث العلمي, اللي نقدو نبنيو عليه او نستوحيو منو الحلول ديالنا<br />و افضل طريقة لمقارنة هاد الحلول الي ديجا كاينا هو عن طريق جدول تتكون فيه هاد المعلومات على كل ورقة بحثية:</p>
<ul>
<li><p><strong>Title</strong></p>
</li>
<li><p><strong>Year of publication</strong></p>
</li>
<li><p><strong>Journal or conference where it was published</strong></p>
</li>
<li><p><strong>Keywords</strong></p>
</li>
<li><p><strong>Abstract</strong></p>
</li>
<li><p><strong>Whether it is useful for our problem or not</strong></p>
</li>
<li><p>...</p>
</li>
</ul>
<h1 id="heading-tatyr-almshkl">تأطير المشكل</h1>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1711151608584/c0edce83-4fd9-448e-90c2-68d0cb645780.jpeg" alt class="image--center mx-auto" /></p>
<p><strong><em>Now we're getting serious</em></strong></p>
<p>At this stage we try to frame our problem by going over the aspects that matter most: the data we will need, the strategy we will use to work with it, and how far each option can actually take us.</p>
<h3 id="heading-input-almdkhlat"><strong>Input</strong></h3>
<p>The input is the data we will feed our model so it can learn from it. But how can we collect it, and what is its nature:<br />images, numerical data, or text?</p>
<p><em>How do we collect the data?</em></p>
<p><strong><em>Here are some of the ways we can get hold of the data our project needs:</em></strong></p>
<ul>
<li><p><strong>Look for data that is already prepared or that people have already worked on.<br />  We can find it on:</strong></p>
<ul>
<li><p><a target="_blank" href="http://www.kaggle.com/datasets"><em>Kaggle</em></a></p>
</li>
<li><p><a target="_blank" href="https://archive.ics.uci.edu/datasets"><strong><em>UCI Machine Learning Repository</em></strong></a></p>
</li>
<li><p><strong><em>...</em></strong></p>
</li>
</ul>
</li>
<li><p><strong>Collect the data from the web</strong></p>
<ul>
<li><p><strong>YouTube, Social Media, ...</strong>: take screenshots from the videos</p>
</li>
<li><p><strong>Data scraping</strong>: <em>scrape the data from websites or from image-sharing platforms</em></p>
</li>
</ul>
</li>
<li><p><strong>Manual collection</strong><br />  This is the last resort, when we cannot find the data through the previous methods.<br />  It will cost us time and equipment to gather,<br />  and we have to know exactly what we want and what we are collecting, for example via:</p>
<ul>
<li><p><strong><em>Sensors</em></strong></p>
</li>
<li><p><strong><em>RGB</em></strong> <em>images</em></p>
</li>
<li><p><strong><em>Satellite Imagery</em></strong></p>
</li>
<li><p>...</p>
</li>
</ul>
</li>
</ul>
<p>In our case we would collect <strong>banana pictures with cameras fixed in place</strong>: cheaper than sensors, and of better quality and resolution than aerial imagery.<br />The catch is that it would take <strong>a lot of time to collect them on the farm</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1710462600438/17f161fb-63ab-438f-acef-df9bf55321b2.jpeg" alt class="image--center mx-auto" /></p>
<p><strong><em>As an alternative, we will simply try to collect banana images from the web.</em></strong></p>
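<p>As a rough sketch of that collection step (the URL list, folder names, and file names below are placeholders I made up, not a real banana dataset), something like this is enough to pull down a first batch of images:</p>
<pre><code class="lang-python">from pathlib import Path

import requests

# Hypothetical list of image URLs gathered by hand or by a scraper.
image_urls = [
    # "https://example.com/banana_ripe_001.jpg",
    # "https://example.com/banana_green_001.jpg",
]

out_dir = Path("data/raw")
out_dir.mkdir(parents=True, exist_ok=True)

for i, url in enumerate(image_urls):
    resp = requests.get(url, timeout=10)
    if resp.ok:
        # Save each image with a simple sequential name.
        (out_dir / f"img_{i:04d}.jpg").write_bytes(resp.content)
</code></pre>
<p>The web gives us volume, not labels: every downloaded image still has to be checked and sorted by hand into its class.</p>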
<h3 id="heading-output-almkhrgat"><strong>Output المخرجات</strong></h3>
<p>هي الحوايج اللي بغينا اتعلمهوم الموديل ديالنا من الداتا الي عطيناه, و كيفاش ايكونو (قيم مستمرة, قيم ثابتة, ...)</p>
<ul>
<li><p>Is the banana ripe or not?</p>
</li>
<li><p>What is the quality of the banana: good, green, or rotten?</p>
</li>
<li><p>What is the banana's degree of ripeness?</p>
</li>
<li><p>...</p>
</li>
</ul>
<p>In this case, the data we collect will be organized into <strong><em>3 classes</em></strong> reflecting the quality of the banana fruit (a loading sketch follows below):</p>
<ul>
<li><p><strong>Ripe bananas (yellow)</strong></p>
</li>
<li><p><strong>Green bananas</strong></p>
</li>
<li><p><strong>Rotten bananas</strong></p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1711152793954/79f1ed45-88fb-4b2e-bc03-793699825263.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-alastratygya-ao-mkarba-alaaml"><strong>الاستراتيجية او مقاربة العمل</strong></h3>
<p>هي الطريقة او الوسيلة اللي غادي نعتامدو عليها باش نصاوبو الموديل ديالنا او نعلموه القى لينا الحل</p>
<p>ولكل طريقة غادي اخصنا نشوفو إلى اي حد صالحة, شنو ناخدو بعين الاعتبار أو الحدود ديالها</p>
<p><strong><em>من بين هاد الطرق</em></strong></p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">🚨</div>
<div data-node-type="callout-text">(غادي نذكرو <em>خفاً زرباً</em> بعض الطرق و ما غاندخلوش ف التفاصيل (خاصنا بوصط على كل وحدة ههههه</div>
</div>

<ul>
<li><p><strong>Deep learning / CNN</strong></p>
<ul>
<li><p>We do not always have a GPU available</p>
</li>
<li><p>How long will the model keep us waiting before it gives us a result?</p>
</li>
<li><p>With little, non-diverse data it tends to just memorize instead of generalize</p>
</li>
</ul>
</li>
<li><p><strong>Classic ML: SVM, ...</strong></p>
<ul>
<li>Limited in its capacity to learn and to generalize to data it has not seen</li>
</ul>
</li>
<li><p><strong>Transfer learning</strong></p>
<ul>
<li>The pretrained model can bring its own noise, which can affect the result we are after and the performance of our model</li>
</ul>
</li>
<li><p><strong>Self-supervised learning</strong></p>
<ul>
<li><p>Here again we would need to train the model on a huuuge amount of data</p>
</li>
<li><p>We would need time and compute resources</p>
</li>
</ul>
</li>
<li><p><strong>Handmade methods (DIY tricks)</strong></p>
<ul>
<li><p>They can do the job for simple problems</p>
</li>
<li><p>But as soon as the problem gets more complex or the data grows, they stop working</p>
</li>
</ul>
</li>
</ul>
<p>To get a feel for which strategy will make our life easier, we can prototype a few approaches in a simplified, scaled-down way and compare their preliminary results.</p>
<p><strong>Experiment:</strong> <a target="_blank" href="https://teachablemachine.withgoogle.com/models/QP6Q7SmTl/">a model that classifies bananas</a></p>
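<p>As a quick, hedged illustration of the transfer-learning option (the choice of MobileNetV2, the input size, and the dataset loaded earlier are assumptions for this sketch, not what Teachable Machine does internally), a first baseline can look like this:</p>
<pre><code class="lang-python">import tensorflow as tf

# Frozen ImageNet backbone + a small 3-class head: a cheap first baseline
# to compare against classic ML or a handmade rule.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # transfer learning: keep the pretrained features frozen

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # MobileNetV2 expects inputs in [-1, 1]
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(3, activation="softmax"),  # ripe / green / rotten
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# model.fit(train_ds, epochs=5)  # train_ds as loaded in the Output section sketch
</code></pre>
<p>If this cheap baseline already separates the three classes well, that is a strong hint we do not need anything heavier; if it struggles, that tells us something about the data before we invest more.</p>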
<h1 id="heading-alntag">النتائج</h1>
<p>النتائج هي اللي كاتبين الخدمة اللي دارت<br />الى ماعرفناش انقدموها كما ينبغي فغنضيعو علينا كولشي</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1711155010761/fd874bd9-cbdb-4a3a-96b2-4c7139e43800.jpeg" alt class="image--center mx-auto" /></p>
<p><strong>So they deserve their due: we have to present them properly and make sure they are valid.</strong></p>
<p><strong><em>How?</em></strong></p>
<ul>
<li><p><strong>Use the appropriate metrics (accuracy, loss, ...)</strong><br />  to show the model's performance and effectiveness (a small sketch follows this list)</p>
</li>
<li><p><strong>Prepare a presentation aimed at clients (non-technical)</strong><br />  to explain and show the added value of the work and of the solutions we found</p>
</li>
<li><p><strong>Prepare the code documentation and the technical reports (deployment)</strong></p>
</li>
<li><p><strong>The research report (research paper)</strong></p>
<p>  ... the paper where we present our work in an academic form, or publish it openly in a journal or at a conference</p>
</li>
</ul>
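<p>A small sketch of that metrics step (assuming a held-out test set and a classifier like the one above; the tiny arrays here are made up purely to show the reporting calls, and scikit-learn is just one convenient choice):</p>
<pre><code class="lang-python">import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# y_true: true class indices of the test images, y_pred: the model's predictions.
# These arrays are placeholders; in practice they come from the evaluation loop.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred, target_names=["ripe", "green", "rotten"]))
</code></pre>
<p>The confusion matrix in particular is client-friendly: it shows at a glance which classes the model mixes up (for example green bananas predicted as ripe), which is exactly what matters on the farm.</p>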
<h1 id="heading-altkryr">التقرير</h1>
<p>وصلنا لآخر مرحلة اللي هي التقرير, اللي من خلالو كانقدمو الخدمة اللي درنا و النتائج اللي توصلنا ليها او كي درنا ليها<br />نقدو نلخصو المحتوى د التقرير ف النقط التالية <strong><em>بعجالة</em></strong></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1711155350377/b98a9405-9607-48b5-b8c6-5d06f813fc4d.jpeg" alt class="image--center mx-auto" /></p>
<ul>
<li><p><strong>Introduction</strong></p>
<ul>
<li><p>Set the scene for the problem situation</p>
</li>
<li><p>A brief summary of the working strategy, the results, and the added value of the work</p>
</li>
</ul>
</li>
<li><p><strong>State of the art: what has been done before</strong></p>
<ul>
<li>What others have done before to solve the same problem or similar ones</li>
</ul>
</li>
<li><p><strong>Methods</strong></p>
<ul>
<li><p><strong>Data</strong>: the data we used</p>
</li>
<li><p>The model we used: how we trained it, and its details</p>
</li>
<li><p>Experimental details: the hardware and the training setup</p>
</li>
</ul>
</li>
<li><p><strong>Results &amp; discussion</strong></p>
<ul>
<li><p>Present the results we obtained: tables, plots, metrics</p>
</li>
<li><p>Discuss them by comparing results and explaining the difficulties the model ran into</p>
</li>
</ul>
</li>
<li><p><strong>Conclusion</strong></p>
<ul>
<li><p>What we did and where we got to</p>
</li>
<li><p>Our ambitions: what we want to do next</p>
</li>
</ul>
</li>
</ul>
<h1 id="heading-nkhtmo-afla-oaoal">نختمو: افلا واوال</h1>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1711066999405/3807a9ae-296a-47f9-b038-8fc98102dcc6.png" alt class="image--center mx-auto" /></p>
<blockquote>
<p>"<strong>يات س يات اورد يات ف يات</strong>"</p>
</blockquote>
<p>كلما حددنا المشكل اللي عندنا و عرفنا كاع التفاصيل ديالو كلما سهالت علينا القضية باش نعرفو الخطوات اللي نتبعو باش نحلوه</p>
<p><strong><em>نتلاقاو ف بوست جديد على موضوع جديد</em></strong></p>
<hr />
<p><a target="_blank" href="https://teachablemachine.withgoogle.com/models/QP6Q7SmTl/"><strong><em>مصادر الصور</em></strong></a></p>
<ul>
<li><p><a target="_blank" href="https://towardsdatascience.com/workflow-of-a-machine-learning-project-ec1dba419b94">https://towardsdatascience.com/workflow-of-a-machine-learning-project-ec1dba419b94</a></p>
</li>
<li><p><a target="_blank" href="https://en.wikipedia.org/wiki/Walid_Cheddira">https://en.wikipedia.org/wiki/Walid_Cheddira</a></p>
</li>
<li><p><a target="_blank" href="https://en.wikipedia.org/wiki/File:Bananal_REFON.jpg">https://en.wikipedia.org/wiki/File:Bananal_REFON.jpg</a></p>
</li>
<li><p><a target="_blank" href="https://www.pinterest.com/">https://www.pinterest.com/</a></p>
</li>
</ul>
]]></content:encoded></item></channel></rss>