OneLake
Data security controls in OneLake
Unify and secure your data — no matter where it lives — without sacrificing control using OneLake security, part of Microsoft Fabric. With granular permissions down to the row, column, and table level, you can confidently manage access across engines like Power BI, Spark, and T-SQL, all from one place. Discover, label, and govern your data with clarity using the integrated OneLake catalog that surfaces the right items fast. Aaron Merrill, Microsoft Fabric Principal Program Manager, shows how you can stay in control, from security to discoverability — owning, sharing, and protecting data on your terms.

Protect sensitive information at scale. Set precise data access rules — down to individual rows. Check out OneLake security in Microsoft Fabric.

No data duplication needed. Hide sensitive columns while still allowing access to relevant data. See it here with OneLake security.

Built-in compliance insights. Streamline discovery, governance, and sharing. Get started with the OneLake catalog.

QUICK LINKS:
00:00 — OneLake & Microsoft Fabric core concepts
01:28 — Table level security
02:11 — Column level security
03:06 — Power BI report
03:28 — Row level security
04:23 — Data classification options
05:19 — OneLake catalog
06:22 — View and manage data
06:48 — Governance
07:36 — Microsoft Fabric integration
07:59 — Wrap up

Link References:
Check out our blog at https://aka.ms/OneLakeSecurity
Sign up for a 60-day free trial at https://fabric.microsoft.com

Unfamiliar with Microsoft Mechanics? As Microsoft’s official video series for IT, you can watch and share valuable content and demos of current and upcoming tech from the people who build it at Microsoft.
Subscribe to our YouTube: https://www.youtube.com/c/MicrosoftMechanicsSeries
Talk with other IT Pros, join us on the Microsoft Tech Community: https://techcommunity.microsoft.com/t5/microsoft-mechanics-blog/bg-p/MicrosoftMechanicsBlog
Watch or listen from anywhere, subscribe to our podcast: https://microsoftmechanics.libsyn.com/podcast

Keep getting this insider knowledge, join us on social:
Follow us on Twitter: https://twitter.com/MSFTMechanics
Share knowledge on LinkedIn: https://www.linkedin.com/company/microsoft-mechanics/
Enjoy us on Instagram: https://www.instagram.com/msftmechanics/
Loosen up with us on TikTok: https://www.tiktok.com/@msftmechanics

Video Transcript:

-As you build AI and analytic workloads, unifying your data from wherever it lives and making it accessible doesn’t have to come at the cost of security. In fact, today we dive deeper into Microsoft’s approach to data unification, accessibility, and security with OneLake, part of Microsoft Fabric, where we’ll focus on OneLake’s security control set and how it complements data discovery via the new OneLake catalog.

-Now, in case you’re new to OneLake and Microsoft Fabric, I’ll start by explaining a few core concepts. OneLake is the logical multi-cloud data lake that is foundational to Microsoft Fabric, Microsoft’s fully managed data analytics and AI platform. OneLake, with its support for open data formats, provides a single and unified place across your entire company for data to be discovered, accessed, and controlled across your data estate. Data can reside anywhere, and you can connect to it using shortcuts or via mirroring.
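As a quick aside on how that unified access works in practice, here is a minimal sketch of reading a OneLake table from a Fabric Spark notebook. The workspace (Sales), lakehouse (Contoso), and table (suppliers) names are hypothetical, and the exact path layout should be verified against the OneLake documentation; the point is simply that one logical path can be used by any engine.

```python
# Minimal sketch: reading a OneLake table from a Fabric Spark notebook.
# Workspace ("Sales"), lakehouse ("Contoso"), and table ("suppliers") are
# hypothetical names used only for illustration.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# One logical, ABFS-style OneLake path, usable regardless of which engine
# or tool ultimately reads the data.
path = (
    "abfss://[email protected]/"
    "Contoso.Lakehouse/Tables/suppliers"
)

# Tables in OneLake are stored in the open Delta Lake format.
df = spark.read.format("delta").load(path)
df.show(5)
```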
And once in OneLake, you have a single place where data can be centrally classified and labeled as the basis for policy controls. You can then configure granular, role-based permissions that can apply down to the folder level for unstructured data and by table for structured data.

-Then all the way down to the column and row levels within each table. This way, security is enforced across all connected data, meaning that whether you’re accessing the data through Spark, Power BI, T-SQL, or any other engine, it’s protected and you have the controls to allow or limit access to data on your terms. In fact, let me show you a few examples of enforcing OneLake security at all of these levels. I’ll start with an example showing OneLake security at the table level. I want to grant our suppliers team access to a specific table in this lakehouse. I’ll create a OneLake security role to do that. So I’ll just give it a name, SuppliersReaders. Then I’ll choose selected data and find the table that I want to share by expanding the table list, pick suppliers, and then confirm.

-Now, I just need to assign the right users. I’ll just add Mona in this case, and create the role. Then if I move over to Mona’s experience, I can run queries against the supplier data in the SQL endpoint. But if I try to query any other table, I’m blocked, as you can see here. Now, let me show you another option. This time, I’ll lock access down to the column level. I want to grant our customer relations team access to the data they need, but I don’t want to give them access to PII data. Using OneLake security controls, I can create a role that restricts access to sensitive columns. Like before, I’ll name it. Then I need to select my data. This time, I’ll choose three different tables for customer and order data. But notice this grayed-out legacy orders table here that we would like to apply column security to as well. I don’t own the permissions for this table because it’s a shortcut to other data. However, the owner of that data can grant permission to it using the steps I’ll show next. From the role I just created, I’ll expand my tables. And for the customers table, I’ll enable column security. Once I confirm, I can select the columns I want to remove, the ones we don’t want them to see, and save it.

-Now, let’s look at the results of this from another engine, Power BI, while building a report. I’ll choose a semantic model for my Power BI report. With the column level security in place, notice the sensitive columns I removed before, contact name and address, are hidden from me. And when I expand the legacy orders table, which was a shortcut, it’s also not showing PII columns. Now, some scenarios require that security controls are applied to records interspersed within the same table, so a row level filter is needed. For example, our US-based HR team should only see data for US-based employees. I’ve created another security role with the right data selected, HRUS.

-Now, I’ll move to my tables, choose from the options for this employees table, and select row security. Row level security in OneLake uses SQL statements to limit what people can see. I’ll do that here with a simple select statement to limit country to USA. Now, from the HR team’s perspective, they can start to query the data using another engine, Spark, to analyze employee retention. But only across US-based employees, as you can see from the country column.
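To make that last step concrete, here is a minimal sketch of what the HR analyst’s Spark query might look like once the HRUS row security rule is in place. The table name and attached lakehouse are hypothetical, and the filtering is performed by OneLake security itself rather than by anything written in the query.

```python
# Minimal sketch, assuming a hypothetical "employees" table in the default
# lakehouse attached to a Fabric notebook. The HRUS role's row-security rule
# (Country = 'USA') is enforced by OneLake, so no filter is written here.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read the table as the HR user; only rows the role allows are returned.
employees = spark.read.table("employees")

# Counting by country shows only USA rows surviving the row-level rule.
employees.groupBy("Country").count().show()
```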
And as mentioned, this applies to all engines, no matter how you access it, including the Parquet files directly in OneLake. Next, let’s move on to data classification options that can be used to inform policy controls. Here, the good news is that the same labels you’ve defined in Microsoft Purview for your organization, used in Microsoft 365 for emails, messaging, files, sites, and meetings, can be applied to data items in OneLake.

-Additionally, Microsoft Purview policy controls can be used to automatically label content in OneLake. And another benefit I can show you from the lineage view is label inheritance. Notice this Lakehouse is labeled Non-Business, as is NorthwindTest, but look at the connected data items on the right of NorthwindTest. They are also non-business. If I move into the test lakehouse and apply a label either automatically or manually to my data, like I’m doing here, and then move back to the lineage view, my downstream data items, like this model and the SQL analytics endpoint below it, have automatically inherited the upstream label.

-So now that we’ve explored OneLake security controls, their implementation, and enforcement, let’s look at how this works hand in hand with the OneLake catalog for data discovery and management. First, to know that you’re in the right place, you can use branded domains to organize collections of data. I’ll choose the sales domain. To get the data I want, I can see my items as the ones I own, endorsed items, and my favorites. I can filter by workspace. And on top, I can select the type of data item that I’m looking for. Then if I move over to tags, I can find ones associated with cost centers, dates, or other collection types.

-Now, let’s take a look at a data item. This shows me more detail, like the owner and location. I can also see table schemas and more below. I can preview data within the tables directly from here. Then the lineage tab shows me a list of connected and related items. Lastly, the monitor tab lets me track data refresh history. Now, let me show you how, as a data owner, you can view and manage these data items. From the settings of this lakehouse, I can change its properties and metadata, such as the endorsement, or update the sensitivity label. And as the data owner, I can also share it securely internally or even externally with approved recipients. I’ll choose a colleague, [email protected], and share it.

-Next, the govern tab in the OneLake catalog gives you even more control as a data owner, as well as recommendations to make data more secure and compliant. You’ll find it on the OneLake catalog main page. This gives me key insights at a glance, like the number and type of items I own. And when I click into view more, I see additional information like my data hierarchy. Below that, item inventory and data refresh status. Sensitivity label coverage gives me an idea of how compliant my data items are. And I can assess data completeness based on whether an item is properly tagged, described, and endorsed across the items I own. Back on the main view, I can see governance actions tailored specifically to my data, like increasing sensitivity label coverage, and more.

-The OneLake catalog is integrated across Microsoft Fabric experiences to help people quickly discover the items they need. And it’s also integrated with your favorite Office apps, including Microsoft Excel, where you can use the get data control to select and access data in OneLake.
And right in context, without leaving the app, you can define what you want and pull it directly into your Excel file for analysis. The OneLake catalog is the one place where you can discover the data that you want and manage the data that you own. And combined with OneLake security controls, you can do all of this without increasing your data security risks.

-To find out more and get started, check out our blog at aka.ms/OneLakeSecurity. Also, be sure to sign up for a 60-day free trial at fabric.microsoft.com. And keep watching Mechanics for the latest updates across Microsoft, subscribe to our channel, and thanks for watching.

MGDC for SharePoint FAQ: How to flatten datasets for SQL or Fabric
When you get your data from Microsoft Graph Data Connect (MGDC), you will typically get that data as a collection of JSON objects in an Azure Data Lake Storage (ADLS) Gen2 storage account. For those handling large datasets, it might be useful to move the data to a SQL Server or to OneLake (lakehouse). In those cases, you might need to flatten the datasets. This post describes how to do that. If you’re not familiar with MGDC for SharePoint, start with https://aka.ms/SharePointData.

1. Flattening

Most of the MGDC for SharePoint datasets come with nested objects. That means that a certain object has other objects inside it. For instance, if you have a SharePoint Groups object, it might have multiple Group Members inside. If you have a SharePoint Permissions object, you could have many Permissions Recipients (also known as Sharees). For each SharePoint File object, you will have a single Author object inside. When you convert the datasets from JSON to other formats, those other formats might require (or perform better when) you don’t have objects nested inside other objects. To overcome that, you can turn those child objects into properties of the parent object. For instance, instead of having the File object with an Author object inside, you can have multiple author-related columns, such as Author.Name and Author.Email, as properties of the flattened File object.

2. Nested Objects

You can get the full list of SharePoint datasets in MGDC at https://aka.ms/SharePointDatasets. Here is a table with a list of objects and their nested objects:

Object | How many? | Primary Key | Nested Object | How many? | Add to Primary Key
Sites | 1 per Site | Id | RootWeb | 1 per Site | (none)
Sites | 1 per Site | Id | StorageMetrics | 1 per Site | (none)
Sites | 1 per Site | Id | SensitivityLabelInfo | 1 per Site | (none)
Sites | 1 per Site | Id | Owner | 1 per Site | (none)
Sites | 1 per Site | Id | SecondaryContact | 1 per Site | (none)
Groups | 1 per Group | SiteId + GroupId | Owner | 1 per Group | (none)
Groups | 1 per Group | SiteId + GroupId | Members | 1 per Member | COALESCE(AADObjectId, Email, Name)
Permissions | 1 per Permission | SiteId + ScopeId + RoleDefinition + LinkId | SharedWithCount | 1 per Recipient Type | Type
Permissions | 1 per Permission | SiteId + ScopeId + RoleDefinition + LinkId | SharedWith | 1 per Recipient or Sharee | COALESCE(AADObjectId, Email, Name)
Files | 1 per File | SiteId + WebId + ListId + ItemId | Author | 1 per File | (none)
Files | 1 per File | SiteId + WebId + ListId + ItemId | ModifiedBy | 1 per File | (none)

When you flatten a dataset and there is an object with multiple objects inside (like Group Members or Permission Recipients), the number of rows will increase. You also need to add to the primary key to keep it unique. Also note that the File Actions, Sync Health and Sync Errors datasets do not have any nested objects.

3. One Object per Parent

When the nested object has only one instance, things are simple. As we described for the Author nested object inside the File object, you promote the properties of the nested object to be properties of the parent object. This is because the Author is defined as the user that initially created the file. There is always one and only one Author. This can even happen multiple times for the same object. The File also has a ModifiedBy property. That is the single user that last changed the file. In that case, there is also only one ModifiedBy per File. The Site object also includes several properties in this style, like RootWeb, StorageMetrics, SensitivityLabelInfo, Owner and SecondaryContact.
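To illustrate this single-instance case, here is a minimal sketch using pandas. The record below uses made-up field values rather than the exact MGDC schema, but it shows how each nested object becomes a set of dotted columns on the parent row.

```python
# Minimal sketch: flattening single-instance nested objects with pandas.
# The field values below are illustrative, not the exact MGDC schema.
import pandas as pd

site_records = [
    {
        "Id": "site-001",
        "Name": "Contoso Sales",
        "Owner": {"Name": "Mona Kane", "Email": "[email protected]"},
        "StorageMetrics": {"TotalSize": 52428800, "TotalFileCount": 312},
    }
]

# json_normalize promotes nested objects to parent-level columns such as
# "Owner.Name", "Owner.Email", "StorageMetrics.TotalSize", and so on.
flat_sites = pd.json_normalize(site_records)
print(flat_sites.columns.tolist())
```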
Note that, in the context of the Site object, there is only one owner. Actually two, but the second one is tracked in a separate object called SecondaryContact, which is effectively the secondary owner.

4. Multiple Objects per Parent

The SharePoint Permissions dataset has a special condition that might create trouble for flattening. There are two sets of nested objects with multiple objects each: SharedWith and SharedWithCount. SharedWith has the list of Recipients and SharedWithCount has a list of Recipient Types. If you just let the tools flatten it, you will end up with a cross join of the two. As an example, if you have 10 recipients in an object and 2 types of recipients (internal users and external users, for instance), you will end up with 20 objects in the flattened dataset instead of the expected 10 objects (one per recipient). To avoid this, in this specific condition, I would recommend just excluding the SharedWithCount column from the object before flattening.

5. Conclusion

I hope this clarifies how you can flatten the MGDC for SharePoint datasets, particularly the SharePoint Permissions dataset. For further details about MGDC for SharePoint, see https://aka.ms/SharePointData.

Advanced Time Series Anomaly Detector in Fabric
Anomaly Detector, one of the Azure AI services, enables you to monitor and detect anomalies in your time series data. This service is being retired by October 2026. As part of the migration process, the anomaly detection algorithms were open sourced and published as a new Python package, and a time series anomaly detection workflow is now offered in the Microsoft Fabric data platform.

Understanding OneLake Architecture: The OneDrive for Data
Learn how OneLake simplifies data engineering. Data engineers face many difficulties every day. Data sources are diverse and fragmented, containing different file types and data quality levels. Finding specific files and determining their owners and access permissions can be frustrating. OneLake helps you overcome these challenges.