Building Machine Learning Models in Microsoft Fabric
M365 Show brings you expert insights, news, and strategies across Power Platform, Azure, Security, Data, and Collaboration in the Microsoft ecosystem.
Ever wondered why your machine learning models just aren’t moving
fast enough from laptop to production? You’re staring at data in
your Lakehouse, notebooks are everywhere, and you’re still
coordinating models on spreadsheets. What if there’s a way to
finally connect all those dots—inside Microsoft Fabric? Stay with
me. Today, we’ll walk through how Fabric’s data science workspace
hands you a systems-level ML toolkit: real-time Lakehouse access,
streamlined Python notebooks, and full-blown model tracking—all
in one pane. Ready to see where the friction drops out?
From Data Swamp to Lakehouse: Taming Input Chaos
If you’ve ever tried to start a machine learning project and felt
like you were just chasing down files, you’re definitely not
alone. This is the part nobody romanticizes—hunting through cloud
shares, email chains, and legacy SQL databases, just to scrape
enough rows together for a pass at training. The reality is,
before you ever get to modeling, most of your time goes to what
can only be described as custodial work. And yet, it’s the single
biggest reason most teams never even make it out of the
gate.

Let’s not sugarcoat it: data is almost never where you want
it. You end up stuck in the most tedious scavenger hunt you never
signed up for, just to load up raw features. Exporting CSVs from
one tool, connecting to different APIs in another, and then
piecing everything together in Power BI—or, if you’re lucky,
getting half the spreadsheet over email after some colleague
remembered to hit “send.” By the time you think you’re ready for
modeling, you’ve got the version nobody trusts and half a dozen
lingering questions about what, exactly, that column
“updated_date” really means.

It’s supposed to be smoother with
modern cloud platforms, right? But even after your data’s “in the
cloud,” it somehow ends up scattered. Files sit in Data Lake
Gen2, queries in Synapse, reports in Excel Online, and you’re
toggling permissions on each, trying to keep track of where the
truth lives. Every step creates a risk that something leaks out,
gets accidentally overwritten, or just goes stale. Anyone who’s
lost a few days to tracking down which environment is the real
one knows—there’s a point where the tools themselves get in the
way just as much as the bureaucracy.

That’s not even the worst of
it. The real showstopper is when it’s time to build actual
features, and you realize you don’t have access to the columns
you need. Maybe the support requests data is owned by a different
team, or finance isn’t comfortable sharing transaction details
except as redacted monthly summaries. So now, you’re juggling
permissions and audit logs—one more layer of friction before you
can even test an idea. It’s a problem that compounds fast. Each
workaround, each exported copy, becomes a liability. That’s
usually when someone jokes about “building a house with bricks
delivered to five different cities,” and at this point, it barely
sounds like a joke.

Microsoft Fabric’s Lakehouse shakes that
expectation up. Ignore the buzzword bingo for a minute—Lakehouse
is less about catching up with trends and more about
infrastructure actually working for you. Instead of twelve
different data puddles, you’ve got one spot. Raw data lives
alongside cleaned, curated tables, with structure and governance
built in as part of the setup. For once, you don’t need a data
engineer just to find your starting point. Even business
analysts—not just your dev team with the right keys—are able to
preview, analyze, and combine live data, all through the same
central workspace.

Picture this: a business analyst wants to
compare live sales, recent support interactions, and inventory.
They go into the Lakehouse workspace in Fabric, pull the current
transactions over, blend in recent tickets, all while skipping
the usual back-and-forth with IT. There are no frantic requests
to unblock a folder or approve that last-minute API call. The
analyst gets the view they need, on demand, and nothing has to be
passed around through shadow copies or side channels.
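
From a notebook in that same workspace, the blend the analyst just did takes only a few lines. Here is a minimal sketch, assuming the Lakehouse is attached as the notebook’s default lakehouse and that tables named sales_transactions, support_tickets, and inventory exist; the table names are illustrative, not from this episode.

```python
# Minimal sketch of that blend from a notebook in the same Fabric workspace.
# Assumes the Lakehouse is attached as the default lakehouse and that tables
# named sales_transactions, support_tickets, and inventory exist -- the names
# are illustrative. `spark` and `display` are pre-provisioned in Fabric
# notebooks, so no connection strings or extra imports are needed here.

sales = spark.read.table("sales_transactions")
tickets = spark.read.table("support_tickets")
inventory = spark.read.table("inventory")

# Join current transactions with recent tickets and stock levels -- no exports,
# no shadow copies, just the governed tables everyone else sees.
blended = (
    sales.join(tickets, on="customer_id", how="left")
         .join(inventory, on="product_id", how="left")
)

display(blended.limit(100))  # quick preview in the notebook UI
```
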
The security story is bigger than it looks, too. Instead of gluing together
role-based access from five different systems—or worse, trusting
everyone just to delete their old copies—permissions sit right at
the data workspace level. If you need only sales, you get just
sales. If someone on a different team needs to reference
inventory, they see what’s necessary, and nothing else. There’s
no need for late-night audits or accidental oversharing sent out
in another email blast. This kind of granular control has teeth,
and it means the system is finally working for you—not just IT
compliance officers.

Most ML tools promise “easy access,” but
Fabric’s Lakehouse sets it up for both sides: technical users
dive straight into raw data, analysts use the same space with
understandable, curated views. It eliminates most of those
arguments about “missing context,” and it’s the first time both
sides can operate in parallel without running into each other.
Suddenly, feeding a model isn’t a six-step puzzle—it’s just
picking from well-organized inputs.

Now, don’t get me
wrong—centralizing your data feels like arriving at the party,
but you’re still far from launching an ML model. Taming your
input chaos only lines up the dominoes. You know exactly who has
access, you’ve finally got a data system everyone can agree on,
and everything’s ready to pipe into your next experiment. But
here’s the sticking point: even with this head start, most teams
hit friction when it’s time to move from wrangled data to actual
feature engineering and model-building. Notebooks are supposed to
be the on-ramp, but more often, they become their own maze of
version conflicts, broken environments, and lost progress.

So with
your inputs sorted—both secure and actually usable—you’d expect
things to speed up. But what’s supposed to be the easy part,
collaborating in notebooks, still brings plenty of pain. Why do
so many projects stall when all you’re trying to do is turn clean
data into something the model can use?
Python Notebooks Without the Pain: Streamlining the ML Process
If you’ve spent any time actually trying to move a project
forward in Python notebooks, you already know how this goes. It
starts with a good intention: let’s reuse what already works,
just clone your teammate’s notebook and hit run. But then you
land in dependency purgatory—pandas throws a version error,
matplotlib won’t plot the way you expect, and half the code
relies on a package nobody mentioned in the docs. Even inside
cloud platforms that promise smooth collaboration, you’re jumping
between kernels, patching environments, and quietly dreading that
awkward chat asking who set up the original development space.
Instead of a sprint, it feels like you’re wading through
molasses. The joke in most data science teams is that there are
always more notebook versions than people in the workspace—and
nobody’s sure which notebook even worked last.

We’ve all seen that
folder called “final_notebook_v5_actual.ipynb” and its six
unofficial cousins. When everyone works in their own little
bubble, changes pile up fast. Maybe your colleague added a new
feature engineering trick, but saved it locally. Someone else
tweaked the pipeline for the customer churn dataset but didn’t
sync it back to the team folder. And you, working late, discover
that the only working notebook relies on libraries last updated
in 2021. Just setting things up burns through the first chunk of
your project budget. Version control gets messy, dependencies
drift, and suddenly the project’s output becomes as unpredictable
as the tools themselves.

Now, let’s be honest—this isn’t just a
confidence issue for junior data scientists. Even seasoned teams
trip over this problem. Maybe you run notebooks through
Databricks, or you’re using JupyterHub spun up on a managed VM,
but environments mutate and tracking which projects have which
libraries is a problem nobody admits to enjoying. Meetings start
with five minutes of “wait, are you using the new version or the
one I sent on Teams?” It’s a shared kitchen where every chef
brings their own knives, and you’re left with a half-baked stew
because half the utensils are missing or nobody remembered whose
batch was actually edible.

This is one of the places where
Microsoft Fabric flips the usual story. Each new notebook session
comes pre-configured with the baseline set most people need:
scikit-learn, pandas, PyTorch, and the rest. You don’t have to
run endless install scripts, fudge dependency versions, or file
tickets just to get the essentials. The environment sits on a
foundation that’s stable, predictable, and updated by the
platform. It means more time fiddling with your model and less
time scanning Stack Overflow for fixes to some cryptic pip
exception.

But availability isn’t just about libraries. It’s about
getting right to the data, right now, not after a twelve-step API
dance. Fabric’s notebooks tie directly into the Lakehouse—no
jumping through hoops, no awkward connection strings. You click
into a workspace, the Lakehouse tables are ready for you, and you
can immediately experiment, sample data, engineer features, and
build holdout sets without copying files around the org. You’re
not hunting for which cluster has access, or figuring out what
secrets.json someone left behind. The workflow moves the way you
expect: explore, code, iterate.

Let’s say you’re a data scientist
actually kicking off a new experiment. You spin up a new notebook
inside your team’s Fabric workspace. You need last quarter’s
sales, customer feedback scores, and inventory turns. All of that
is sitting right there in the Lakehouse, live and ready. You pull
in your sales data, engineer some features—maybe encode product
categories and join the support tickets—then train a quick
baseline model. It’s familiar, but minus the usual overhead:
nobody comes knocking about missing files, there’s no scramble to
reconfigure your environment, and you’re not storing scripts in
five different clouds just for peace of mind.
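
To make that concrete, here is a rough sketch of that kind of baseline run, assuming a Fabric notebook with a default Lakehouse attached; the table and column names (sales_transactions, product_category, churned, and so on) are illustrative stand-ins.

```python
# A rough sketch of the baseline workflow described above. `spark` is
# pre-provisioned in Fabric notebooks; table and column names are illustrative.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# Pull last quarter's sales into pandas for feature work.
df = spark.read.table("sales_transactions").toPandas()

# Simple feature engineering: encode product categories, keep a few numeric columns.
features = pd.get_dummies(
    df[["product_category", "units_sold", "discount_pct"]],
    columns=["product_category"],
)
target = df["churned"]

X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=42
)

# Quick baseline model -- nothing fancy, just a starting point to beat.
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

print("Baseline AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```

Nothing here is Fabric-specific except that the data comes straight from the Lakehouse rather than an exported CSV.
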
The bigger pay-off comes when you want to share progress with your team—or, more
critically, not lose progress in the shuffle. Fabric’s notebook
environments are mapped to version-controlled workspaces. Every
time you make a change and hit save, the history stays attached
to your session. You don’t end up with that email chain labeled
“for review,” followed by three different “latest” versions
shared through Teams. Your experiments, failed runs, tweaks to
hyperparameters, and new feature sets all live in the same trail,
and they’re accessible to your colleagues. There’s no walled
garden for who did what; collaboration is built in without
tacking on yet another git repo or hoping folks actually use the
right naming convention.

All this means there’s far less friction
between idea and execution. Instead of spending days in setup
hell or losing work to version confusion, teams using Fabric move
fluidly from the raw data in the Lakehouse to a working model.
The usual blockers—missing libraries, mismatched environments,
progress lost in translation—just don’t show up as often. Sure,
real modeling still takes thought and tuning, but at least the
tools aren’t stacking the odds against you.

Of course, as soon as
you start getting solid results in these notebooks, a new problem
appears. Keeping track of what actually works, which model gives
you the right lift, and who changed what settings becomes its own
maze. When the team gets to production, it’s all too easy for
chaos to creep back in. The good news? With integrated model
tracking and governance up next, Fabric finally gives
organizations a way to lock down the output, not just the inputs
and code. And this is where the wheels usually come off in other
platforms.
Model Tracking and Governance: Keeping the Output Under Control
If you’ve spent much time actually getting models into
production, you already know that training is only half the
story. There’s a certain point where every data team runs into
something far less glamorous—tracking what got trained, with
which data, on which day, and figuring out which model is running
where. For most teams, this part isn’t just annoying, it’s the
step that gets skipped until it turns into a fire drill. The
reality? Model management in the wild looks like a tangled mess
of folders—“final_v2,” “donotdelete,” the infamous
“backup_old”—and a spreadsheet that’s never quite in sync.
There’s usually someone on the team who tries to keep a wiki or
OneNote page with dates and file links, but that plan falls apart
as soon as one person heads out on vacation or switches
projects.

When something breaks, all bets are off. Production
predictions start looking strange, an executive asks why customer
churn jumped for one region, and the investigation begins. If you
can’t easily answer questions like, “What code generated this
result?” or “Which training data did you use last quarter?”
you’re headed for some long hours. It’s like someone’s dumped a
jigsaw puzzle onto your desk but hidden half the pieces in other
departments’ Team sites. In regulated industries, the stakes jump
even higher. You’ll be asked for explainability, lineage, and
concrete proof that every model’s been reviewed and approved by
the right people. Most data science tools leave you building that
audit trail by hand or exporting piles of logs that no one
actually checks until something’s already gone wrong.

Take, for
example, an ML audit. You’re sitting in a meeting with
compliance, and they want to know—step by step—how your model
predicted that a customer would likely cancel their subscription
next month. There’s no handy “undo” for this sort of thing. If
you’re relying on old screenshots, emails, or hope, it feels less
like data science and more like forensic accounting. Explaining
why one cohort got flagged and not another becomes a guessing
game. Cohorts and weights might have changed between versions 1.3 and 1.7, but without clear records, the question of “why” becomes a black box.

That’s the kind of pain point that Microsoft Fabric
addresses head-on. By folding in MLflow under the hood, Fabric
changes what’s considered “table stakes” for model management.
Every time you train a model, Fabric’s MLflow integration records
not just the code, but the dataset version, environment details,
hyperparameters, metrics, and even output artifacts. These aren’t
just blobs of metadata; they’re searchable, browsable, and
attached to real experiment runs. Gone is the guesswork—the
platform keeps exact records, so you don’t have to maintain a
parallel system of spreadsheets and file shares just to track
what’s changing.
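
As a hedged sketch of what that looks like in a notebook, the plain MLflow calls below are enough for a run to land in the workspace’s experiment history; the experiment and model names are hypothetical, and the train/test split is reused from the earlier baseline sketch.

```python
# Sketch of logging a training run with the MLflow integration in a Fabric
# notebook. Fabric points the MLflow tracking URI at the workspace for you,
# so these standard calls are enough. Names are illustrative; X_train, X_test,
# y_train, and y_test come from the earlier baseline sketch.
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

mlflow.set_experiment("customer-churn")  # hypothetical experiment name

with mlflow.start_run(run_name="baseline-rf"):
    model = RandomForestClassifier(n_estimators=200, max_depth=8, random_state=42)
    model.fit(X_train, y_train)

    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

    # Parameters, metrics, and the trained model all attach to this run,
    # searchable later from the experiment UI.
    mlflow.log_params({"n_estimators": 200, "max_depth": 8})
    mlflow.log_metric("auc", auc)
    mlflow.sklearn.log_model(
        model, artifact_path="model", registered_model_name="churn-model"
    )
```
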
Imagine you’re on the business side and need to answer a question about customer churn. Instead of putting in a
support ticket or pinging three analysts, you open the Fabric
interface, look up the churn model, and see a timeline—every
training run and deployment, with who did what, and even which
data set powered which experiment. It’s not just a snapshot in
time; you get the full journey: training data, code version,
manual notes, performance metrics, deployment approvals—all
bundled and tagged in one timeline.

This makes proposals and
approvals much less of a bottleneck. If the compliance lead or
head of a business unit needs to approve a model for release,
they’re not searching through emails or exporting audit logs.
Approval workflows are built in—who signed off, when it was
signed, and what was reviewed are baked into the record, along
with role-based access. Only the right people can make changes or
green-light deployments, but everyone who needs traceability has
it out of the box. Even as models evolve, you retain a clear
lineage. If a new version goes live and the KPIs shift, you can
look back and see what changed, who changed it, and even download
the old code and data for reruns or explainability.

It’s almost a
surprise how much smoother everything runs when tracking and
governance are built into the same platform as the notebooks and
data. You want to try a new approach or roll back to a previous
model version? It’s two clicks, not a six-user chase across three
SharePoint sites. Regulatory events don’t send panic through the
team, because that audit trail was created automatically, not as
a painful afterthought. The business can have confidence that
every model in play is accounted for and can be explained.
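
To give a flavor of what “look back and rerun” means with the MLflow registry underneath, here is a minimal sketch; the model name, the version number, and the scoring table are hypothetical.

```python
# Minimal sketch of inspecting registry lineage and pulling back an earlier
# version for a rerun. The model name "churn-model", version "3", and the
# scoring_sample table are illustrative; `spark` is pre-provisioned.
import mlflow
from mlflow.tracking import MlflowClient

client = MlflowClient()

# List registered versions and the run each one came from.
for v in client.search_model_versions("name='churn-model'"):
    print(v.version, v.run_id, v.status)

# Load an earlier version to reproduce or explain its predictions.
previous = mlflow.pyfunc.load_model("models:/churn-model/3")
sample = spark.read.table("scoring_sample").toPandas()
replayed = previous.predict(sample)
```
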
But, now that traceability and governance are finally reliable,
there’s a pivotal question left—can you actually deploy your
model without blowing up the project’s timeline or budget?
Managing the model lifecycle is one thing; moving from lab
experiment to live, production value is where most systems
crumble. Whether rollout happens in two days or two months, it
always comes back to making deployment and scaling less of a
minefield. That’s where Fabric’s approach to model serving and
operations changes the game. You’re no longer left guessing how
much production will cost, or whether your model will even get
used outside of a demo.
Deploying and Scaling ML: From Prototype to Business Value
If you’ve made it as far as training a model in your Fabric
environment, you’re ahead of most folks. But the finish
line—actually getting that model to power a business process or
app—is where the real friction sets in. You’d think that with all
the automation in modern ML platforms, getting a model from
“works in my notebook” to “running in production” would feel
automatic. The truth is, this is still the phase that slows teams
down. Most organizations follow a handoff ritual that feels
archaic: you finish your code, and then you lob it over the fence
to another team. They review your scripts, refactor half of it
for their stack, maybe rewrite it entirely, and then spend weeks
arguing about deployment targets and resource sizing. During all
that, the “approved” model might keep changing upstream, so what
goes live isn’t even the version you meant to roll out. Not
exactly empowering when you’re being asked to account for
business impact on a quarterly review.

You’d be forgiven for
thinking the hard part should end with the model passing its
accuracy checks. But even after code hits production, there’s the
question of scale—and the unwelcome reality that costs can spiral
fast. It isn’t just the compute for running your model; it’s the
monitoring, logging, retraining cycles, and the sprawl of
endpoints barnacled onto your cloud bill. A lot of orgs drift
into a pattern where models go live, sit untouched, and quietly
decay. No one’s watching performance, feedback loops break, and
data drift creeps in. Suddenly you’re fielding questions about
why numbers look so off, only to discover the model running in
production is nowhere near the latest, or worse, it
costs more just to keep the thing warm than your entire training
budget last month. This isn’t just a finance headache—it’s a
credibility risk for the whole data team.

Let’s make this real.
Say you launch a new app feature, powered by a shiny fresh model.
You celebrate the release, everyone nods, and within a few weeks,
the finance lead is quietly frowning at the cloud bill. Usage
spikes, costs balloon, and now you have to answer for why the
“automation” is eating your margins. Meanwhile, someone in a
different team realizes no one’s monitoring live predictions
anyway, so performance is anyone’s guess. It’s like running a
fancy new coffee shop where you spend more on espresso machines
than you bring in from customers, and nobody’s quite sure who’s
checking the inventory.

This is the part of the ML lifecycle that
Microsoft Fabric tries to simplify—without forcing you into more
tradeoffs. Instead of treating deployment as a separate ops
problem, Fabric turns it into a self-service menu. Models get
deployed as managed endpoints or batch jobs right from the
workspace where you built them. That means you pick how you want
your predictions served: as a persistent web service, ready to
handle incoming requests, or as a scheduled offline job that runs
overnight and populates results wherever you need them. You
choose the cost profile; you set the scale. If you only need to
run forecasts once a week, you don’t pay for a server to sit idle
all month. If your use case gets busier, scaling is done via
settings, not a new infrastructure project.
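
For the scheduled, offline flavor, a nightly batch job could look roughly like the sketch below; the table names, model name, and version are placeholders, and Fabric’s own managed scoring options could stand in for the manual steps shown here.

```python
# Rough sketch of a batch scoring job: load a registered model, score new rows,
# and land predictions back in a Lakehouse table for reports to pick up.
# Table names, the model name, and the version are all illustrative; `spark`
# is pre-provisioned in the notebook session that runs the job.
import mlflow

model = mlflow.pyfunc.load_model("models:/churn-model/3")

# Rows that need scoring tonight.
scoring_df = spark.read.table("customers_to_score").toPandas()

scoring_df["churn_score"] = model.predict(scoring_df.drop(columns=["customer_id"]))

# Write results back so dashboards read fresh predictions in the morning.
spark.createDataFrame(scoring_df[["customer_id", "churn_score"]]) \
    .write.mode("overwrite").saveAsTable("churn_predictions")
```
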
Monitoring is bundled in by default. Rather than stringing together third-party plugins
or writing custom scripts to keep watch on your endpoints, Fabric
gives you hooks to view requests, check usage stats, and flag
issues right where you manage the model. If retraining needs to
happen based on new data or drift, you wire it up with
automation—no awkward manual interventions. Your ML pipeline
isn’t frozen at launch; it stays live, adaptable, and
refreshable.

A lot of teams find this especially useful for
business apps built on Power Platform. Let’s say your sales and
ops team wants live forecasts in their dashboard, with numbers
powered by your Python model. Fabric lets you expose your ML
predictions as web services, which Power Apps and Power Automate
can consume directly. Each request is tracked, and every
round-trip between app and model is logged, so you see exactly
what’s being used, when, and by whom. You’re not playing catch-up
with finance every month—cost transparency is part of the
everyday reporting. If something isn’t providing value, you catch
it early instead of getting blindsided after a quarter.
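
From the app side, that round trip is a plain HTTPS call. In this sketch the endpoint URL, token, and payload shape are all placeholders, since the real values come from the deployed model’s settings; inside Power Automate the same call would be configured as an HTTP action rather than Python.

```python
# Sketch of the app-side round trip once predictions are exposed as a web
# service. The endpoint URL, token, and payload shape are placeholders.
import requests

endpoint = "https://<your-model-endpoint>/score"   # placeholder
headers = {
    "Authorization": "Bearer <token>",             # placeholder
    "Content-Type": "application/json",
}
payload = {
    "rows": [
        {"product_category": "hardware", "units_sold": 12, "discount_pct": 0.1}
    ]
}

response = requests.post(endpoint, json=payload, headers=headers, timeout=30)
response.raise_for_status()
print(response.json())  # e.g. churn scores for the submitted rows
```
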
And for anyone who’s tried to make a model “stick” in production, the
automation here is the hinge point. Schedule retraining based on
actual usage or model performance, or even re-deploy on a rolling
window as new data becomes available. You set it up—Fabric takes
care of the mechanics, and the whole thing is documented
alongside your other governance controls. If the business wants a
snapshot of how things are running or needs to prove compliance,
you’ve got a one-click export, not a frantic night stapling logs
together. When you look at it from end to end, Fabric’s focus on
deployment and scaling means your machine learning lifecycle
actually completes the loop. Models aren’t just pilot projects or
analytics eye-candy—they drive business value at a cost you can
predict, backed by real transparency. You spend less time
patching together operational hacks and more time finding new
ways to put your ML models to work.
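
As a loose sketch of that retrain-on-a-trigger idea, a check like the one below could sit in a scheduled notebook; the accuracy floor, table names, and model version are hypothetical, and the scheduling itself comes from the platform rather than this code.

```python
# Loose sketch of a scheduled "retrain if performance slips" check. The
# threshold, table names, and model version are hypothetical; `spark` is
# pre-provisioned in the notebook session that the schedule runs.
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

ACCURACY_FLOOR = 0.80  # hypothetical performance floor

# Score the current production model on freshly labeled data.
current = mlflow.pyfunc.load_model("models:/churn-model/3")
fresh = spark.read.table("labeled_feedback").toPandas()
X, y = fresh.drop(columns=["churned"]), fresh["churned"]
accuracy = accuracy_score(y, current.predict(X))

if accuracy < ACCURACY_FLOOR:
    # Performance slipped: retrain on the latest data and log a candidate run
    # that goes through the same review and approval trail as any other model.
    with mlflow.start_run(run_name="scheduled-retrain"):
        candidate = RandomForestClassifier(n_estimators=200, random_state=42)
        candidate.fit(X, y)
        mlflow.log_metric("previous_accuracy", accuracy)
        mlflow.sklearn.log_model(
            candidate, artifact_path="model", registered_model_name="churn-model"
        )
```
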
Now, with data feeding your models, experiments tracked, and production under control,
there’s a bigger shift happening under the hood. Fabric’s
ecosystem means AI stops being “someone else’s problem down the
hall” and becomes a shared platform. Suddenly, your next project
can move faster, and your impact as a data scientist or business
user stretches a whole lot further. The next step is thinking
about how this system-level approach shapes not only one project,
but your overall approach to machine learning in the enterprise.
Conclusion
If you’ve ever felt like your ML workflow is more duct tape than
design, you’re seeing the problem Fabric tries to fix. The
platform doesn’t just polish up the model itself—it smooths out
every rough edge, starting with the way data lands in the Lakehouse,
through to notebooks, and finally deployment and monitoring. It
forces the parts to fit together, making those “pilot” projects
feel less like experiments that never launch. If this is where
your data science work has stalled, it’s worth reconsidering your
process. Want more walkthroughs and concrete results? Hit
subscribe and start taking your models further.
Get full access to M365 Show - Microsoft 365 Digital Workplace
Daily at m365.show/subscribe