AWS launches SageMaker Data Wrangler, a new data preparation service for machine learning

AWS launched a new service today, Amazon SageMaker Data Wrangler, that makes it easier for data scientists to prepare their data for machine learning training. The company is also launching SageMaker Feature Store, a new service available in SageMaker Studio that makes it easier to name, organize, find and share machine learning features.

AWS is also launching SageMaker Pipelines, a new service that is integrated with the rest of the platform. It provides a CI/CD service for machine learning that lets users create and automate workflows, along with an audit trail for model components such as training data and configurations.

As AWS CEO Andy Jassy pointed out in his keynote at the company's re:Invent conference, data preparation remains a major challenge in the machine learning space. Users first have to write the queries and code to pull data from their data stores, then write more queries to transform that data and combine features as necessary. None of that work goes into building the models themselves; it goes into the infrastructure around building them.

Data Wrangler comes with more than 300 built-in, preconfigured data transformations that help users convert column types or impute missing data with mean or median values. There are also built-in visualization tools to help identify potential errors, as well as tools for checking the data for inconsistencies and diagnosing them before the models are deployed.
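
To make the two example transformations concrete, here is a minimal plain-Python sketch of converting a string column to numbers and then imputing missing entries with the mean or median. It is only an illustration of the general technique, not Data Wrangler's API; the helper names `to_float` and `impute` are hypothetical.

```python
from statistics import mean, median
from typing import List, Optional


def to_float(column: List[str]) -> List[Optional[float]]:
    """Convert a string column to floats, treating blank strings as missing (None)."""
    return [float(s) if s.strip() else None for s in column]


def impute(values: List[Optional[float]], strategy: str = "median") -> List[float]:
    """Replace missing (None) entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    if strategy not in ("mean", "median"):
        raise ValueError(f"unknown strategy: {strategy}")
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]


raw = ["3.0", "", "5.0", "10.0", ""]
col = to_float(raw)
print(impute(col, "median"))  # [3.0, 5.0, 5.0, 10.0, 5.0]
print(impute(col, "mean"))    # [3.0, 6.0, 5.0, 10.0, 6.0]
```

The median is often the safer default when a column has outliers, since a single extreme value can pull the mean far from typical entries.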

All of these workflows can then be saved in a notebook or as a script so that teams can replicate them, and they can be used in SageMaker Pipelines to automate the rest of the workflow, too.

It's worth noting that quite a few startups are working on the same problem. Wrangling machine learning data, after all, is one of the most common problems in the space. For the most part, though, companies still build their own tools, and, as usual, that makes this area ripe for a managed service.