Cost-effective AI image generation with PixArt-Σ inference on AWS Trainium and AWS Inferentia

by Maxim Makedonsky
May 15, 2025


PixArt-Sigma is a diffusion transformer model capable of generating images at 4K resolution. It improves significantly on previous-generation PixArt models such as PixArt-Alpha, and on other diffusion models, through dataset and architectural improvements. AWS Trainium and AWS Inferentia are purpose-built AI chips that accelerate machine learning (ML) workloads, which makes them well suited to cost-effective deployment of large generative models. Running inference on these chips can deliver strong performance and efficiency for diffusion transformer models like PixArt-Sigma.

This post is the first in a series in which we run multiple diffusion transformers on Trainium- and Inferentia-powered instances. Here, we show how to deploy PixArt-Sigma to those instances.

Solution overview

The following steps deploy the PixArt-Sigma model on AWS Trainium and run inference on it to generate high-quality images:

  • Step 1 – Prerequisites and setup
  • Step 2 – Download and compile the PixArt-Sigma model for AWS Trainium
  • Step 3 – Deploy the model on AWS Trainium to generate images

Step 1 – Prerequisites and setup

To get started, you will need to set up a development environment on a trn1, trn2, or inf2 host. Complete the following steps:

  1. Launch a trn1.32xlarge or trn2.48xlarge instance with a Neuron DLAMI. For instructions on how to get started, refer to Get Started with Neuron on Ubuntu 22 with Neuron Multi-Framework DLAMI.
  2. Launch a Jupyter Notebook server. For instructions to set up a Jupyter server, refer to the following user guide.
  3. Clone the aws-neuron-samples GitHub repository:
    git clone https://github.com/aws-neuron/aws-neuron-samples.git

  4. Navigate to the hf_pretrained_pixart_sigma_1k_latency_optimized.ipynb notebook:
    cd aws-neuron-samples/torch-neuronx/inference

The provided example script is designed to run on a Trn2 instance, but you can adapt it for Trn1 or Inf2 instances with minimal modifications. Specifically, within the notebook and in each of the component files under the neuron_pixart_sigma directory, you will find commented-out changes to accommodate Trn1 or Inf2 configurations.

Step 2 – Download and compile the PixArt-Sigma model for AWS Trainium

This section provides a step-by-step guide to compiling PixArt-Sigma for AWS Trainium.

Download the model

The cache_hf_model.py file in the above-mentioned GitHub repository contains a helper function that shows how to download the PixArt-Sigma model from Hugging Face. If you are using PixArt-Sigma in your own workload and opt not to use the script included in this post, you can use the huggingface-cli to download the model instead.
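
If you take the huggingface-cli route, the download step might look roughly like the sketch below. The repository ID and cache directory are the ones used by the pipeline code later in this post; the helper names build_download_command and download are made up for illustration, and the sketch assumes the huggingface_hub package (which provides huggingface-cli) is installed.

```python
import subprocess

# Repository ID and cache directory as used by the pipeline later in this post.
REPO_ID = "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS"
CACHE_DIR = "pixart_sigma_hf_cache_dir_1024"


def build_download_command(repo_id: str, cache_dir: str) -> list[str]:
    """Return the argument list for `huggingface-cli download` (hypothetical helper)."""
    return ["huggingface-cli", "download", repo_id, "--cache-dir", cache_dir]


def download(repo_id: str, cache_dir: str, dry_run: bool = True) -> list[str]:
    """Build the command and, unless dry_run is set, execute it."""
    cmd = build_download_command(repo_id, cache_dir)
    if not dry_run:
        # Needs network access and enough disk space for the model weights.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Dry run by default: only print the command that would be executed.
    print(" ".join(download(REPO_ID, CACHE_DIR)))
```

Keeping the cache directory identical to the one passed to from_pretrained matters because the pipeline is loaded with local_files_only=True and will not fall back to a network download.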

The Neuron PixArt-Sigma implementation contains a few scripts and classes. The files and scripts are broken down as follows:

├── compile_latency_optimized.sh # Full Model Compilation script for Latency Optimized
├── compile_throughput_optimized.sh # Full Model Compilation script for Throughput Optimized
├── hf_pretrained_pixart_sigma_1k_latency_optimized.ipynb # Notebook to run Latency Optimized Pixart-Sigma
├── hf_pretrained_pixart_sigma_1k_throughput_optimized.ipynb # Notebook to run Throughput Optimized Pixart-Sigma
├── neuron_pixart_sigma
│ ├── cache_hf_model.py # Model downloading Script
│ ├── compile_decoder.py # VAE Decoder Compilation Script and Wrapper Class
│ ├── compile_text_encoder.py # Text Encoder Compilation Script and Wrapper Class
│ ├── compile_transformer_latency_optimized.py # Latency Optimized Transformer Compilation Script and Wrapper Class
│ ├── compile_transformer_throughput_optimized.py # Throughput Optimized Transformer Compilation Script and Wrapper Class
│ ├── neuron_commons.py # Base Classes and Attention Implementation
│ └── neuron_parallel_utils.py # Sharded Attention Implementation
└── requirements.txt

The notebooks help you download the model, compile the individual component models, and invoke the generation pipeline to generate an image. Although each notebook can be run as a standalone sample, the next few sections of this post walk through the key implementation details within the component files and scripts that support running PixArt-Sigma on Neuron.

Sharding PixArt linear layers

For each component of PixArt (T5, Transformer, and VAE), the example uses Neuron-specific wrapper classes. These wrapper classes serve two purposes. The first is to make the models traceable for compilation:

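import torch.nn as nn
from transformers import T5EncoderModel

class InferenceTextEncoderWrapper(nn.Module):
    """Wraps the T5 encoder so it can be traced for Neuron compilation."""
    def __init__(self, dtype, t: T5EncoderModel, seqlen: int):
        super().__init__()
        self.dtype = dtype
        self.device = t.device
        self.t = t  # underlying encoder module
    def forward(self, text_input_ids, attention_mask=None):
        # Return a list holding the last hidden state cast to the target dtype
        return [self.t(text_input_ids, attention_mask)['last_hidden_state'].to(self.dtype)]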

Please refer to the neuron_commons.py file for all wrapper modules and classes.

The second reason for using wrapper classes is to modify the attention implementation to run on Neuron. Because diffusion models like PixArt are typically compute-bound, you can improve performance by sharding the attention layer across multiple devices. To do this, you replace the linear layers with NeuronX Distributed’s RowParallelLinear and ColumnParallelLinear layers:

def shard_t5_self_attention(tp_degree: int, selfAttention: T5Attention):
    # Each NeuronCore keeps n_heads // tp_degree attention heads
    orig_inner_dim = selfAttention.q.out_features
    dim_head = orig_inner_dim // selfAttention.n_heads
    selfAttention.n_heads = selfAttention.n_heads // tp_degree
    selfAttention.inner_dim = dim_head * selfAttention.n_heads
    # Q, K, and V projections: shard along the output dimension (weight dim 0)
    orig_q = selfAttention.q
    selfAttention.q = ColumnParallelLinear(
        selfAttention.q.in_features,
        selfAttention.q.out_features,
        bias=False,
        gather_output=False)
    selfAttention.q.weight.data = get_sharded_data(orig_q.weight.data, 0)
    del orig_q
    orig_k = selfAttention.k
    selfAttention.k = ColumnParallelLinear(
        selfAttention.k.in_features,
        selfAttention.k.out_features,
        bias=(selfAttention.k.bias is not None),
        gather_output=False)
    selfAttention.k.weight.data = get_sharded_data(orig_k.weight.data, 0)
    del orig_k
    orig_v = selfAttention.v
    selfAttention.v = ColumnParallelLinear(
        selfAttention.v.in_features,
        selfAttention.v.out_features,
        bias=(selfAttention.v.bias is not None),
        gather_output=False)
    selfAttention.v.weight.data = get_sharded_data(orig_v.weight.data, 0)
    del orig_v
    # Output projection: shard along the input dimension (weight dim 1)
    orig_out = selfAttention.o
    selfAttention.o = RowParallelLinear(
        selfAttention.o.in_features,
        selfAttention.o.out_features,
        bias=(selfAttention.o.bias is not None),
        input_is_parallel=True)
    selfAttention.o.weight.data = get_sharded_data(orig_out.weight.data, 1)
    del orig_out
    return selfAttention

Please refer to the neuron_parallel_utils.py file for more details on parallel attention.
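
To see why pairing a column-parallel projection with a row-parallel one preserves the result, the following pure-Python sketch may help. It is not the NeuronX Distributed implementation: it uses tiny integer matrices, the math convention y = x @ W (PyTorch stores linear weights transposed, which is why the code above shards weight dimension 0 for the column-parallel layers), and it omits the attention computation between the two projections. All function names and values are illustrative.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_columns(w, parts):
    """Column-parallel shard: each part holds a block of output features."""
    step = len(w[0]) // parts
    return [[row[i * step:(i + 1) * step] for row in w] for i in range(parts)]


def split_rows(w, parts):
    """Row-parallel shard: each part holds a block of input features."""
    step = len(w) // parts
    return [w[i * step:(i + 1) * step] for i in range(parts)]


def add(a, b):
    """Element-wise sum, standing in for the all-reduce across cores."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


tp_degree = 2
x = [[1, 2]]                              # one token, 2 input features
w_in = [[1, 0, 2, 1], [0, 1, 1, 3]]       # 2 x 4 projection (think q/k/v)
w_out = [[1, 0], [0, 1], [1, 1], [2, 0]]  # 4 x 2 output projection (think o)

# Unsharded reference result.
full = matmul(matmul(x, w_in), w_out)

# Sharded: every "core" sees the full input, keeps only its slice of the
# hidden features, and produces a partial output that is summed at the end.
partials = [
    matmul(matmul(x, w_in_shard), w_out_shard)
    for w_in_shard, w_out_shard in zip(
        split_columns(w_in, tp_degree), split_rows(w_out, tp_degree)
    )
]
sharded = partials[0]
for p in partials[1:]:
    sharded = add(sharded, p)

print(full, sharded)
```

Because each core only ever touches its own slice of the hidden features, no communication is needed between the two projections; the single sum at the end is the only cross-core step in this sketch.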

Compile individual sub-models

The PixArt-Sigma model is composed of three components. Each component is compiled so the entire generation pipeline can run on Neuron:

  • Text encoder – A 4-billion-parameter encoder, which translates a human-readable prompt into an embedding. In the text encoder, the attention layers are sharded, along with the feed-forward layers, with tensor parallelism.
  • Denoising transformer model – A 700-million-parameter transformer, which iteratively denoises a latent (a numerical representation of a compressed image). In the transformer, the attention layers are sharded, along with the feed-forward layers, with tensor parallelism.
  • Decoder – A VAE decoder that converts our denoiser-generated latent to an output image. For the decoder, the model is deployed with data parallelism.
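
As a rough back-of-the-envelope check on why the two larger components are sharded, the weight footprint can be estimated from the parameter counts above and the bfloat16 data type used by the pipeline. The sketch below is only an approximation: it ignores activations, compiler workspace, and any parameters that are not sharded, and the tp_degree of 4 is a hypothetical value chosen for illustration.

```python
BYTES_PER_BF16 = 2  # bfloat16 stores each parameter in 16 bits


def weight_gigabytes(num_params: int, bytes_per_param: int = BYTES_PER_BF16) -> float:
    """Approximate weight size in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


def per_core_gigabytes(num_params: int, tp_degree: int) -> float:
    """Idealized per-core share if every weight were sharded evenly."""
    return weight_gigabytes(num_params) / tp_degree


TEXT_ENCODER_PARAMS = 4_000_000_000  # "4-billion-parameter encoder"
TRANSFORMER_PARAMS = 700_000_000     # "700-million-parameter transformer"
TP_DEGREE = 4                        # hypothetical, for illustration only

print(f"text encoder: {weight_gigabytes(TEXT_ENCODER_PARAMS):.1f} GB total, "
      f"{per_core_gigabytes(TEXT_ENCODER_PARAMS, TP_DEGREE):.1f} GB per core")
print(f"transformer:  {weight_gigabytes(TRANSFORMER_PARAMS):.1f} GB total, "
      f"{per_core_gigabytes(TRANSFORMER_PARAMS, TP_DEGREE):.2f} GB per core")
```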

Now that the model definition is ready, you need to trace a model to run it on Trainium or Inferentia. You can see how to use the trace() function to compile the decoder component model for PixArt in the following code block:

compiled_decoder = torch_neuronx.trace(
    decoder,
    sample_inputs,
    compiler_workdir=f"{compiler_workdir}/decoder",
    compiler_args=compiler_flags,
    inline_weights_to_neff=False
)

Please refer to the compile_decoder.py file for more on how to instantiate and compile the decoder.

To run models with tensor parallelism, a technique that splits a tensor into chunks across multiple NeuronCores, you need to trace with a pre-specified tp_degree, which sets the number of NeuronCores to shard the model across. You then use the parallel_model_trace API to compile the encoder and transformer component models for PixArt:

compiled_text_encoder = neuronx_distributed.trace.parallel_model_trace(
    get_text_encoder_f,
    sample_inputs,
    compiler_workdir=f"{compiler_workdir}/text_encoder",
    compiler_args=compiler_flags,
    tp_degree=tp_degree,
)

Please refer to the compile_text_encoder.py file for more details on tracing the encoder with tensor parallelism.
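
Because the sharding function shown earlier computes n_heads // tp_degree, the chosen tp_degree should divide the number of attention heads evenly. A small pre-flight check, sketched below with hypothetical helper names and an illustrative head count of 16 (not taken from the PixArt-Sigma configuration), can catch a bad value before a long compilation starts.

```python
def heads_per_core(n_heads: int, tp_degree: int) -> int:
    """Return the number of attention heads each core holds, or raise."""
    if tp_degree <= 0:
        raise ValueError("tp_degree must be a positive integer")
    if n_heads % tp_degree != 0:
        raise ValueError(
            f"tp_degree={tp_degree} does not evenly divide n_heads={n_heads}"
        )
    return n_heads // tp_degree


def valid_tp_degrees(n_heads: int, max_cores: int) -> list[int]:
    """List every tp_degree up to max_cores that divides n_heads evenly."""
    return [d for d in range(1, max_cores + 1) if n_heads % d == 0]


if __name__ == "__main__":
    # Illustrative values only.
    print(heads_per_core(16, 4))
    print(valid_tp_degrees(16, 8))
```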

Lastly, you trace the transformer model with tensor parallelism:

compiled_transformer = neuronx_distributed.trace.parallel_model_trace(
    get_transformer_model_f,
    sample_inputs,
    compiler_workdir=f"{compiler_workdir}/transformer",
    compiler_args=compiler_flags,
    tp_degree=tp_degree,
    inline_weights_to_neff=False,
)

Please refer to the compile_transformer_latency_optimized.py file for more details on tracing the transformer with tensor parallelism.

You will use the compile_latency_optimized.sh script to compile all three models as described in this post, so these functions will be run automatically when you run through the notebook.

Step 3 – Deploy the model on AWS Trainium to generate images

This section walks through the steps to run inference with PixArt-Sigma on AWS Trainium.

Create a diffusers pipeline object

The Hugging Face diffusers library provides pre-trained diffusion models and includes model-specific pipelines that bundle the components (independently trained models, schedulers, and processors) needed to run a diffusion model. The PixArtSigmaPipeline is specific to the PixArt-Sigma model, and is instantiated as follows:

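import torch
from diffusers import PixArtSigmaPipeline

# Load from the local cache only; the model must already be downloaded
pipe: PixArtSigmaPipeline = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",
    torch_dtype=torch.bfloat16,
    local_files_only=True,
    cache_dir="pixart_sigma_hf_cache_dir_1024")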

Please refer to the hf_pretrained_pixart_sigma_1k_latency_optimized.ipynb notebook for details on pipeline execution.

Load compiled component models into the generation pipeline

After each component model has been compiled, load them into the overall generation pipeline for image generation. The VAE model is loaded with data parallelism, which parallelizes decoding across a batch or across multiple images per prompt. For more details, refer to the hf_pretrained_pixart_sigma_1k_latency_optimized.ipynb notebook.

vae_decoder_wrapper.model = torch_neuronx.DataParallel( 
    torch.jit.load(decoder_model_path), [0, 1, 2, 3], False
)

text_encoder_wrapper.t = neuronx_distributed.trace.parallel_model_load(
    text_encoder_model_path
)
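
Conceptually, data parallelism splits a batch along its first dimension, hands each chunk to a separate copy of the model, and concatenates the results. The pure-Python sketch below illustrates that idea only; it is not how torch_neuronx.DataParallel is implemented, the chunks run sequentially here rather than concurrently, and the function names are hypothetical.

```python
def split_batch(batch, num_cores):
    """Split a batch into at most num_cores contiguous, near-equal chunks."""
    if num_cores <= 0:
        raise ValueError("num_cores must be positive")
    base, extra = divmod(len(batch), num_cores)
    chunks, start = [], 0
    for core in range(num_cores):
        size = base + (1 if core < extra else 0)
        if size:
            chunks.append(batch[start:start + size])
        start += size
    return chunks


def data_parallel_map(fn, batch, num_cores):
    """Apply fn to each chunk (one chunk per core) and stitch outputs together."""
    outputs = []
    for chunk in split_batch(batch, num_cores):
        outputs.extend(fn(chunk))
    return outputs


if __name__ == "__main__":
    latents = list(range(6))  # stand-ins for six latents to decode
    print(split_batch(latents, 4))
    print(data_parallel_map(lambda chunk: [x * 2 for x in chunk], latents, 4))
```

With a single image per prompt there is only one item to split, so the benefit of the data-parallel decoder shows up mainly when you raise the batch size or num_images_per_prompt.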

Finally, the loaded models are added to the generation pipeline:

pipe.text_encoder = text_encoder_wrapper
pipe.transformer = transformer_wrapper
pipe.vae.decoder = vae_decoder_wrapper
pipe.vae.post_quant_conv = vae_post_quant_conv_wrapper

Compose a prompt

Now that the model is ready, you can write a prompt to convey what kind of image you want generated. When creating a prompt, you should always be as specific as possible. You can use a positive prompt to convey what is wanted in your new image, including a subject, action, style, and location, and can use a negative prompt to indicate features that should be removed.

For example, you can use the following positive and negative prompts to generate a photo of an astronaut riding a horse on Mars without mountains:

# Subject: astronaut
# Action: riding a horse
# Location: Mars
# Style: photo
prompt = "a photo of an astronaut riding a horse on mars"
negative_prompt = "mountains"

Feel free to edit the prompt in your notebook using prompt engineering to generate an image of your choosing.
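
If you want to vary the subject, action, location, and style independently, a tiny helper can assemble the positive prompt from those parts. The build_prompt function below is a hypothetical convenience for experimentation, not part of the sample notebook, and its article handling is deliberately simplistic.

```python
def article(word: str) -> str:
    """Pick 'a' or 'an' using a simple first-letter vowel check."""
    return "an" if word[:1].lower() in "aeiou" else "a"


def build_prompt(subject: str, action: str, location: str, style: str) -> str:
    """Assemble a positive prompt from its four parts (hypothetical helper)."""
    return f"{article(style)} {style} of {article(subject)} {subject} {action} {location}"


if __name__ == "__main__":
    prompt = build_prompt(
        subject="astronaut",
        action="riding a horse",
        location="on mars",
        style="photo",
    )
    negative_prompt = "mountains"
    print(prompt)
```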

Generate an image

To generate an image, you pass the prompt to the PixArt model pipeline, and then save the generated image for later reference:

# pipe: variable holding the PixArt generation pipeline with each of
# the compiled component models
images = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_images_per_prompt=1,
    height=1024,  # number of pixels
    width=1024,  # number of pixels
    num_inference_steps=25,  # number of passes through the denoising model
).images

# Save each generated image for later reference
for idx, img in enumerate(images):
    img.save(f"image_{idx}.png")
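
To compare the latency-optimized and throughput-optimized variants on your own instance, you could time the pipeline call with a small harness like the sketch below. The measure_latency helper is hypothetical and uses only the Python standard library; the warm-up run is there on the assumption that the first call includes one-time setup costs.

```python
import statistics
import time


def measure_latency(fn, warmup: int = 1, runs: int = 3) -> dict:
    """Call fn repeatedly and report wall-clock latency in seconds."""
    for _ in range(warmup):
        fn()  # discard warm-up runs
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return {
        "mean_s": statistics.mean(timings),
        "min_s": min(timings),
        "max_s": max(timings),
    }


if __name__ == "__main__":
    # Stand-in workload; in the notebook you would instead pass something like
    # lambda: pipe(prompt=prompt, negative_prompt=negative_prompt,
    #              height=1024, width=1024, num_inference_steps=25)
    print(measure_latency(lambda: sum(range(100_000))))
```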


Cleanup

To avoid incurring additional costs, stop your EC2 instance using either the AWS Management Console or AWS Command Line Interface (AWS CLI).
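
With the AWS CLI, stopping the instance comes down to a single aws ec2 stop-instances call. The sketch below only builds the command so you can review it before running it; the helper name and the instance ID are placeholders, and it assumes the AWS CLI is installed and configured with credentials that are allowed to stop the instance.

```python
import subprocess
from typing import Optional


def build_stop_command(instance_id: str, region: Optional[str] = None) -> list[str]:
    """Return the argument list for `aws ec2 stop-instances` (hypothetical helper)."""
    cmd = ["aws", "ec2", "stop-instances", "--instance-ids", instance_id]
    if region:
        cmd += ["--region", region]
    return cmd


if __name__ == "__main__":
    # Placeholder instance ID; replace it with your own before running.
    cmd = build_stop_command("i-0123456789abcdef0")
    print(" ".join(cmd))
    # Uncomment to actually stop the instance:
    # subprocess.run(cmd, check=True)
```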

Conclusion

In this post, we walked through how to deploy PixArt-Sigma, a state-of-the-art diffusion transformer, on Trainium instances. This post is the first in a series focused on running diffusion transformers for different generation tasks on Neuron. To learn more about running diffusion transformer models with Neuron, refer to Diffusion Transformers.


About the Authors

Achintya Pinninti is a Solutions Architect at Amazon Web Services. He supports public sector customers, enabling them to achieve their objectives using the cloud. He specializes in building data and machine learning solutions to solve complex problems.

Miriam Lebowitz is a Solutions Architect focused on empowering early-stage startups at AWS. She leverages her experience with AI/ML to guide companies to select and implement the right technologies for their business objectives, setting them up for scalable growth and innovation in the competitive startup world.

Sadaf Rasool is a Solutions Architect in Annapurna Labs at AWS. Sadaf collaborates with customers to design machine learning solutions that address their critical business challenges. He helps customers train and deploy machine learning models leveraging AWS Trainium or AWS Inferentia chips to accelerate their innovation journey.

John Gray is a Solutions Architect in Annapurna Labs, AWS, based out of Seattle. In this role, John works with customers on their AI and machine learning use cases, architects solutions to cost-effectively solve their business problems, and helps them build a scalable prototype using AWS AI chips.


