NVIDIA / TransformerEngine Public

Pull requests: NVIDIA/TransformerEngine

Labels 69 Milestones 0

128 Open 1,906 Closed

Pull requests list

[PyTorch] Support single parameter for GroupedLinear

#2731 opened Mar 4, 2026 by ksivaman

9 of 13 tasks

WAR sort_chunks_by_index intermittent failures in L0 JAX unittest

#2730 opened Mar 4, 2026 by tdophung • Draft

6 of 13 tasks

[JAX] GSPMD Deprecation Warning - Only trigger when the primitive is invoked

#2729 opened Mar 3, 2026 by phu0ngng

5 of 13 tasks

fix: scope get_full_cu_seqlens cache key by device and inference mode

#2728 opened Mar 3, 2026 by DmCarpe93

8 of 13 tasks

[CI] Refactor CI build on GitHub

#2723 opened Mar 2, 2026 by ptrendx • Draft

1 of 13 tasks

[Common, pyTorch] Grouped MXFP8 dequantize support

#2722 opened Mar 2, 2026 by ptrendx • Draft

1 of 13 tasks

Fix for async dcp checkpointing with Float8Tensors

#2721 opened Mar 2, 2026 by pstjohn • Draft

Add MXFP8 attention

#2719 opened Mar 1, 2026 by cyanguwa • Draft

13 tasks

pass params_dtype to qk_norm creation

#2718 opened Feb 28, 2026 by pstjohn

Hongbinl/offload activation cuda graph mxfp8 offload fix

#2716 opened Feb 27, 2026 by lhb8125 • Draft

13 tasks

Add DCP compatibility for FSDP2-TP sharding in TransformerEngine.

#2713 opened Feb 26, 2026 by cspades • Draft

13 tasks

Enable dequantization from MXFP8 tensor with only columnwise data

#2712 opened Feb 26, 2026 by ptrendx

13 tasks

[JAX] Support calling MOE router kernels from JAX side

#2711 opened Feb 26, 2026 by tdophung

1 of 13 tasks

[Common][PyTorch] Add z_loss_weight and log_sum_exp output to parallel_cross_entropy

#2707 opened Feb 26, 2026 by bassoy • Draft

8 tasks done

[Draft] Newton-Schulz via cuSOLVERMp

#2706 opened Feb 25, 2026 by vcherepanov-nv

6 of 13 tasks

[All] Added better error messages

#2705 opened Feb 25, 2026 by ptrendx

Fix Flash Attention 3 API compatibility for window size parameters (label: 2.14.0)

#2704 opened Feb 25, 2026 by jhvmhg

3 of 13 tasks

[Draft][PyTorch] torch.compile support for TE Linear

#2701 opened Feb 24, 2026 by pggPL • Draft

13 tasks

Add fused_adam, quantized_model_init, and fsdp2 example

#2698 opened Feb 22, 2026 by pstjohn

[PyTorch] Zero-initialize learnable softmax_offset in DotProductAttention

#2694 opened Feb 20, 2026 by fjosw

7 of 13 tasks

Enable sm120 support for fused attn if cuDNN is 9.18.1+

#2693 opened Feb 20, 2026 by KshitijLakhani • Draft

13 tasks

[JAX] Fix get_seqlens_and_offsets() to accept vmapped seg ids and non vmapped seg offsets (label: 2.14.0)

#2692 opened Feb 19, 2026 by KshitijLakhani

7 of 13 tasks

[PyTorch] Error out if constructing LayerNormLinear with row tensor parallelism (label: bug)

#2688 opened Feb 17, 2026 by timmoon10

6 of 13 tasks

[PyTorch] torch.compile support for permutation functions

#2686 opened Feb 17, 2026 by pggPL

9 of 13 tasks

[JAX] Integrate BF16 Grouped GEMM with on-device group sizes

#2680 opened Feb 13, 2026 by jberchtold-nvidia

8 of 13 tasks
